In software engineering, profiling (program profiling, software profiling) is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization and, more specifically, performance engineering.
Profiling is achieved by instrumenting either the program source code or its binary executable form using a tool called a ''profiler'' (or ''code profiler''). Profilers may use a number of different techniques, such as event-based, statistical, instrumented, and simulation methods.
Gathering program events
Profilers use a wide variety of techniques to collect data, including hardware interrupts, code instrumentation, instruction set simulation, operating system hooks, and performance counters.
Use of profilers

The output of a profiler may be:
* A statistical ''summary'' of the events observed (a profile)
:Summary profile information is often shown annotated against the source code statements where the events occur, so the size of the measurement data is linear in the code size of the program.
/* ------------ source------------------------- count */
0001 IF X = "A" 0055
0002 THEN DO
0003 ADD 1 to XCOUNT 0032
0004 ELSE
0005 IF X = "B" 0055
* A stream of recorded events (a trace)
:For sequential programs, a summary profile is usually sufficient, but performance problems in parallel programs (waiting for messages or synchronization issues) often depend on the time relationship of events, thus requiring a full trace to get an understanding of what is happening.
:The size of a (full) trace is linear in the program's instruction path length, making it somewhat impractical. A trace may therefore be initiated at one point in a program and terminated at another point to limit the output.
* An ongoing interaction with the hypervisor (continuous or periodic monitoring via on-screen display, for instance)
: This provides the opportunity to switch a trace on or off at any desired point during execution in addition to viewing on-going metrics about the (still executing) program. It also provides the opportunity to suspend asynchronous processes at critical points to examine interactions with other parallel processes in more detail.
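The difference between a summary profile and a trace can be illustrated with a minimal Python sketch. It assumes CPython's `sys.settrace` hook; the `classify` target and the `profile_lines` helper are hypothetical names introduced only for this illustration. The summary holds at most one counter per source line, while the trace gains one record for every line executed.

```python
import sys
from collections import Counter


def classify(x):
    # Hypothetical target code, loosely mirroring the annotated listing above.
    if x == "A":
        return 1
    elif x == "B":
        return 2
    return 0


def profile_lines(func, inputs):
    """Run func on each input; return (summary, trace) of executed lines."""
    summary = Counter()  # one counter per source line: bounded by code size
    trace = []           # one record per executed line: grows with path length
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore frames that do not belong to the target
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            summary[offset] += 1
            trace.append(offset)
        return tracer

    sys.settrace(tracer)
    try:
        for item in inputs:
            func(item)
    finally:
        sys.settrace(None)  # always remove the hook again
    return summary, trace


summary, trace = profile_lines(classify, ["A", "B", "C", "A"])
print(len(summary), "distinct lines in the summary")
print(len(trace), "records in the trace")
```

Running the same target on a longer input list leaves the number of summary entries unchanged but lengthens the trace, which is why full traces are usually limited to a region of interest.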
A profiler can be applied to an individual method or at the scale of a module or program, to identify performance bottlenecks by making long-running code obvious. A profiler can be used to understand code from a timing point of view, with the objective of optimizing it to handle various runtime conditions or various loads. Profiling results can be ingested by a compiler that provides profile-guided optimization. Profiling results can be used to guide the design and optimization of an individual algorithm; the Krauss matching wildcards algorithm is an example. Profilers are built into some application performance management systems that aggregate profiling data to provide insight into transaction workloads in distributed applications.
History
Performance-analysis tools existed on IBM/360 and IBM/370 platforms from the early 1970s, usually based on timer interrupts which recorded the program status word (PSW) at set timer-intervals to detect "hot spots" in executing code. This was an early example of sampling (see below). In early 1974 instruction-set simulators permitted full trace and other performance-monitoring features.
Profiler-driven program analysis on Unix dates back to 1973,[Unix Programmer's Manual, 4th Edition] when Unix systems included a basic tool, prof, which listed each function and how much of program execution time it used. In 1982 gprof extended the concept to a complete call graph analysis.[S.L. Graham, P.B. Kessler, and M.K. McKusick, ''gprof: a Call Graph Execution Profiler'', Proceedings of the SIGPLAN '82 Symposium on Compiler Construction, ''SIGPLAN Notices'', Vol. 17, No. 6, pp. 120-126; doi:10.1145/800230.806987]
In 1994, Amitabh Srivastava and Alan Eustace of Digital Equipment Corporation published a paper describing ATOM (Analysis Tools with OM). The ATOM platform converts a program into its own profiler: at compile time, it inserts code into the program to be analyzed. That inserted code outputs analysis data. This technique, modifying a program to analyze itself, is known as "instrumentation".
In 2004 both the gprof and ATOM papers appeared on the list of the 50 most influential PLDI papers for the 20-year period ending in 1999.
Profiler types based on output
Flat profiler
Flat profilers compute the average call time of each function from the calls observed, and do not break the call times down by callee or calling context.
Call-graph profiler
Call-graph profilers show the call times and frequencies of the functions, as well as the call chains involved, based on the callee. In some tools the full context is not preserved.
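As an illustration of the two kinds of output, the following sketch uses Python's standard `cProfile` and `pstats` modules. The functions `leaf`, `via_a`, `via_b`, and `workload` are hypothetical targets introduced here. `print_stats` gives a flat row for `leaf` with all callers merged, while `print_callers` splits the same cost by immediate caller, which is one level of call-graph context rather than a full call chain.

```python
import cProfile
import io
import pstats


def leaf(n):
    return sum(i * i for i in range(n))


def via_a():
    return leaf(20_000)


def via_b():
    return leaf(40_000)


def workload():
    return via_a() + via_b()


profiler = cProfile.Profile()
result = profiler.runcall(workload)  # profile a single call of workload()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative")
stats.print_stats(r"\(leaf\)")    # flat view: one row for leaf, callers merged
stats.print_callers(r"\(leaf\)")  # caller view: leaf's cost split by via_a / via_b
report = buffer.getvalue()
print(report)
```

The string arguments are regular expressions that restrict the report to entries whose name matches, so only the rows for `leaf` are printed.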
Input-sensitive profiler
Input-sensitive profilers[E. Coppa, C. Demetrescu, and I. Finocchi, ''Input-Sensitive Profiling'', IEEE Trans. Software Eng. 40(12): 1185-1205 (2014); doi:10.1109/TSE.2014.2339825] add a further dimension to flat or call-graph profilers by relating performance measures to features of the input workloads, such as input size or input values. They generate charts that characterize how an application's performance scales as a function of its input.
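The idea of relating cost to input size can be sketched in a few lines of Python. This is a deliberate simplification and not the method of the cited paper: it times whole calls with `time.perf_counter` and takes the input size from the caller, whereas an input-sensitive profiler infers such features automatically. The names `insertion_sort` and `profile_by_input_size` are hypothetical.

```python
import time


def insertion_sort(values):
    """Hypothetical target with quadratic worst-case running time."""
    items = list(values)
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key
    return items


def profile_by_input_size(func, make_input, sizes):
    """Return {input size: elapsed seconds} for one call of func per size."""
    results = {}
    for n in sizes:
        data = make_input(n)  # build the workload outside the timed region
        start = time.perf_counter()
        func(data)
        results[n] = time.perf_counter() - start
    return results


# Reverse-sorted lists are the worst case for insertion sort.
timings = profile_by_input_size(
    insertion_sort, lambda n: list(range(n, 0, -1)), [100, 200, 400]
)
for n, seconds in timings.items():
    print(f"n={n:4d}  {seconds:.6f} s")
```

Plotting the resulting pairs against the input size gives the kind of scaling chart described above; for this target the time should grow roughly fourfold when the size doubles, although individual measurements are noisy.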
Data granularity in profiler types
Profilers, which are themselves programs, analyze target programs by collecting information on their execution. Based on their data granularity, which depends on how they collect information, profilers are classified as ''event-based'' or ''statistical''. Profilers interrupt program execution to collect information, and those interrupts can limit the resolution of time measurements, so timing results should be treated as approximate. Basic block profilers report the number of machine clock cycles devoted to executing each line of code, or timings based on adding those together; the timings reported per basic block may not reflect the difference between cache hits and misses.
Event-based profilers
Event-based profilers are available for the following programming languages:
* Java: the JVMTI (JVM Tools Interface) API, formerly JVMPI (JVM Profiling Interface), provides hooks to profilers for trapping events such as calls, class load and unload, and thread enter and leave.
* .NET: a profiling agent can be attached as a ''COM'' server to the ''CLR'' using the Profiling ''API''. As with Java, the runtime then provides various callbacks into the agent for trapping events such as method JIT, enter, and leave, object creation, etc. This is particularly powerful in that the profiling agent can rewrite the target application's bytecode in arbitrary ways.
* Python: Python profiling includes the profile module, hotshot (which is call-graph based), and the 'sys.setprofile' function, which traps Python-level call and return events as well as C-function events (call, return, c_call, c_return, c_exception).
* Ruby: Ruby uses an interface similar to Python's for profiling. Available tools include the flat profiler in the profile.rb module and ruby-prof, a C extension.
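For the Python case, an event-based profiler can be sketched directly on top of `sys.setprofile`. The helper `count_events` and the targets `target` and `helper` are hypothetical names used only for this illustration. The hook is called by the interpreter on every call and return event, and here it simply counts them.

```python
import sys
from collections import Counter


def count_events(func, *args):
    """Call func(*args) and count the profile events raised meanwhile."""
    events = Counter()

    def hook(frame, event, arg):
        if event in ("call", "return"):
            # Python-level event: the frame identifies the function.
            events[(event, frame.f_code.co_name)] += 1
        elif event in ("c_call", "c_return"):
            # C-level event: arg is the C function object being called.
            events[(event, getattr(arg, "__name__", repr(arg)))] += 1

    sys.setprofile(hook)
    try:
        func(*args)
    finally:
        sys.setprofile(None)  # always remove the hook again
    return events


def helper(n):
    return abs(n)


def target(n):
    return len(str(n)) + helper(n)


events = count_events(target, -5)
for (event, name), count in sorted(events.items()):
    print(f"{event:9s} {name:12s} {count}")
```

The counts also include the `sys.setprofile` calls that install and remove the hook, a small example of a profiler observing its own activity.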
Statistical profilers
These profilers operate by sampling. A sampling profiler probes the target program's call stack at regular intervals using operating system interrupts. Sampling profiles are typically less numerically accurate and specific, providing only a statistical approximation, but allow the target program to run at near full speed. "The actual amount of error is usually more than one sampling period. In fact, if a value is n times the sampling period, the expected error in it is the square-root of n sampling periods." For example, with a 10 ms sampling period, a routine measured at 1 s (n = 100) has an expected error of 10 sampling periods, or 0.1 s.
In practice, sampling profilers can often provide a more accurate picture of the target program's execution than other approaches, as they are not as intrusive to the target program and thus do not have as many side effects (such as on memory caches or instruction decoding pipelines). Also, since they do not affect the execution speed as much, they can detect issues that would otherwise be hidden. They are also relatively immune to over-evaluating the cost of small, frequently called routines or 'tight' loops. They can show the relative amount of time spent in user mode versus interruptible kernel mode, such as system call processing.
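A sampling profiler can be approximated in pure Python, as in the sketch below. It departs from the description above in one respect, which is an assumption of this example: instead of operating system interrupts it uses a helper thread that wakes once per sampling period and inspects the target thread's stack through the CPython-specific `sys._current_frames`. The names `sample_calling_thread` and `busy_loop` are hypothetical.

```python
import sys
import threading
import time
from collections import Counter


def sample_calling_thread(func, interval=0.005):
    """Run func while a helper thread samples the calling thread's stack."""
    samples = Counter()
    target_id = threading.get_ident()
    stop = threading.Event()

    def sampler():
        # wait() returns False on timeout, i.e. once per sampling period.
        while not stop.wait(interval):
            frame = sys._current_frames().get(target_id)
            if frame is not None:
                # Attribute the sample to the innermost Python function.
                samples[frame.f_code.co_name] += 1

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        func()
    finally:
        stop.set()
        thread.join()
    return samples


def busy_loop(duration=0.3):
    """Hypothetical CPU-bound target that spins for about `duration` seconds."""
    end = time.perf_counter() + duration
    total = 0
    while time.perf_counter() < end:
        total += 1
    return total


samples = sample_calling_thread(busy_loop)
total = sum(samples.values()) or 1  # avoid dividing by zero if nothing was sampled
for name, count in samples.most_common():
    print(f"{name:24s} {count / total:6.1%}")
```

The target runs unmodified and the result is only a statistical estimate: a shorter interval gives more samples at the cost of more disturbance, and the exact counts vary from run to run.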
Unfortunately, running kernel code to handle the interrupts costs the target program some CPU cycles and diverts cache usage, and the profiler cannot distinguish the various tasks occurring in uninterruptible kernel code (microsecond-range activity) from user code. Dedicated hardware can do better: the JTAG interfaces of ARM Cortex-M3 and some recent MIPS processors have a PCSAMPLE register, which samples the program counter in a truly undetectable manner, allowing non-intrusive collection of a flat profile.
Some commonly used statistical profilers for Java/managed code are SmartBear Software's AQtime and Microsoft's CLR Profiler. Those profilers also support native code profiling, along with Apple Inc.'s Shark (OS X), OProfile (Linux), Intel VTune and Parallel Amplifier (part of Intel Parallel Studio), and Oracle Performance Analyzer, among others.
Instrumentation
This technique effectively adds instructions to the target program to collect the required information. Note that instrumenting a program can cause performance changes, and may in some cases lead to inaccurate results and/or heisenbugs. The effect will depend on what information is being collected, on the level of timing details reported, and on whether basic block profiling is used in conjunction with instrumentation. For example, adding code to count every procedure/routine call will probably have less effect than counting how many times each statement is executed. A few computers have special hardware to collect information; in this case the impact on the program is minimal.
Instrumentation is key to determining the level of control and amount of time resolution available to the profilers.
* Manual: Performed by the programmer, e.g. by adding instructions to explicitly calculate runtimes, simply count events or calls to measurement APIs such as the Application Response Measurement standard.
* Automatic source level: instrumentation added to the source code by an automatic tool according to an instrumentation policy.
* Intermediate language: instrumentation added to assembly or decompiled bytecodes, giving support for multiple higher-level source languages and avoiding (non-symbolic) binary offset re-writing issues.
* Compiler assisted
* Binary translation: The tool adds instrumentation to a compiled executable.
* Runtime instrumentation: Directly before execution the code is instrumented. The program run is fully supervised and controlled by the tool.
* Runtime injection: More lightweight than runtime instrumentation. Code is modified at runtime to have jumps to helper functions.
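The manual variant from the list above can be sketched in Python with a decorator that the programmer applies by hand to the functions of interest. The decorator `instrumented`, the two module-level tables, and the `parse` target are hypothetical names introduced for this example. The added code counts calls and accumulates elapsed time around each one.

```python
import functools
import time
from collections import defaultdict

call_counts = defaultdict(int)      # function name -> number of calls
total_seconds = defaultdict(float)  # function name -> accumulated wall time


def instrumented(func):
    """Wrap func so that each call is counted and timed."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the call even if func raised an exception.
            call_counts[func.__name__] += 1
            total_seconds[func.__name__] += time.perf_counter() - start

    return wrapper


@instrumented
def parse(record):
    return record.split(",")


for line in ["a,b", "c,d,e", "f"]:
    parse(line)

for name, count in call_counts.items():
    print(f"{name}: {count} calls, {total_seconds[name]:.6f} s total")
```

The wrapper itself takes time on every call, so for very short functions the measured cost can be dominated by the instrumentation, which is the distortion this section warns about.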
Interpreter instrumentation
* Interpreter debug options can enable the collection of performance metrics as the interpreter encounters each target statement. Bytecode, control table, and JIT interpreters are three examples that usually have complete control over execution of the target code, thus enabling extremely comprehensive data collection.
Hypervisor/simulator
* Hypervisor: Data are collected by running the (usually) unmodified program under a hypervisor. Example: SIMMON
* Simulator and Hypervisor: Data collected interactively and selectively by running the unmodified program under an instruction set simulator.
External links
* Article
Need for speed — Eliminating performance bottlenecks
on doing execution time analysis of Java applications using IBM Rational Application Developer.
Profiling Runtime Generated and Interpreted Code using the VTune Performance Analyzer