In computer science, pointer analysis, or points-to analysis, is a static code analysis technique that establishes which pointers, or heap references, can point to which variables, or storage locations. It is often a component of more complex analyses such as escape analysis. A closely related technique is shape analysis.

This is the most common colloquial use of the term. A secondary use has ''pointer analysis'' be the collective name for both points-to analysis, defined as above, and alias analysis. Points-to and alias analysis are closely related but not always equivalent problems.

Example

Consider a C program that consists of a function with the signature int *id(int* p) and a function void main(void) that calls id twice.

A pointer analysis computes a mapping from pointer expressions to the set of allocation sites of the objects they may point to. For this program, an idealized, fully precise analysis would keep the two calls apart and map each pointer expression only to the allocations it can actually refer to (here X::Y denotes the stack allocation holding the local variable Y in the function X).

However, a context-insensitive analysis such as Andersen's or Steensgaard's algorithm would lose precision when analyzing the calls to id: it merges the facts from both calls, so the points-to sets it reports for the pointers involved in those calls are larger than necessary.
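As a rough illustration of the context-insensitive result, the sketch below assumes a plausible shape for the example program, shown in its leading comment. Only the two signatures come from the text; the function bodies and the names x, y, u and v are assumptions made for this sketch, as are all identifiers in the code. It propagates points-to sets, stored as bitmasks, along copy constraints until nothing changes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of the example program (only the two signatures
 * appear in the text; bodies and the names x, y, u, v are hypothetical):
 *
 *   int *id(int *p) { return p; }
 *   void main(void) {
 *     int x, y;
 *     int *u = id(&x);
 *     int *v = id(&y);
 *   }
 */

/* Pointer variables tracked by the sketch. */
enum { VAR_P, VAR_U, VAR_V, NUM_VARS };

/* Allocation sites main::x and main::y, one bit each. */
enum { LOC_X = 1u << 0, LOC_Y = 1u << 1 };

/* Copy constraint: dst may point to everything src may point to. */
struct copy { int dst, src; };

/* Fills pts[] with context-insensitive points-to sets (bitmasks). */
void solve_context_insensitive(uint32_t pts[NUM_VARS]) {
    /* Both calls return the single, shared parameter p. */
    static const struct copy copies[] = {
        { VAR_U, VAR_P },   /* u = id(...) */
        { VAR_V, VAR_P },   /* v = id(...) */
    };
    const unsigned num_copies = sizeof copies / sizeof copies[0];

    for (int i = 0; i < NUM_VARS; i++) pts[i] = 0;
    /* Both calls bind their argument to the same p: p = &x and p = &y. */
    pts[VAR_P] = LOC_X | LOC_Y;

    /* Propagate along copy constraints until a fixed point is reached. */
    int changed = 1;
    while (changed) {
        changed = 0;
        for (unsigned i = 0; i < num_copies; i++) {
            assert(copies[i].dst < NUM_VARS && copies[i].src < NUM_VARS);
            uint32_t merged = pts[copies[i].dst] | pts[copies[i].src];
            if (merged != pts[copies[i].dst]) {
                pts[copies[i].dst] = merged;
                changed = 1;
            }
        }
    }
}
```

Because the parameter has a single points-to set shared by both calls, both results end up with the set containing main::x and main::y, which is the loss of precision described above.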

Introduction

As a form of static analysis, fully precise pointer analysis can be shown to be undecidable. Most approaches are sound, but range widely in performance and precision. Many design decisions impact both the precision and performance of an analysis; often (but not always) lower precision yields higher performance. These choices include:

* ''Field sensitivity'' (also known as ''structure sensitivity''): An analysis can either treat each field of a struct or object separately, or merge them.
* ''Array sensitivity'': An array-sensitive pointer analysis models each index in an array separately. Other choices include modelling just the first entry separately and the rest together, or merging all array entries.
* ''Context sensitivity'' or ''polyvariance'': Pointer analyses may qualify points-to information with a summary of the control flow leading to each program point.
* ''Flow sensitivity'': An analysis can model the impact of intraprocedural control flow on points-to facts.
* ''Heap modeling'': Run-time allocations may be abstracted by:
** their allocation sites (the statement or instruction that performs the allocation, e.g., a call to malloc or an object constructor),
** a more complex model based on a shape analysis,
** the type of the allocation, or
** one single allocation (this is called ''heap-insensitivity'').
* ''Heap cloning'': Heap- and context-sensitive analyses may further qualify each allocation site by a summary of the control flow leading to the instruction or statement performing the allocation.
* ''Subset constraints'' or ''equality constraints'': When propagating points-to facts, different program statements may induce different constraints on a variable's points-to sets. Equality constraints (like those used in Steensgaard's algorithm) can be tracked with a union-find data structure, which gives high performance but lower precision than a subset-constraint-based analysis (e.g., Andersen's algorithm).
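The difference between equality and subset constraints can be seen on the three statements p = &a; q = &b; p = q. The sketch below is a simplified, unification-style model written for this illustration, not Steensgaard's full algorithm, and every identifier in it is hypothetical. Under equality constraints the assignment p = q merges what p and q point to, so q appears to point to a as well; under subset constraints only the set of p grows.

```c
#include <assert.h>

enum { MAX_NODES = 16 };

static int parent[MAX_NODES];   /* union-find forest */
static int pointee[MAX_NODES];  /* node a class root points to, or -1 */
static int num_nodes;

static int new_node(void) {
    assert(num_nodes < MAX_NODES);
    int n = num_nodes++;
    parent[n] = n;
    pointee[n] = -1;
    return n;
}

static int find(int n) {
    while (parent[n] != n) {
        parent[n] = parent[parent[n]];   /* path halving */
        n = parent[n];
    }
    return n;
}

/* Merge two classes and, recursively, whatever they point to. */
static void unify(int a, int b) {
    a = find(a);
    b = find(b);
    if (a == b) return;
    int pa = pointee[a], pb = pointee[b];
    parent[b] = a;
    if (pa < 0) pointee[a] = pb;
    else if (pb >= 0) unify(pa, pb);
}

/* Class that n points to; a fresh placeholder is created on demand. */
static int target(int n) {
    n = find(n);
    if (pointee[n] < 0) pointee[n] = new_node();
    return find(pointee[n]);
}

/* Equality constraints for "p = &a; q = &b; p = q".
 * Returns 1 if q may point to a afterwards. */
int equality_q_may_point_to_a(void) {
    num_nodes = 0;
    int a = new_node(), b = new_node(), p = new_node(), q = new_node();
    unify(target(p), a);                 /* p = &a */
    unify(target(q), b);                 /* q = &b */
    int tp = target(p), tq = target(q);
    unify(tp, tq);                       /* p = q: the two targets become one class */
    return find(target(q)) == find(a);
}

/* Subset constraints for the same statements, as bitmasks. */
int subset_q_may_point_to_a(void) {
    unsigned bit_a = 1u << 0, bit_b = 1u << 1;
    unsigned pts_p = bit_a;              /* p = &a */
    unsigned pts_q = bit_b;              /* q = &b */
    pts_p |= pts_q;                      /* p = q: pts(q) is a subset of pts(p) */
    (void)pts_p;
    return (pts_q & bit_a) != 0;
}
```

The first function reports that q may point to a, the second that it may not. The union-find version does near-constant work per statement, which is the performance side of the trade-off.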

Context-insensitive, flow-insensitive algorithms

Pointer analysis algorithms take the raw pointer operations collected from a program (assigning one pointer to another, or making a pointer point to another variable) and derive a graph of what each pointer can point to. Steensgaard's algorithm and Andersen's algorithm are common context-insensitive, flow-insensitive algorithms for pointer analysis. They are often used in compilers, and have implementations in SVF and LLVM.
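To make the step from collected pointer operations to points-to sets concrete, here is a minimal inclusion-based solver in the spirit of Andersen-style analysis. It is a naive fixed-point iteration written for illustration, not the constraint-graph formulation used by production tools such as SVF or LLVM. The statement encoding, the function names and the limit of 32 variables are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

enum kind { ADDR_OF, COPY, LOAD, STORE };

/* One collected pointer operation over variable indices. */
struct stmt { enum kind kind; int lhs, rhs; };

/* Adds bits to *set; returns 1 if the set grew. */
static int add_bits(uint32_t *set, uint32_t bits) {
    uint32_t merged = *set | bits;
    int grew = merged != *set;
    *set = merged;
    return grew;
}

/* Inclusion-based solving: pts[v] is a bitmask in which bit i means
 * "v may point to variable i". Iterates until nothing changes. */
void solve_inclusion(const struct stmt *stmts, int num_stmts,
                     uint32_t *pts, int num_vars) {
    assert(num_vars <= 32);
    for (int v = 0; v < num_vars; v++) pts[v] = 0;
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < num_stmts; i++) {
            int l = stmts[i].lhs, r = stmts[i].rhs;
            switch (stmts[i].kind) {
            case ADDR_OF:                       /* l = &r */
                changed |= add_bits(&pts[l], UINT32_C(1) << r);
                break;
            case COPY:                          /* l = r */
                changed |= add_bits(&pts[l], pts[r]);
                break;
            case LOAD:                          /* l = *r */
                for (int t = 0; t < num_vars; t++)
                    if ((pts[r] >> t) & 1u)
                        changed |= add_bits(&pts[l], pts[t]);
                break;
            case STORE:                         /* *l = r */
                for (int t = 0; t < num_vars; t++)
                    if ((pts[l] >> t) & 1u)
                        changed |= add_bits(&pts[t], pts[r]);
                break;
            }
        }
    }
}
```

For the statements p = &a; q = &p; t = &b; *q = t; r = *q, the store through q adds b to the set of p, and the load through q then gives r the same set as p, namely a and b.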

Flow-insensitive approaches

Many approaches to flow-insensitive pointer analysis can be understood as forms of abstract interpretation, where heap allocations are abstracted by their allocation site (i.e., a program location).

Figure: Pointer analysis, abstracting memory addresses by their allocation site.

Many flow-insensitive algorithms are specified in Datalog, including those in the Soot analysis framework for Java.

Context-sensitive, flow-insensitive algorithms

Context-sensitive, flow-insensitive algorithms achieve higher precision, generally at the cost of some performance, by analyzing each procedure several times, once per ''context''. Most analyses use a "context-string" approach, where contexts consist of a list of entries (common choices of context entry include call sites, allocation sites, and types). To ensure termination (and more generally, scalability), such analyses generally use a ''k''-limiting approach, where the context has a fixed maximum size, and the least recently added elements are removed as needed. Three common variants of context-sensitive, flow-insensitive analysis are:

* Call-site sensitivity
* Object sensitivity
* Type sensitivity
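The k-limiting of context strings described above can be sketched in a few lines. The representation below (a fixed-size array of integer entries, with k = 2) and its names are assumptions chosen for illustration. An entry might stand for a call site, an allocation site or a type, depending on the flavour of context sensitivity.

```c
#include <assert.h>

enum { K = 2 };   /* maximum context length (an arbitrary choice here) */

/* A context string: up to K entries, oldest first. */
struct context { int len; int entries[K]; };

/* Appends an entry, discarding the oldest one if the context is full. */
struct context push_entry(struct context ctx, int entry) {
    assert(ctx.len >= 0 && ctx.len <= K);
    if (ctx.len == K) {
        for (int i = 1; i < K; i++) ctx.entries[i - 1] = ctx.entries[i];
        ctx.len = K - 1;
    }
    ctx.entries[ctx.len++] = entry;
    return ctx;
}
```

Pushing the entries 3, 4 and 7 onto an empty context leaves the context holding 4 and 7. Since only finitely many contexts of length at most k exist for a finite program, the analysis cannot generate contexts without bound.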

Call-site sensitivity

In call-site sensitivity, the points-to set of each variable (the set of abstract heap allocations each variable could point to) is further qualified by a context consisting of a list of call sites in the program. These contexts abstract the control flow of the program.

A program consisting of a function int *id(int* p) and a function void main(void) that calls id twice demonstrates how call-site sensitivity can achieve higher precision than a flow-insensitive, context-insensitive analysis.

For this program, a context-insensitive analysis would (soundly but imprecisely) conclude that p can point to either of the allocations whose addresses main passes to id, so the two pointers returned by the calls may alias, and both could point to either allocation.

A call-site-sensitive analysis would analyze id twice, once for main.3 and once for main.4, and the points-to facts for p would be qualified by the call site. This enables the analysis to deduce that, when id returns, the pointer returned by each call can only point to the allocation passed at that call.
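The effect of qualifying facts by call site can be sketched as follows. The code assumes that id simply returns its argument and that main passes the addresses of two locals, called x and y here; these names, the struct layout and the function name are assumptions of this sketch rather than part of the original listing. The parameter p gets one points-to set per call site instead of a single shared one.

```c
#include <assert.h>
#include <stdint.h>

/* Allocations of two locals of main (the names x and y are assumed). */
enum { LOC_X = 1u << 0, LOC_Y = 1u << 1 };

/* One context per call site of id. */
enum { CTX_MAIN_3, CTX_MAIN_4, NUM_CTX };

/* A call "result = id(arg)" made at the call site ctx. */
struct call { int ctx; uint32_t arg; uint32_t *result; };

/* Analyzes id once per call site, keeping a separate points-to set
 * for the parameter p in each context. */
void analyze_calls(const struct call *calls, int num_calls) {
    uint32_t p[NUM_CTX] = { 0 };
    for (int i = 0; i < num_calls; i++) {
        assert(calls[i].ctx >= 0 && calls[i].ctx < NUM_CTX);
        p[calls[i].ctx] |= calls[i].arg;        /* bind the argument to p */
    }
    for (int i = 0; i < num_calls; i++)
        *calls[i].result |= p[calls[i].ctx];    /* id returns p */
}
```

With one call in context main.3 passing x and one in context main.4 passing y, the two results stay disjoint. Giving both calls the same context value reproduces the merged, context-insensitive answer in which each result may point to either allocation.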

Object sensitivity

In an object-sensitive analysis, the points-to set of each variable is qualified by the abstract heap allocation of the receiver object of the method call. Unlike call-site sensitivity, object sensitivity is ''non-syntactic'' or ''non-local'': the context entries are derived during the points-to analysis itself.

Type sensitivity

Type sensitivity is a variant of object sensitivity in which the allocation site of the receiver object is replaced by the class or type that contains the method in which the receiver object is allocated. Because many allocation sites can share one enclosing class, this yields no more contexts than an object-sensitive analysis would use, and usually fewer, which generally means better performance.
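The relationship between the two kinds of context entry can be shown with a small sketch. It models a receiver object by two integer ids: its allocation site and the class whose code contains that site. The ids, the struct and the function names are hypothetical and serve only to compare how many distinct contexts each scheme produces.

```c
#include <assert.h>

/* An abstract heap allocation: its site and the class whose code
 * contains that site (both are plain integer ids in this sketch). */
struct alloc_site { int site; int enclosing_class; };

/* Context entry contributed by a receiver object. */
int object_context(struct alloc_site receiver) { return receiver.site; }
int type_context(struct alloc_site receiver)   { return receiver.enclosing_class; }

/* Counts the distinct context entries produced by entry_of over n receivers. */
int count_contexts(const struct alloc_site *receivers, int n,
                   int (*entry_of)(struct alloc_site)) {
    assert(n >= 0);
    int distinct = 0;
    for (int i = 0; i < n; i++) {
        int seen = 0;
        for (int j = 0; j < i; j++)
            if (entry_of(receivers[j]) == entry_of(receivers[i])) seen = 1;
        if (!seen) distinct++;
    }
    return distinct;
}
```

For three receivers allocated at sites 10, 11 and 12, where the first two sites lie in class 1 and the third in class 2, object sensitivity distinguishes three contexts and type sensitivity two.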
