In statistics, a P–P plot (probability–probability plot, percent–percent plot, or P value plot) is a probability plot for assessing how closely two data sets agree, or how closely a data set fits a particular model. It works by plotting the two cumulative distribution functions against each other; if they are similar, the plotted points will lie close to a straight line. This behavior is similar to that of the more widely used Q–Q plot, with which it is often confused.
Definition
A P–P plot plots two cumulative distribution functions (cdfs) against each other: given two probability distributions with cdfs ''F'' and ''G'', it plots (''F''(''z''), ''G''(''z'')) as ''z'' ranges from −∞ to ∞. As a cdf has range [0, 1], the domain of this parametric graph is (−∞, ∞) and the range is the unit square [0, 1] × [0, 1]. Thus for input ''z'' the output is the pair of numbers giving what ''percentage'' of ''F'' and what ''percentage'' of ''G'' fall at or below ''z''.
The comparison line is the 45° line from (0,0) to (1,1), and the distributions are equal if and only if the plot falls on this line. The degree of deviation makes it easy to visually identify how different the distributions are, but because of sampling error, even samples drawn from identical distributions will not appear identical.
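The parametric definition can be sketched in a few lines of Python. This is only an illustration: the two normal distributions, the grid of ''z'' values and the helper name `pp_curve` are assumptions chosen for the example, not part of the definition.

```python
from statistics import NormalDist


def pp_curve(F, G, zs):
    """Return the P-P points (F(z), G(z)) for each z in zs."""
    return [(F(z), G(z)) for z in zs]


# Hypothetical example distributions: same location, different spread.
F = NormalDist(mu=0.0, sigma=1.0).cdf
G = NormalDist(mu=0.0, sigma=2.0).cdf

# A finite grid standing in for z ranging over the whole real line.
zs = [i / 10 for i in range(-60, 61)]

same = pp_curve(F, F, zs)  # a distribution compared with itself
diff = pp_curve(F, G, zs)  # two different distributions

# Largest departure from the 45-degree comparison line.
max_dev_same = max(abs(p - q) for p, q in same)
max_dev_diff = max(abs(p - q) for p, q in diff)

print(max_dev_same, max_dev_diff)
```

Comparing a distribution with itself gives points exactly on the 45° line, while the wider distribution bends the curve away from it.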
Example
As an example, if the two distributions do not overlap, say all of ''F'' lies below all of ''G'', then the P–P plot first moves from left to right along the bottom of the square: as ''z'' moves through the support of ''F'', the cdf of ''F'' goes from 0 to 1 while the cdf of ''G'' stays at 0. It then moves up the right side of the square: the cdf of ''F'' is now 1, as all points of ''F'' lie below all points of ''G'', and the cdf of ''G'' goes from 0 to 1 as ''z'' moves through the support of ''G''.
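The non-overlapping case can be checked numerically. The sketch below assumes two uniform distributions, on [0, 1] and on [2, 3], purely for illustration; `uniform_cdf` is a hypothetical helper, not a library function.

```python
def uniform_cdf(a, b):
    """Return the cdf of a uniform distribution on [a, b]."""
    def cdf(z):
        return min(1.0, max(0.0, (z - a) / (b - a)))
    return cdf


# Assumed example: F is supported on [0, 1], G on [2, 3], so F lies below G.
F = uniform_cdf(0.0, 1.0)
G = uniform_cdf(2.0, 3.0)

zs = [i / 4 for i in range(-2, 15)]  # z from -0.5 to 3.5 in steps of 0.25
points = [(F(z), G(z)) for z in zs]

# Every point is on the bottom edge (G = 0) or on the right edge (F = 1).
on_boundary = all(g == 0.0 or f == 1.0 for f, g in points)

print(points[0], points[-1], on_boundary)
```

The curve never enters the interior of the square, which is why such a plot says little beyond the fact that the distributions are separated.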
Use
As the above example illustrates, if two distributions are separated in space, the P–P plot conveys very little information; it is only useful for comparing probability distributions that have nearby or equal location. Notably, it passes through the point (1/2, 1/2) if and only if the two distributions have the same median.
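The median property can be illustrated by evaluating one cdf at the other distribution's median. The normal distributions and the helper name `g_at_f_median` below are assumptions made for this sketch.

```python
from statistics import NormalDist


def g_at_f_median(F_dist, G_dist):
    """Return G(z) at the z where F(z) = 1/2, i.e. at the median of F."""
    z_half = F_dist.inv_cdf(0.5)
    return G_dist.cdf(z_half)


# Same median (0), different spread: the curve passes through (1/2, 1/2).
same_median = g_at_f_median(NormalDist(0.0, 1.0), NormalDist(0.0, 3.0))

# Medians 0 and 1: when F reaches 1/2, G is still well below 1/2.
shifted = g_at_f_median(NormalDist(0.0, 1.0), NormalDist(1.0, 1.0))

print(same_median, shifted)
```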
P–P plots are sometimes limited to comparisons between two samples, rather than comparison of a sample to a theoretical model distribution.[Thode, Henry C. (2002) ''Testing for Normality'', CRC Press. Section 2.2.3, Percent–percent plots, p. 23] However, they are of general use, particularly where observations are not all modelled with the same distribution.
The P–P plot has also found some use in comparing a sample distribution to a ''known'' theoretical distribution: given ''n'' samples, plotting the continuous theoretical cdf against the empirical cdf would yield a stairstep (a step each time ''z'' hits a sample) and would hit the top of the square when the last data point was reached. Instead, one plots only points: the ''k''th observed value in sorted order (formally, the ''k''th order statistic) against the ''k''/(''n'' + 1) quantile of the theoretical distribution. This choice of "plotting position" (choice of quantile of the theoretical distribution) has occasioned less controversy than the choice for Q–Q plots. The resulting goodness of fit to the 45° line gives a measure of the difference between a sample set and the theoretical distribution.
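A minimal sketch of the point-only version follows, under one reading of the description: both coordinates are placed on the probability scale, so the theoretical cdf evaluated at the ''k''th order statistic is paired with the plotting position ''k''/(''n'' + 1). The standard normal model, the deterministic example samples and the helper names `pp_points` and `max_departure` are assumptions made for illustration.

```python
from statistics import NormalDist


def pp_points(sample, cdf):
    """Pair the plotting position k/(n+1) with cdf(x_(k)) for the sorted sample."""
    xs = sorted(sample)
    n = len(xs)
    return [(k / (n + 1), cdf(x)) for k, x in enumerate(xs, start=1)]


def max_departure(points):
    """Largest vertical distance of the points from the 45-degree line."""
    return max(abs(p - q) for p, q in points)


model = NormalDist(mu=0.0, sigma=1.0)  # assumed theoretical distribution
n = 99

# Deterministic stand-in for a well-fitting sample: the model's own quantiles.
well_fitting = [model.inv_cdf(k / (n + 1)) for k in range(1, n + 1)]
# The same values shifted by one unit, standing in for a poorly fitting sample.
shifted = [x + 1.0 for x in well_fitting]

fit_dev = max_departure(pp_points(well_fitting, model.cdf))
shift_dev = max_departure(pp_points(shifted, model.cdf))

print(fit_dev, shift_dev)
```

A real sample would scatter around the line because of sampling error; the deterministic inputs here only keep the example reproducible.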
A P–P plot can be used as a graphical adjunct to tests of the fit of probability distributions,[Michael, J.R. (1983) "The stabilized probability plot". ''Biometrika'', 70(1), 11–17.][Shorack, G.R., Wellner, J.A. (1986) ''Empirical Processes with Applications to Statistics'', Wiley. pp. 248–250] with additional lines being included on the plot to indicate either specific acceptance regions or the range of expected departure from the 1:1 line. An improved version of the P–P plot, called the SP or S–P plot, is available, which makes use of a variance-stabilizing transformation to create a plot on which the variations about the 1:1 line should be the same at all locations.
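The text does not state which transformation the S–P plot uses. The sketch below assumes the arcsine square-root transform, (2/π) arcsin(√''p''), applied to both coordinates, with plotting positions (''k'' − 1/2)/''n''; this is the form usually attributed to the stabilized probability plot, but it should be checked against the cited paper. The helper names `stabilize` and `sp_points` are hypothetical.

```python
import math
from statistics import NormalDist


def stabilize(p):
    """Assumed variance-stabilizing transform: (2/pi) * arcsin(sqrt(p))."""
    p = min(1.0, max(0.0, p))  # guard against rounding just outside [0, 1]
    return (2.0 / math.pi) * math.asin(math.sqrt(p))


def sp_points(sample, cdf):
    """Transformed plotting position and transformed cdf value for each order statistic."""
    xs = sorted(sample)
    n = len(xs)
    return [
        (stabilize((k - 0.5) / n), stabilize(cdf(x)))
        for k, x in enumerate(xs, start=1)
    ]


model = NormalDist(mu=0.0, sigma=1.0)  # assumed theoretical distribution
sample = [-1.3, -0.4, 0.1, 0.6, 1.8]   # made-up data for illustration
points = sp_points(sample, model.cdf)

print(points)
```

The transform maps [0, 1] onto [0, 1] and leaves 1/2 fixed, so the 1:1 line remains the reference line on the transformed plot.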
See also
* Probability plot