In statistics and probability, the Neyman Type A distribution is a discrete probability distribution from the family of compound Poisson distributions. To understand this distribution easily, consider the following example, explained in ''Univariate Discrete Distributions'': a statistical model of the distribution of larvae in a unit area of a field (a unit of habitat), in which the variation in the number of clusters of eggs per unit area is represented by a Poisson distribution with parameter \lambda , while the numbers of larvae developing per cluster of eggs are assumed to have independent Poisson distributions, all with the same parameter \phi . If we want to know how many larvae there are, we define a random variable ''Y'' as the sum of the number of larvae hatched in each cluster (given ''j'' clusters). Therefore, ''Y'' = ''X''1 + ''X''2 + ... + ''X''j, where ''X''1, ..., ''X''j are independent Poisson variables with common parameter \phi, and the number of clusters ''j'' is itself a Poisson variable with parameter \lambda.
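As an illustrative sketch (the function names below are ours, not from the original source), the compound structure of ''Y'' can be simulated directly, here in Python with a simple Knuth-style Poisson sampler:

```python
import math
import random

def rpois(mu, rng):
    """Poisson sampler via Knuth's multiplication method (fine for small mu)."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def rneyman_a(n, lam, phi, seed=42):
    """Draw n variates of Y: the number of egg clusters j is Poisson(lam),
    and each cluster hatches an independent Poisson(phi) number of larvae."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        clusters = rpois(lam, rng)
        sample.append(sum(rpois(phi, rng) for _ in range(clusters)))
    return sample

sample = rneyman_a(20000, 2.0, 1.0)
sample_mean = sum(sample) / len(sample)   # theory: E[Y] = lam * phi = 2
```

The sample mean should land close to \lambda\phi, consistent with the moments given later in the article.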


History

Jerzy Neyman was born in the Russian Empire on April 16, 1894; he was a Polish statistician who spent the first part of his career in Europe. In 1939 he developed the Neyman Type A distribution to describe the distribution of larvae in experimental field plots. Above all, it is used to describe populations based on contagion, e.g., entomology (Beall 1940; Evans 1953), accidents (Cresswell and Froggatt 1963), and bacteriology.

The original derivation of this distribution was made on the basis of a biological model and, presumably, it was expected that a good fit to the data would justify the hypothesized model. However, it is now known that it is possible to derive this distribution from different models (Feller 1943), and in view of this, Neyman's distribution can be derived as a compound Poisson distribution. This interpretation makes it suitable for modelling heterogeneous populations and renders it an example of apparent contagion. Despite this, the difficulties in dealing with Neyman's Type A arise from the fact that its expressions for probabilities are highly complex. Even estimation of parameters through efficient methods, such as maximum likelihood, leads to tedious and opaque equations.


Definition


Probability generating function

The probability generating function (pgf) construction follows a branching process: a random number ''N'' of clusters, with pgf ''G''1(''z''), each produces a random number of individuals, where ''X''1, ''X''2, ... are independent and have the same distribution as ''X'', with pgf ''G''2(''z''). The total number of individuals is then the random variable
: Y = S_N = X_1 + X_2 + \cdots + X_N
The pgf of the distribution of ''S''''N'' is
: G_{S_N}(z) = \operatorname{E}\left[z^{S_N}\right] = \operatorname{E}_N\left[\operatorname{E}\left[z^{S_N} \mid N\right]\right] = \operatorname{E}_N\left[G_2(z)^N\right] = G_1(G_2(z))
A particularly helpful notation uses a symbolic representation for an ''F''1 distribution that has been generalized by an ''F''2 distribution:
: Y\sim F_1\bigwedge F_2
In this instance, it is written as
: Y\sim \operatorname{Pois}(\lambda)\bigwedge \operatorname{Pois}(\phi)
Finally, the probability generating function is
: G_Y(z) = \exp\left(\lambda\left(e^{\phi(z-1)}-1\right)\right)
From the probability generating function we can calculate the probability mass function explained below.
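The composition G_1(G_2(z)) and the closed form of G_Y(z) can be checked against each other numerically; a minimal Python sketch (names ours, parameter values arbitrary):

```python
import math

lam, phi = 2.0, 1.0

def g1(s):
    """pgf of the Poisson(lam) number of clusters: E[s^N]."""
    return math.exp(lam * (s - 1.0))

def g2(z):
    """pgf of the Poisson(phi) cluster size: E[z^X]."""
    return math.exp(phi * (z - 1.0))

def g_y(z):
    """pgf of Y ~ NA(lam, phi), written out as exp(lam(e^{phi(z-1)} - 1))."""
    return math.exp(lam * (math.exp(phi * (z - 1.0)) - 1.0))
```

Evaluating g1(g2(z)) and g_y(z) at any z in [0, 1] gives the same value, and both equal 1 at z = 1, as any pgf must.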


Probability mass function

Let ''X''1, ''X''2, ..., ''X''j be independent Poisson variables, each with parameter \phi, where the number of terms ''j'' is itself Poisson-distributed with parameter \lambda. The probability distribution of the random variable ''Y'' = ''X''1 + ''X''2 + ... + ''X''j is the Neyman Type A distribution with parameters \lambda and \phi:
: p_x = P(Y=x) = \frac{e^{-\lambda}\phi^x}{x!} \sum_{j=0}^{\infty} \frac{(\lambda e^{-\phi})^j j^x}{j!}~~~~x = 0,1,2,\ldots
Alternatively,
: p_x = P(Y=x) = \frac{\phi^x e^{-\lambda + \lambda e^{-\phi}}}{x!} \sum_{j=0}^{x} S(x,j)\,\lambda^j e^{-j\phi}
In order to see how the previous expression develops, we must bear in mind that the probability mass function is calculated from the probability generating function, using the property of the Stirling numbers given below. The development is
: G(z) = e^{-\lambda + \lambda e^{-\phi}}\sum_{j=0}^{\infty}\frac{\left(\lambda e^{-\phi}\right)^j \left(e^{\phi z}-1\right)^j}{j!}
: \qquad = e^{-\lambda + \lambda e^{-\phi}}\sum_{j=0}^{\infty} (\lambda e^{-\phi})^j \sum_{x=j}^{\infty}\frac{S(x,j)\,(\phi z)^x}{x!}
and reading off the coefficient of z^x gives
: p_x = \frac{\phi^x e^{-\lambda + \lambda e^{-\phi}}}{x!} \sum_{j=0}^{x}S(x,j)\,\lambda^j e^{-j\phi}
Another way to compute the probabilities is with the recurrence relation
: p_x = P(Y=x) = \frac{\lambda\phi e^{-\phi}}{x} \sum_{j=0}^{x-1} \frac{\phi^j}{j!}\, p_{x-1-j}~~,~~p_0 = \exp(-\lambda + \lambda e^{-\phi})
Although its length grows with ''x'', this recurrence relation is only employed for numerical computation and is particularly useful for computer applications.

where
* ''x'' = 0, 1, 2, ..., except for the recurrence relation, where ''x'' = 1, 2, 3, ...
* j \leq x in the Stirling-number sum
* \lambda , \phi > 0
* ''x''! and ''j''! are the factorials of ''x'' and ''j'', respectively.
* the relevant property of the Stirling numbers of the second kind is the following:
: (e^{t} - 1)^j = j! \sum_{x=j}^{\infty} S(x,j)\frac{t^x}{x!}


Notation

:Y\ \sim \operatorname{NA}(\lambda,\phi)\,


Properties


Moment and cumulant generating functions

The moment generating function of a random variable ''X'' is defined as the expected value of e^{tX}, as a function of the real parameter ''t''. For an \operatorname{NA}(\lambda,\phi), the moment generating function exists and is equal to
: M(t) = G_Y(e^t) = \exp\left(\lambda\left(e^{\phi(e^t-1)}-1\right)\right)
The cumulant generating function is the logarithm of the moment generating function and is equal to
: K(t) = \log(M(t)) = \lambda\left(e^{\phi(e^t-1)}-1\right)
The first four cumulants, obtained by differentiating K(t) at ''t'' = 0, are
: \kappa_1 = \lambda\phi
: \kappa_2 = \lambda\phi(1+\phi)
: \kappa_3 = \lambda\phi(1+3\phi+\phi^2)
: \kappa_4 = \lambda\phi(1+7\phi+6\phi^2+\phi^3)
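These cumulants can be sanity-checked by numerically differentiating K(t) at t = 0 with central finite differences (a sketch; the parameter values and step sizes are arbitrary choices of ours):

```python
import math

lam, phi = 2.0, 1.0

def K(t):
    """Cumulant generating function of NA(lam, phi)."""
    return lam * (math.exp(phi * (math.exp(t) - 1.0)) - 1.0)

# Central finite differences at t = 0 approximate the first three cumulants.
h = 1e-3
k1 = (K(h) - K(-h)) / (2 * h)                 # ~ lam*phi = 2
k2 = (K(h) - 2 * K(0.0) + K(-h)) / h ** 2     # ~ lam*phi*(1+phi) = 4
h3 = 1e-2
k3 = (K(2 * h3) - 2 * K(h3) + 2 * K(-h3) - K(-2 * h3)) / (2 * h3 ** 3)
# k3 ~ lam*phi*(1 + 3*phi + phi^2) = 10
```

With \lambda = 2 and \phi = 1 the three estimates come out close to 2, 4 and 10, matching \kappa_1, \kappa_2 and \kappa_3 above.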


Skewness

The skewness is the third moment centered around the mean divided by the 3/2 power of the variance, and for the \operatorname{NA}(\lambda,\phi) distribution it is
:\gamma_1 = \frac{\kappa_3}{\kappa_2^{3/2}} = \frac{1+3\phi+\phi^2}{\sqrt{\lambda\phi}\,(1+\phi)^{3/2}}


Kurtosis

The kurtosis is the fourth moment centered around the mean, divided by the square of the variance, and for the \operatorname{NA}(\lambda,\phi) distribution it is
:\beta_2 = \frac{\mu_4}{\sigma^4} = \frac{\kappa_4}{\kappa_2^2} + 3 = \frac{1+7\phi+6\phi^2+\phi^3}{\lambda\phi(1+\phi)^2} + 3
The excess kurtosis is just a correction to make the kurtosis of the normal distribution equal to zero, and it is the following:
:\gamma_2 = \frac{\mu_4}{\sigma^4} - 3 = \frac{1+7\phi+6\phi^2+\phi^3}{\lambda\phi(1+\phi)^2}
* Since always \beta_2 > 3, or equivalently \gamma_2 > 0, the distribution has a high acute peak around the mean and fatter tails than the normal.


Characteristic function

In a discrete distribution the characteristic function of any real-valued random variable is defined as the expected value of e^{itX}, where ''i'' is the imaginary unit and ''t'' ∈ ''R'':
:\phi_X(t) = \operatorname{E}\left[e^{itX}\right] = \sum_{j=0}^{\infty} e^{itj}\,P[X=j]
This function is related to the moment generating function via \phi_X(t) = M_X(it). Hence for this distribution the characteristic function is
:\phi_X(t) = \exp\left(\lambda\left(e^{\phi(e^{it}-1)}-1\right)\right)
*Note that the symbol \phi_X is used to represent the characteristic function and should not be confused with the parameter \phi.
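The identity \phi_X(t) = M_X(it) can be verified numerically by comparing E[e^{itY}], summed term by term over the pmf, with the closed form (a sketch; names and parameter values are ours):

```python
import cmath
import math

lam, phi = 2.0, 1.0

def pmf(x):
    """pmf via the recurrence relation given earlier in the article."""
    p = [math.exp(-lam + lam * math.exp(-phi))]
    for i in range(1, x + 1):
        p.append(lam * phi * math.exp(-phi) / i
                 * sum(phi ** j / math.factorial(j) * p[i - 1 - j]
                       for j in range(i)))
    return p[x]

def cf_sum(t, terms=80):
    """E[exp(itY)] computed directly from the pmf (truncated sum)."""
    return sum(cmath.exp(1j * t * x) * pmf(x) for x in range(terms))

def cf_closed(t):
    """Closed form: exp(lam * (e^{phi (e^{it} - 1)} - 1))."""
    return cmath.exp(lam * (cmath.exp(phi * (cmath.exp(1j * t) - 1.0)) - 1.0))
```

The truncated sum and the closed form agree to high precision for any real t, and both equal 1 at t = 0.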


Cumulative distribution function

The cumulative distribution function is
: \begin{align} F(x;\lambda,\phi) & = P(Y \leq x)\\ &= e^{-\lambda} \sum_{i=0}^{x} \frac{\phi^i}{i!} \sum_{j=0}^{\infty} \frac{(\lambda e^{-\phi})^j j^i}{j!}\\ &= e^{-\lambda} \sum_{i=0}^{x} \sum_{j=0}^{\infty} \frac{\phi^i (\lambda e^{-\phi})^j j^i}{i!\,j!} \end{align}


Other properties

* The index of dispersion is a normalized measure of the dispersion of a probability distribution. It is defined as the ratio of the variance \sigma^2 to the mean \mu:
: d = \frac{\sigma^2}{\mu} = 1 + \phi
* Consider a sample of size ''N'' in which each random variable ''Y''i comes from an \operatorname{NA}(\lambda,\phi), with ''Y''1, ''Y''2, ..., ''Y''n independent. The maximum likelihood estimator of the mean is then the sample mean,
: \hat{\mu} = \sum_{i=1}^{N} \frac{Y_i}{N} = \bar{Y} = \hat{\lambda}\hat{\phi}
where \mu is the population mean estimated by \bar{Y}.
* Combining the two earlier expressions, we are able to parametrize the distribution using \mu and d:
: \begin{cases} \mu = \lambda\phi \\ d = 1 + \phi \end{cases} \longrightarrow \quad\! \begin{cases} \lambda = \dfrac{\mu}{d-1} \\ \phi = d - 1 \end{cases}
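The reparametrization and its inverse are straightforward to express in code (a small Python sketch; the function names are ours):

```python
def to_mu_d(lam, phi):
    """(lambda, phi) -> (mu, d): population mean and index of dispersion."""
    return lam * phi, 1.0 + phi

def to_lam_phi(mu, d):
    """Inverse map: lambda = mu / (d - 1), phi = d - 1 (requires d > 1)."""
    return mu / (d - 1.0), d - 1.0
```

The two maps are mutually inverse on \lambda, \phi > 0, i.e. on d > 1.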


Parameter estimation


Method of moments

The mean and the variance of the \operatorname{NA}(\lambda,\phi) are \mu = \lambda\phi and \sigma^2 = \lambda\phi(1 + \phi), respectively. So we have these two equations:
: \begin{cases} \bar{Y} = \lambda\phi \\ s^2 = \lambda\phi(1 + \phi) \end{cases}
where s^2 and \bar{Y} are the sample variance and sample mean, respectively. Solving these two equations, we get the moment estimators \hat{\lambda} and \hat{\phi} of \lambda and \phi:
:\hat{\lambda} = \frac{\bar{Y}^2}{s^2 - \bar{Y}}
:\hat{\phi} = \frac{s^2 - \bar{Y}}{\bar{Y}}
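A direct implementation of these estimators might look as follows (a sketch; the function name is ours, and the overdispersion guard is our addition, since the estimators are only positive when s^2 > \bar{Y}):

```python
def neyman_mom(sample):
    """Method-of-moments estimators for NA(lambda, phi):
    phi_hat = (s2 - ybar) / ybar, lam_hat = ybar^2 / (s2 - ybar)."""
    n = len(sample)
    ybar = sum(sample) / n
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)
    if s2 <= ybar:
        raise ValueError("sample is not overdispersed; Neyman Type A is a poor fit")
    phi_hat = (s2 - ybar) / ybar
    lam_hat = ybar ** 2 / (s2 - ybar)
    return lam_hat, phi_hat
```

By construction \hat{\lambda}\hat{\phi} reproduces the sample mean exactly, which is a quick consistency check on any implementation.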


Maximum likelihood

Calculating the maximum likelihood estimators of \lambda and \phi involves multiplying all the probabilities in the probability mass function to obtain the likelihood \mathcal{L}(\lambda,\phi;x_1,\ldots,x_n). When we apply the parametrization defined in "Other properties," we get \mathcal{L}(\mu, d;X). We may reduce the maximization to a single parameter if we estimate \mu by \bar{Y} (the sample mean), given a sample ''X'' of size ''N''. We can see it below:
:\mathcal{L}(d;X)= \prod_{i=1}^n P(x_i;d)
*To evaluate the probabilities, we use the recurrence form of the p.m.f., so that the calculation is less complex.
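The one-parameter maximization over d can be sketched as below (all names are ours; the stdlib-only golden-section search simply stands in for whatever numerical optimizer one prefers):

```python
import math

def neyman_pmf(x, lam, phi):
    """pmf via the recurrence relation (see 'Probability mass function')."""
    p = [math.exp(-lam + lam * math.exp(-phi))]
    for i in range(1, x + 1):
        p.append(lam * phi * math.exp(-phi) / i
                 * sum(phi ** j / math.factorial(j) * p[i - 1 - j]
                       for j in range(i)))
    return p[x]

def profile_loglik(d, sample):
    """Log-likelihood with mu fixed at the sample mean, so only d is free:
    lam = mu / (d - 1), phi = d - 1."""
    mu = sum(sample) / len(sample)
    lam, phi = mu / (d - 1.0), d - 1.0
    return sum(math.log(neyman_pmf(x, lam, phi)) for x in sample)

def mle_d(sample, lo=1.001, hi=10.0, tol=1e-5):
    """Golden-section search for the d maximizing the profile log-likelihood."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, e = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if profile_loglik(c, sample) > profile_loglik(e, sample):
            b, e = e, c
            c = b - g * (b - a)
        else:
            a, c = c, e
            e = a + g * (b - a)
    return (a + b) / 2
```

For an overdispersed count sample this returns an interior estimate of d strictly greater than 1.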


Testing Poisson assumption

When \operatorname{NA}(\lambda,\phi) is used to model a data sample, it is important to check whether the Poisson distribution fits the data just as well. For this, the following hypothesis test is used:
: \begin{cases} H_0: d=1 \\ H_1: d> 1 \end{cases}


Likelihood-ratio test

The likelihood-ratio test statistic for \operatorname{NA}(\lambda,\phi) is
:W = 2\left(\ell(X;\hat{\mu},\hat{d})-\ell(X;\hat{\mu},1)\right)
where \ell(\cdot) is the log-likelihood function. ''W'' does not have the asymptotic \chi_1^2 distribution expected under the null hypothesis, since ''d'' = 1 lies on the edge of the parameter domain. Instead, it can be demonstrated that the asymptotic distribution of ''W'' is a 50:50 mixture of the constant 0 and \chi_1^2. For this mixture, the \alpha upper-tail percentage points are the same as the 2\alpha upper-tail percentage points for a \chi_1^2.
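A sketch of the test in Python (names ours): at d = 1 the model reduces to a Poisson(\mu), so the null log-likelihood is computed from the Poisson pmf, and the mixture critical value uses the fact that the 2\alpha upper-tail point of \chi_1^2 is the square of the standard normal quantile z_{1-\alpha}:

```python
import math
from statistics import NormalDist

def neyman_pmf(x, lam, phi):
    """pmf via the recurrence relation from the article."""
    p = [math.exp(-lam + lam * math.exp(-phi))]
    for i in range(1, x + 1):
        p.append(lam * phi * math.exp(-phi) / i
                 * sum(phi ** j / math.factorial(j) * p[i - 1 - j]
                       for j in range(i)))
    return p[x]

def lr_statistic(sample, d_hat):
    """W = 2 * (l(mu_hat, d_hat) - l(mu_hat, 1)), with mu_hat the sample mean."""
    mu = sum(sample) / len(sample)
    lam, phi = mu / (d_hat - 1.0), d_hat - 1.0
    l1 = sum(math.log(neyman_pmf(x, lam, phi)) for x in sample)
    l0 = sum(x * math.log(mu) - mu - math.lgamma(x + 1) for x in sample)  # Poisson
    return 2.0 * (l1 - l0)

def critical_value(alpha):
    """alpha upper-tail point of the 50:50 mixture of 0 and chi2_1,
    i.e. the 2*alpha upper-tail point of chi2_1, which equals z_{1-alpha}^2."""
    return NormalDist().inv_cdf(1.0 - alpha) ** 2

sample = [0, 0, 1, 2, 0, 3, 5, 0, 1, 2, 4, 0, 0, 2, 6, 1, 0, 3, 2, 1]
W = lr_statistic(sample, 1.93)  # 1.93 ~ moment estimate of d for this sample
```

For \alpha = 0.05 the critical value is about 2.71 (the 10% point of \chi_1^2), rather than the usual 3.84.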


Related distributions

The Poisson distribution (on 0, 1, 2, ...) is a limiting case of the Neyman Type A distribution: as \phi \to 0 with the mean \mu = \lambda\phi held fixed,
::\operatorname{NA}(\lambda,\, \phi) \rightarrow \operatorname{Pois}(\mu).\,
From moments of order 1 and 2 we can write the population mean and variance based on the parameters \lambda and \phi:
:\mu = \lambda\phi
:\sigma^2 = \lambda\phi (1 + \phi)
Substituting these parametrized expressions into the dispersion index ''d'', we obtain d = 1 + \phi. Our variable ''Y'' is therefore distributed as a Poisson of parameter \mu when ''d'' approaches 1. Then we have that
:\lim_{d \to 1}\operatorname{NA}(\lambda,\, \phi) = \operatorname{Pois}(\mu)
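The convergence can be observed numerically (a sketch; names and parameter values are ours): holding \mu = \lambda\phi fixed and shrinking \phi (so d \to 1), the NA pmf approaches the Poisson(\mu) pmf.

```python
import math

def neyman_pmf(x, lam, phi):
    """pmf via the recurrence relation from the article."""
    p = [math.exp(-lam + lam * math.exp(-phi))]
    for i in range(1, x + 1):
        p.append(lam * phi * math.exp(-phi) / i
                 * sum(phi ** j / math.factorial(j) * p[i - 1 - j]
                       for j in range(i)))
    return p[x]

def poisson_pmf(x, mu):
    return math.exp(-mu) * mu ** x / math.factorial(x)

# Hold mu = lam * phi = 2 fixed and let phi -> 0 (equivalently d -> 1).
mu = 2.0
rows = []
for phi in (1.0, 0.1, 0.01):
    lam = mu / phi
    rows.append(max(abs(neyman_pmf(x, lam, phi) - poisson_pmf(x, mu))
                    for x in range(15)))
```

The maximum pointwise distance between the two pmfs shrinks steadily as \phi decreases.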


Applications


Usage History

The Neyman Type A distribution has been used to characterize the dispersion of plants when the species' reproduction results in clusters. This typically occurs when a species develops from the offspring of parent plants or from seeds that fall near the parent plant. However, Archibald (1948) observed that there is insufficient data to infer the kind of reproduction from the type of fitted distribution. While the Neyman Type A produced positive results for plant distributions, Evans (1953) showed that the negative binomial distribution produced positive results for insect distributions.

Neyman Type A distributions have also been studied in the context of ecology, with the result that unless the plant clusters are so compact as to not lie across the edge of the square used to pick sample locations, the distribution is unlikely to be applicable to plant populations. The compactness of the clusters is a hidden assumption in Neyman's original derivation of the distribution, according to Skellam (1958). The results were shown to be significantly impacted by the choice of square size.

In the context of bus driver accidents, Cresswell and Froggatt (1963) derived the Neyman Type A based on the following hypotheses:
# Each driver is susceptible to "spells," the number of which is Poissonian for any given amount of time, with the same parameter \lambda for all drivers.
# A driver's performance during a spell is poor, and he is likely to experience a Poissonian number of collisions, with the same parameter \phi for all drivers.
# Each driver behaves independently.
# No accidents can occur outside of a spell.
These assumptions lead to a Neyman Type A distribution via the \operatorname{Pois}(\lambda)\bigwedge\operatorname{Pois}(\phi) model. In contrast to their "short distribution," Cresswell and Froggatt called this one the "long distribution" because of its lengthy tail. According to Irwin (1964), a Type A distribution can also be obtained by assuming that different drivers have different levels of proneness ''K'', taking the values 0, \phi, 2\phi, 3\phi, \ldots with probability
:P[K = k\phi] = \frac{e^{-\lambda}\lambda^k}{k!}
and that a driver with proneness k\phi has ''X'' accidents, where
:P[X = x \mid K = k\phi] = \frac{e^{-k\phi}(k\phi)^x}{x!}
This is the \operatorname{Pois}\bigwedge\operatorname{Pois} model with mixing over the values taken by ''K''.

The distribution was also suggested in an application to the grouping of retail food stores from 1965 to 1969. In this regard, it was anticipated that only the clustering rates, or the average number of entities per group, needed to be approximated, rather than fitting distributions to very large databases.


Calculating Neyman Type A probabilities in R

* The code below simulates instances of \operatorname{NA}(\lambda,\phi):
 rNeymanA <- function(n, lambda, phi) {
   sapply(rpois(n, lambda), function(j) sum(rpois(j, phi)))
 }
* The recurrence form of the mass function is implemented in R in order to compute the theoretical probabilities:
 dNeyman.rec <- function(x, lambda, phi) {
   p <- exp(-lambda + lambda * exp(-phi))  # p_0
   if (x >= 1) for (i in 1:x) {
     j <- 0:(i - 1)
     p <- c(p, lambda * phi * exp(-phi) / i * sum(phi^j / factorial(j) * p[i - j]))
   }
   p[x + 1]
 }
We compare the relative frequencies obtained by the simulation with the probabilities computed by the p.m.f., for the parameter values \lambda = 2 and \phi = 1.

