Posterior Predictive Distribution

	Posterior Predictive Distribution In Bayesian statistics, the posterior predictive distribution is the distribution of possible unobserved values conditional on the observed values. Given a set of ''N'' i.i.d. observations \mathbf{X} = \{x_1, \dots, x_N\}, a new value \tilde{x} will be drawn from a distribution that depends on a parameter \theta \in \Theta, where \Theta is the parameter space: p(\tilde{x} \mid \theta). It may seem tempting to plug in a single best estimate \hat{\theta} for \theta, but this ignores uncertainty about \theta, and because a source of uncertainty is ignored, the predictive distribution will be too narrow. Put another way, predictions of extreme values of \tilde{x} will have a lower probability than if the uncertainty in the parameters as given by their posterior distribution is accounted for. A posterior predictive distribution accounts for uncertainty about \theta. The posterior distribution of possible \theta values depends on \mathbf{X}: p(\theta \mid \mathbf{X}). And the posterior predictive distribution of \tilde{x} given \ma ...
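The narrowing effect of a plug-in estimate can be illustrated with a small sketch. It assumes a normal model with known standard deviation and a normal prior on the mean, which is one convenient case and not the only one; the data values, the prior settings, and the helper names `normal_posterior` and `upper_tail` are hypothetical.

```python
import math

def normal_posterior(xs, sigma, mu0, tau0):
    """Posterior N(mu_n, tau_n^2) for the mean of a normal with known sd sigma."""
    n = len(xs)
    precision = 1.0 / tau0**2 + n / sigma**2
    mu_n = (mu0 / tau0**2 + sum(xs) / sigma**2) / precision
    return mu_n, math.sqrt(1.0 / precision)

def upper_tail(x, mean, sd):
    """P(X > x) for X ~ N(mean, sd^2)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

xs = [4.8, 5.6, 5.1, 4.3, 5.9]   # hypothetical observations
sigma = 1.0                      # assumed known observation sd
mu_n, tau_n = normal_posterior(xs, sigma, mu0=0.0, tau0=10.0)

# Plug-in prediction: treat the point estimate mu_n as if it were exact.
plug_in_sd = sigma
# Posterior predictive: add the remaining uncertainty about the mean.
predictive_sd = math.sqrt(sigma**2 + tau_n**2)

print("plug-in sd:", plug_in_sd, "predictive sd:", predictive_sd)
print("P(x > 8), plug-in:   ", upper_tail(8.0, mu_n, plug_in_sd))
print("P(x > 8), predictive:", upper_tail(8.0, mu_n, predictive_sd))
```

In this setting the posterior predictive standard deviation is always larger than the plug-in one, so extreme values receive more probability.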
	Beta-binomial Distribution In probability theory and statistics, the beta-binomial distribution is a family of discrete probability distributions on a finite support of non-negative integers arising when the probability of success in each of a fixed or known number of Bernoulli trials is either unknown or random. The beta-binomial distribution is the binomial distribution in which the probability of success at each of ''n'' trials is not fixed but randomly drawn from a beta distribution. It is frequently used in Bayesian statistics, empirical Bayes methods and classical statistics to capture overdispersion in binomial type distributed data. The beta-binomial is a one-dimensional version of the Dirichlet-multinomial distribution as the binomial and beta distributions are univariate versions of the multinomial and Dirichlet distributions respectively. The special case where ''α'' and ''β'' are integers is also known as the negative hypergeometric distribution. Motivation and derivation As a compound ...
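As a rough illustration of the overdispersion mentioned above, the following sketch evaluates the beta-binomial probability mass function through log-gamma functions and compares its variance with that of a binomial distribution with the same mean. The parameter values and function names are hypothetical choices for the example.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    """P(K = k) when K ~ Binomial(n, p) and p ~ Beta(a, b)."""
    return math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))

n, a, b = 20, 2.0, 3.0   # hypothetical values
pmf = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

# Binomial with the same mean success probability, for comparison.
p = a / (a + b)
binomial_variance = n * p * (1 - p)

print("mean:", mean)
print("beta-binomial variance:", variance)
print("binomial variance:     ", binomial_variance)
```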
	Bayesian Statistics Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability, where probability expresses a ''degree of belief'' in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation, which views probability as the limit of the relative frequency of an event after many trials. More concretely, analysis in Bayesian methods codifies prior knowledge in the form of a prior distribution. Bayesian statistical methods use Bayes' theorem to compute and update probabilities after obtaining new data. Bayes' theorem describes the conditional probability of an event based on data as well as prior information or beliefs about the event or conditions related to the event. For example, in Bayesian inference, Bayes' theorem can ...
	Normalization Constant In probability theory, a normalizing constant or normalizing factor is used to reduce any probability function to a probability density function with total probability of one. For example, a Gaussian function can be normalized into a probability density function, which gives the standard normal distribution. In Bayes' theorem, a normalizing constant is used to ensure that the probabilities of all possible hypotheses sum to 1. Other uses of normalizing constants include scaling a Legendre polynomial so that its value at 1 is 1, and making orthogonal functions orthonormal. A similar concept has been used in areas other than probability, such as for polynomials. Definition In probability theory, a normalizing constant is a constant by which an everywhere non-negative function must be multiplied so the area under its graph is 1, e.g., to make it a probability density function or a probability mass function. Examples If we start from the simple Gaussian function p(x) = e^{-x^2/2}, \quad x\in(-\infty,\ ...
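A short numerical sketch of the Gaussian example follows. It assumes the unnormalized function e^{-x^2/2}, integrates it with a simple trapezoidal rule on a truncated interval, and uses the reciprocal of that area as the normalizing constant; the helper names and the integration range are assumptions made for illustration.

```python
import math

def gaussian(x):
    """Unnormalized Gaussian function exp(-x^2 / 2)."""
    return math.exp(-0.5 * x * x)

def trapezoid(f, lo, hi, steps):
    """Composite trapezoidal rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

# The tails beyond |x| = 10 are negligible, so a finite range is enough here.
area = trapezoid(gaussian, -10.0, 10.0, 20_000)
normalizing_constant = 1.0 / area

def standard_normal_pdf(x):
    """Gaussian function rescaled so that the area under its graph is 1."""
    return normalizing_constant * gaussian(x)

print("area under exp(-x^2/2):", area)
print("sqrt(2*pi):            ", math.sqrt(2.0 * math.pi))
```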
	Marginal Likelihood A marginal likelihood is a likelihood function that has been integrated over the parameter space. In Bayesian statistics, it represents the probability of generating the observed sample for all possible values of the parameters; it can be understood as the probability of the model itself and is therefore often referred to as model evidence or simply evidence. Due to the integration over the parameter space, the marginal likelihood does not directly depend upon the parameters. If the focus is not on model comparison, the marginal likelihood is simply the normalizing constant that ensures that the posterior is a proper probability. It is related to the partition function in statistical mechanics. Concept Given a set of independent identically distributed data points \mathbf{X}=(x_1,\ldots,x_n), where x_i \sim p(x \mid \theta) according to some probability distribution parameterized by \theta, where \theta itself is a random variable described by a distribution, i.e. \theta \sim p(\t ...
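To make the integration over the parameter space concrete, the sketch below assumes Bernoulli observations with a Beta(a, b) prior on the success probability, a case where the marginal likelihood of a specific ordered sequence has the closed form B(a + s, b + f) / B(a, b). It compares that value with a numerical integral of likelihood times prior. The function names and the data counts are hypothetical.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def marginal_likelihood(successes, failures, a, b):
    """Closed form for one ordered Bernoulli sequence under a Beta(a, b) prior."""
    return math.exp(log_beta(a + successes, b + failures) - log_beta(a, b))

def marginal_likelihood_numeric(successes, failures, a, b, points=20_000):
    """Midpoint-rule integral of likelihood x prior over theta in (0, 1)."""
    total = 0.0
    for i in range(points):
        theta = (i + 0.5) / points
        log_prior = ((a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
                     - log_beta(a, b))
        log_lik = successes * math.log(theta) + failures * math.log(1 - theta)
        total += math.exp(log_prior + log_lik)
    return total / points

# Three successes and one failure under a uniform Beta(1, 1) prior.
evidence = marginal_likelihood(3, 1, a=1.0, b=1.0)
evidence_numeric = marginal_likelihood_numeric(3, 1, a=1.0, b=1.0)
print(evidence, evidence_numeric)
```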
	Likelihood Function A likelihood function (often simply called the likelihood) measures how well a statistical model explains observed data by calculating the probability of seeing that data under different parameter values of the model. It is constructed from the joint probability distribution of the random variable that (presumably) generated the observations. When evaluated on the actual data points, it becomes a function solely of the model parameters. In maximum likelihood estimation, the argument that maximizes the likelihood function serves as a point estimate for the unknown parameter, while the Fisher information (often approximated by the likelihood's Hessian matrix at the maximum) gives an indication of the estimate's precision. In contrast, in Bayesian statistics, the estimate of interest is the ''converse'' of the likelihood, the so-called posterior probability of the parameter given the observed data, which is calculated via Bayes' rule. Definition The likelihood function, ...
	Sufficient Statistic In statistics, sufficiency is a property of a statistic computed on a sample dataset in relation to a parametric model of the dataset. A sufficient statistic contains all of the information that the dataset provides about the model parameters. It is closely related to the concepts of an ancillary statistic which contains no information about the model parameters, and of a complete statistic which only contains information about the parameters and no ancillary information. A related concept is that of linear sufficiency, which is weaker than ''sufficiency'' but can be applied in some cases where there is no sufficient statistic, although it is restricted to linear estimators. The Kolmogorov structure function deals with individual finite data; the related notion there is the algorithmic sufficient statistic. The concept is due to Sir Ronald Fisher in 1920. Stephen Stigler noted in 1973 that the concept of sufficiency had fallen out of favor in descriptive statistics because of ...
	Conjugate Distributions In Bayesian probability theory, if, given a likelihood function p(x \mid \theta), the posterior distribution p(\theta \mid x) is in the same probability distribution family as the prior probability distribution p(\theta), the prior and posterior are then called conjugate distributions with respect to that likelihood function and the prior is called a conjugate prior for the likelihood function p(x \mid \theta). A conjugate prior is an algebraic convenience, giving a closed-form expression for the posterior; otherwise, numerical integration may be necessary. Further, conjugate priors may clarify how a likelihood function updates a prior distribution. The concept, as well as the term "conjugate prior", was introduced by Howard Raiffa and Robert Schlaifer in their work on Bayesian decision theory (Howard Raiffa and Robert Schlaifer, ''Applied Statistical Decision Theory'', Division of Research, Graduate School of Business Administration, Harvard University, 1961). A similar co ...
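The closed-form update can be sketched for one familiar conjugate pair, assumed here for illustration: a Beta prior with a Bernoulli likelihood, where the posterior is again a Beta distribution with the success and failure counts added to the prior parameters. The sketch also checks the result against a numerically normalized prior times likelihood, which is the integration a conjugate prior lets one avoid. All names and data values are hypothetical.

```python
import math

def beta_update(a, b, observations):
    """Posterior Beta parameters after Bernoulli observations (1 = success, 0 = failure)."""
    successes = sum(observations)
    failures = len(observations) - successes
    return a + successes, b + failures

def beta_pdf(t, a, b):
    """Density of Beta(a, b) at 0 < t < 1."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_norm)

def grid_posterior_pdf(t, a, b, observations, points=10_000):
    """Posterior density at t obtained by normalizing prior x likelihood numerically."""
    def unnormalized(u):
        likelihood = math.prod(u if x == 1 else 1 - u for x in observations)
        return beta_pdf(u, a, b) * likelihood
    # Midpoint rule on (0, 1) avoids evaluating log(0) at the endpoints.
    evidence = sum(unnormalized((i + 0.5) / points) for i in range(points)) / points
    return unnormalized(t) / evidence

data = [1, 0, 1, 1]                       # hypothetical coin flips
a_post, b_post = beta_update(2, 2, data)  # Beta(2, 2) prior
print("posterior parameters:", a_post, b_post)
print("closed form:", beta_pdf(0.6, a_post, b_post))
print("numerical:  ", grid_posterior_pdf(0.6, 2, 2, data))
```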
	Multinomial Distribution In probability theory, the multinomial distribution is a generalization of the binomial distribution. For example, it models the probability of counts for each side of a ''k''-sided die rolled ''n'' times. For ''n'' independent trials each of which leads to a success for exactly one of ''k'' categories, with each category having a given fixed success probability, the multinomial distribution gives the probability of any particular combination of numbers of successes for the various categories. When ''k'' is 2 and ''n'' is 1, the multinomial distribution is the Bernoulli distribution. When ''k'' is 2 and ''n'' is bigger than 1, it is the binomial distribution. When ''k'' is bigger than 2 and ''n'' is 1, it is the categorical distribution. The term "multinoulli" is sometimes used for the categorical distribution to emphasize this four-way relationship (so ''n'' determines the suffix, and ''k'' the prefix). The Bernoulli distribution models the outcome of a si ...
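A minimal sketch of the multinomial probability mass function, written with the standard library only, shows the reduction to the binomial case when ''k'' is 2. The function name, the probabilities, and the three-sided die are hypothetical choices for the example.

```python
import math
from itertools import product

def multinomial_pmf(counts, probs):
    """Probability of counts[i] outcomes in category i over sum(counts) trials."""
    coefficient = math.factorial(sum(counts))
    for c in counts:
        coefficient //= math.factorial(c)   # exact: each partial quotient is an integer
    return coefficient * math.prod(p ** c for p, c in zip(probs, counts))

# k = 2 reduces to the binomial distribution: 3 successes in 5 trials with p = 0.4.
binomial_case = multinomial_pmf([3, 2], [0.4, 0.6])

# A fair three-sided "die" rolled n = 4 times: the probabilities of all
# possible count vectors should sum to 1.
n, probs = 4, [1 / 3, 1 / 3, 1 / 3]
total = sum(
    multinomial_pmf(counts, probs)
    for counts in product(range(n + 1), repeat=3)
    if sum(counts) == n
)
print(binomial_case, total)
```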
	Binomial Distribution In probability theory and statistics, the binomial distribution with parameters ''n'' and ''p'' is the discrete probability distribution of the number of successes in a sequence of ''n'' independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: ''success'' (with probability ''p'') or ''failure'' (with probability ''q'' = 1 − ''p''). A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment, and a sequence of outcomes is called a Bernoulli process; for a single trial, i.e., ''n'' = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the binomial test of statistical significance. The binomial distribution is frequently used to model the number of successes in a sample of size ''n'' drawn with replacement from a population of size ''N''. If the sampling is carried out without replacement, the draws ar ...
	Dirichlet-multinomial Distribution In probability theory and statistics, the Dirichlet-multinomial distribution is a family of discrete multivariate probability distributions on a finite support of non-negative integers. It is also called the Dirichlet compound multinomial distribution (DCM) or multivariate Pólya distribution (after George Pólya). It is a compound probability distribution, where a probability vector p is drawn from a Dirichlet distribution with parameter vector \boldsymbol{\alpha}, and an observation is drawn from a multinomial distribution with probability vector p and number of trials ''n''. The Dirichlet parameter vector captures the prior belief about the situation and can be seen as a pseudocount: observations of each outcome that occur before the actual data is collected. The compounding corresponds to a Pólya urn scheme. It is frequently encountered in Bayesian statistics, machine learning, empirical Bayes methods and classical statistics as an overdispersed multinomial distribution. It reduces ...
	Student's T Distribution In probability theory and statistics, Student's ''t'' distribution (or simply the ''t'' distribution) t_\nu is a continuous probability distribution that generalizes the standard normal distribution. Like the latter, it is symmetric around zero and bell-shaped. However, t_\nu has heavier tails, and the amount of probability mass in the tails is controlled by the parameter \nu. For \nu = 1 the Student's ''t'' distribution t_\nu becomes the standard Cauchy distribution, which has very "fat" tails; whereas for \nu \to \infty it becomes the standard normal distribution \mathcal{N}(0, 1), which has very "thin" tails. The name "Student" is a pseudonym used by William Sealy Gosset in his scientific paper publications during his work at the Guinness Brewery in Dublin, Ireland. The Student's ''t'' distribution plays a role in a number of widely used statistical analyses, including Student's ''t''-test for assessing the statistical significance of the difference between two sample means, the cons ...
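The two limiting cases described above can be checked numerically with a short sketch. It uses the standard density of the ''t'' distribution written through log-gamma functions; the function names and the evaluation points are assumptions made for the example, and a large finite \nu stands in for the limit \nu \to \infty.

```python
import math

def student_t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))

def cauchy_pdf(x):
    """Density of the standard Cauchy distribution."""
    return 1.0 / (math.pi * (1.0 + x * x))

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

x = 1.5
print("nu = 1:   ", student_t_pdf(x, 1), "Cauchy:", cauchy_pdf(x))
print("nu = 1e6: ", student_t_pdf(x, 1e6), "normal:", normal_pdf(x))
# Heavier tails: far from zero, a small nu gives much more density than the normal.
print("at x = 4: ", student_t_pdf(4.0, 3), "vs normal", normal_pdf(4.0))
```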
	Density Function In probability theory, a probability density function (PDF), density function, or density of an absolutely continuous random variable, is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can be interpreted as providing a ''relative likelihood'' that the value of the random variable would be equal to that sample. Probability density is the probability per unit length. In other words, while the ''absolute likelihood'' for a continuous random variable to take on any particular value is 0 (since there is an infinite set of possible values to begin with), the value of the PDF at two different samples can be used to infer, in any particular draw of the random variable, how much more likely it is that the random variable would be close to one sample compared to the other sample. More precisely, the PDF is used to specify the probability of the random variable falling ''within ...