ProbOnto is a knowledge base and ontology of probability distributions (main project website: http://probonto.org). ProbOnto 2.5 (released on January 16, 2017) contains over 150 uni- and multivariate distributions and alternative parameterizations, and more than 220 relationships and re-parameterization formulas. It also supports the encoding of empirical and univariate mixture distributions.

Introduction

ProbOnto was initially designed to facilitate the encoding of nonlinear mixed-effects models and their annotation in the Pharmacometrics Markup Language (PharmML), developed by DDMoRe, an Innovative Medicines Initiative project. Due to its generic structure, however, ProbOnto can also be applied in other platforms and modeling tools to encode and annotate diverse models for discrete (e.g. count, categorical and time-to-event) and continuous data.

Knowledge base

The knowledge base stores for each distribution:
* Probability density or mass functions and, where available, cumulative distribution, hazard and survival functions.
* Related quantities such as mean, median, mode and variance.
* Parameter and support/range definitions and distribution type.
* LaTeX and R code for mathematical functions.
* Model definition and references.
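To make the list above concrete, the sketch below models one knowledge-base entry as a Python dataclass. The class name, field names and parameter names (`mean`, `stdev`) are hypothetical and chosen for illustration; they are not ProbOnto's own schema, which is defined in its specification.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class DistributionEntry:
    """Hypothetical record holding the per-distribution information listed above."""
    code_name: str                    # distribution code name, e.g. "Normal1"
    dist_type: str                    # "continuous" or "discrete"
    parameters: Dict[str, str]        # parameter name -> allowed range
    support: str                      # range of the random variable
    density: Callable[..., float]     # PDF (continuous) or PMF (discrete)
    mean: Optional[Callable[..., float]] = None
    variance: Optional[Callable[..., float]] = None
    latex: str = ""                   # LaTeX source of the density
    references: List[str] = field(default_factory=list)


def _normal1_density(x: float, mean: float, stdev: float) -> float:
    """Normal density parameterized by mean and standard deviation."""
    return math.exp(-(x - mean) ** 2 / (2 * stdev ** 2)) / (stdev * math.sqrt(2 * math.pi))


normal1 = DistributionEntry(
    code_name="Normal1",
    dist_type="continuous",
    parameters={"mean": "real", "stdev": "positive real"},
    support="real line",
    density=_normal1_density,
    mean=lambda mean, stdev: mean,
    variance=lambda mean, stdev: stdev ** 2,
    latex=r"\frac{1}{\sigma\sqrt{2\pi}}\exp\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big)",
    references=["Forbes et al., Probability Distributions (2011)"],
)

print(normal1.density(0.0, mean=0.0, stdev=1.0))  # standard normal density at 0
print(normal1.variance(mean=0.0, stdev=2.0))
```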

Relationships

Version 2.5 of ProbOnto stores over 220 relationships between univariate distributions, with re-parameterizations as a special case (see figure). Such relationships are often neglected in the literature, where authors typically concentrate on a single form of each distribution, but they are crucial for interoperability. ProbOnto focuses on this aspect and features more than 15 distributions with alternative parameterizations.

Alternative parameterizations

Many distributions are defined with mathematically equivalent but algebraically different formulas. This causes problems when models are exchanged between software tools, as the following examples illustrate.

Normal distribution

The normal distribution can be defined in at least three ways:
* Normal1(μ,σ) with mean, μ, and standard deviation, σ (Forbes et al., Probability Distributions (2011), John Wiley & Sons, Inc.)

P(x;\boldsymbol\mu,\boldsymbol\sigma)= \frac{1}{\sigma\sqrt{2\pi}}\exp\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big)

* Normal2(μ,υ) with mean, μ, and variance, υ = σ^2

P(x;\boldsymbol\mu,\boldsymbol v)= \frac{1}{\sqrt{2\pi v}}\exp\Big(-\frac{(x-\mu)^2}{2v}\Big)

* Normal3(μ,τ) with mean, μ, and precision, τ = 1/υ = 1/σ^2

P(x;\boldsymbol\mu,\boldsymbol\tau)= \sqrt{\frac{\tau}{2\pi}} \exp\Big(-\frac{\tau}{2}(x-\mu)^2\Big)
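As a quick numerical check that the three forms describe the same distribution, the sketch below implements the densities with Python's standard `math` module and evaluates them for one distribution. The function names are illustrative and not part of ProbOnto. The last line shows what happens when a variance is passed to a function expecting a standard deviation, which is the kind of mismatch that can arise when models are exchanged between tools.

```python
import math


def normal1_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal1(mu, sigma): mean and standard deviation."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def normal2_pdf(x: float, mu: float, v: float) -> float:
    """Normal2(mu, v): mean and variance."""
    return math.exp(-(x - mu) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)


def normal3_pdf(x: float, mu: float, tau: float) -> float:
    """Normal3(mu, tau): mean and precision."""
    return math.sqrt(tau / (2 * math.pi)) * math.exp(-tau / 2 * (x - mu) ** 2)


# The same distribution (mean 1, standard deviation 2) written in all three forms.
mu, sigma = 1.0, 2.0
for x in (-3.0, 0.0, 1.0, 2.5):
    p1 = normal1_pdf(x, mu, sigma)
    p2 = normal2_pdf(x, mu, sigma ** 2)
    p3 = normal3_pdf(x, mu, 1 / sigma ** 2)
    assert math.isclose(p1, p2) and math.isclose(p1, p3)

# Passing the variance where the standard deviation is expected changes the density.
print(normal1_pdf(0.0, mu, sigma), normal1_pdf(0.0, mu, sigma ** 2))
```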

Re-parameterization formulas

The following formulas can be used to convert between the three forms of the normal distribution (abbreviated N1 for Normal1, and so on):
* N1(\mu,\sigma) \rightarrow N2(\mu,v): v=\sigma^2 \quad\text{and}\quad N2(\mu,v) \rightarrow N1(\mu,\sigma): \sigma=\sqrt{v};
* N1(\mu,\sigma) \rightarrow N3(\mu,\tau): \tau=1/\sigma^2 \quad\text{and}\quad N3(\mu,\tau) \rightarrow N1(\mu,\sigma): \sigma=1/\sqrt{\tau};
* N2(\mu,v) \rightarrow N3(\mu,\tau): \tau=1/v \quad\text{and}\quad N3(\mu,\tau) \rightarrow N2(\mu,v): v=1/\tau.
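The six conversions translate directly into code. A minimal sketch with hypothetical function names; each function takes and returns a (mean, second parameter) pair, since the mean μ is the same in all three forms.

```python
import math


def n1_to_n2(mu, sigma):
    return mu, sigma ** 2


def n2_to_n1(mu, v):
    return mu, math.sqrt(v)


def n1_to_n3(mu, sigma):
    return mu, 1 / sigma ** 2


def n3_to_n1(mu, tau):
    return mu, 1 / math.sqrt(tau)


def n2_to_n3(mu, v):
    return mu, 1 / v


def n3_to_n2(mu, tau):
    return mu, 1 / tau


# A direct conversion agrees with the same conversion routed through N1.
mu, v = 0.5, 2.25
direct = n2_to_n3(mu, v)
via_n1 = n1_to_n3(*n2_to_n1(mu, v))
print(direct, via_n1)
```

Storing each relationship as a small, invertible function is what makes such chains composable: any form can be reached from any other by following conversions.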

Log-normal distribution

In the case of the log-normal distribution there are more options, because it can be parameterized in terms of parameters on either the natural or the log scale (see figure).

The available forms in ProbOnto 2.0 are:
* LogNormal1(μ,σ) with mean, μ, and standard deviation, σ, both on the log scale

P(x;\boldsymbol\mu,\boldsymbol\sigma)= \frac{1}{x\sigma\sqrt{2\pi}} \exp\Big(-\frac{(\log x-\mu)^2}{2\sigma^2}\Big)

* LogNormal2(μ,υ) with mean, μ, and variance, υ, both on the log scale

P(x;\boldsymbol\mu,\boldsymbol v)=\frac{1}{x\sqrt{2\pi v}} \exp\Big(-\frac{(\log x-\mu)^2}{2v}\Big)

* LogNormal3(m,σ) with median, m, on the natural scale and standard deviation, σ, on the log scale

P(x;\boldsymbol m,\boldsymbol \sigma) =\frac{1}{x\sigma\sqrt{2\pi}} \exp\Big(-\frac{(\log x-\log m)^2}{2\sigma^2}\Big)

* LogNormal4(m,cv) with median, m, and coefficient of variation, cv, both on the natural scale

P(x;\boldsymbol m,\boldsymbol{cv})= \frac{1}{x\sqrt{2\pi\log(1+cv^2)}} \exp\Big(-\frac{(\log x-\log m)^2}{2\log(1+cv^2)}\Big)

* LogNormal5(μ,τ) with mean, μ, and precision, τ, both on the log scale

P(x;\boldsymbol\mu,\boldsymbol \tau)=\sqrt{\frac{\tau}{2\pi}} \frac{1}{x} \exp\Big(-\frac{\tau}{2}(\log x-\mu)^2\Big)

* LogNormal6(m,σ_g) with median, m, and geometric standard deviation, σ_g, both on the natural scale

P(x;\boldsymbol m,\boldsymbol{\sigma_g})=\frac{1}{x\log(\sigma_g)\sqrt{2\pi}} \exp\Big(-\frac{(\log x-\log m)^2}{2\log(\sigma_g)^2}\Big)

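* LogNormal7(μ_N,σ_N) with mean, μ_N, and standard deviation, σ_N, both on the natural scale

P(x;\boldsymbol{\mu_N},\boldsymbol{\sigma_N})= \frac{1}{x\sqrt{2\pi\log(1+\sigma_N^2/\mu_N^2)}} \exp\Bigg(-\frac{\Big(\log x-\log\frac{\mu_N}{\sqrt{1+\sigma_N^2/\mu_N^2}}\Big)^2}{2\log(1+\sigma_N^2/\mu_N^2)}\Bigg)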

The ProbOnto knowledge base stores such re-parameterization formulas to allow correct translation of models between tools.
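To illustrate how the seven forms relate, the sketch below converts each form into LogNormal1's (μ, σ) on the log scale and evaluates a single density. The conversions are derived from the standard log-normal identities m = e^μ, cv^2 = e^{σ^2} - 1 and σ_g = e^σ, chosen to agree with the densities listed above; they are not copied from the ProbOnto specification, and all names are hypothetical.

```python
import math


def lognormal1_pdf(x, mu, sigma):
    """LogNormal1: mean mu and standard deviation sigma, both on the log scale."""
    z = math.log(x) - mu
    return math.exp(-z * z / (2 * sigma ** 2)) / (x * sigma * math.sqrt(2 * math.pi))


def _ln7_to_ln1(mu_n, sigma_n):
    """Natural-scale mean and standard deviation -> log-scale (mu, sigma)."""
    ratio = 1 + sigma_n ** 2 / mu_n ** 2
    return math.log(mu_n / math.sqrt(ratio)), math.sqrt(math.log(ratio))


# Conversions from each form to LogNormal1's (mu, sigma).
TO_LN1 = {
    "LogNormal1": lambda mu, sigma: (mu, sigma),
    "LogNormal2": lambda mu, v: (mu, math.sqrt(v)),
    "LogNormal3": lambda m, sigma: (math.log(m), sigma),
    "LogNormal4": lambda m, cv: (math.log(m), math.sqrt(math.log(1 + cv ** 2))),
    "LogNormal5": lambda mu, tau: (mu, 1 / math.sqrt(tau)),
    "LogNormal6": lambda m, sigma_g: (math.log(m), math.log(sigma_g)),
    "LogNormal7": _ln7_to_ln1,
}


def lognormal_pdf(form, x, a, b):
    """Density at x > 0 of the log-normal form `form` with parameters (a, b)."""
    mu, sigma = TO_LN1[form](a, b)
    return lognormal1_pdf(x, mu, sigma)


# One distribution (mu = 0.5, sigma = 0.4 on the log scale) written in all seven forms.
mu, sigma = 0.5, 0.4
m = math.exp(mu)                            # median, natural scale
cv = math.sqrt(math.exp(sigma ** 2) - 1)    # coefficient of variation
mu_n = math.exp(mu + sigma ** 2 / 2)        # mean, natural scale
params = {
    "LogNormal1": (mu, sigma),
    "LogNormal2": (mu, sigma ** 2),
    "LogNormal3": (m, sigma),
    "LogNormal4": (m, cv),
    "LogNormal5": (mu, 1 / sigma ** 2),
    "LogNormal6": (m, math.exp(sigma)),
    "LogNormal7": (mu_n, mu_n * cv),
}
for form, (a, b) in params.items():
    print(form, lognormal_pdf(form, 1.7, a, b))
```

All seven lines print the same density, up to floating-point rounding.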

Examples of re-parameterization

Consider a situation in which one wants to run a model with two different optimal design tools, e.g. PFIM and PopED. The former supports the LN2 parameterization and the latter LN7. A re-parameterization is therefore required; otherwise the two tools would produce different results. For the transition

LN2(\mu, v) \rightarrow LN7(\mu_N, \sigma_N)

the following formulas hold:

\mu_N = \exp(\mu+v/2) \quad\text{and}\quad \sigma_N = \exp(\mu+v/2)\sqrt{\exp(v)-1}

For the transition

LN7(\mu_N, \sigma_N) \rightarrow LN2(\mu, v)

the following formulas hold:

\mu = \log\Big( \mu_N/\sqrt{1+\sigma_N^2/\mu_N^2} \Big) \quad\text{and}\quad v = \log(1+\sigma_N^2/\mu_N^2)

All remaining re-parameterization formulas can be found in the specification document on the project website.
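Both transitions can be coded directly from the formulas. A minimal sketch with hypothetical function names; the parameter values below are made up for illustration and are not taken from PFIM or PopED. A round trip LN2 to LN7 and back should recover the original parameters up to floating-point error.

```python
import math


def ln2_to_ln7(mu, v):
    """LN2(mu, v) on the log scale -> LN7(mu_N, sigma_N) on the natural scale."""
    mu_n = math.exp(mu + v / 2)
    sigma_n = mu_n * math.sqrt(math.exp(v) - 1)
    return mu_n, sigma_n


def ln7_to_ln2(mu_n, sigma_n):
    """LN7(mu_N, sigma_N) on the natural scale -> LN2(mu, v) on the log scale."""
    ratio = 1 + sigma_n ** 2 / mu_n ** 2
    return math.log(mu_n / math.sqrt(ratio)), math.log(ratio)


# Illustrative values: an LN2 model (as a PFIM user would write it) translated to LN7.
mu_n, sigma_n = ln2_to_ln7(1.0, 0.25)
print(mu_n, sigma_n)
print(ln7_to_ln2(mu_n, sigma_n))  # recovers (1.0, 0.25) up to rounding
```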

Ontology

The knowledge base is built on a simple ontological model. At its core, each probability distribution is an instance of the class of probability distributions, which is a specialization of the class of mathematical objects. A distribution is related to a number of other individuals, which are instances of various categories in the ontology, for example the parameters and related functions associated with that distribution. This strategy allows a rich representation of attributes and of relationships between domain objects. The ontology can be seen as a conceptual schema in the domain of mathematics and has been implemented as a PowerLoom knowledge base. An OWL version is generated programmatically using the Jena API. ProbOnto outputs are provided as supplementary materials and are published on, or linked from, the probonto.org website. The OWL version of ProbOnto is available via the Ontology Lookup Service (OLS) to facilitate simple searching and visualization of the content. In addition, the OLS API provides methods to access ProbOnto programmatically and to integrate it into applications. ProbOnto is also registered on the BioSharing portal (ProbOnto on BioSharing, the database of biological databases, URL: https://biosharing.org/biodbcore-000772).

ProbOnto in PharmML

A PharmML interface is provided in the form of a generic XML schema for defining the distributions and their parameters. Defining functions, such as the probability density function (PDF), probability mass function (PMF), hazard function (HF) and survival function (SF), can be accessed via methods provided in the PharmML schema.

Use example

This example shows how the zero-inflated Poisson distribution is encoded by using its ''codename'' and declaring the code names of its parameters (‘rate’ and ‘probabilityOfZero’). The model parameters ''Lambda'' and ''P0'' are assigned to the parameter code names. To specify any given distribution unambiguously using ProbOnto, it is sufficient to declare its code name and the code names of its parameters. More examples and a detailed specification can be found on the project website.
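The idea that a distribution code name plus its parameter code names fully identifies a distribution can be mimicked with plain data structures. The sketch below is not PharmML syntax: a Python dictionary stands in for the XML encoding, the code name "ZeroInflatedPoisson1" is a hypothetical placeholder (check the specification for the exact identifier), and the mass function is the standard zero-inflated Poisson, P(0) = p_0 + (1 - p_0)e^{-λ} and P(k) = (1 - p_0)λ^k e^{-λ}/k! for k ≥ 1. The parameter code names ‘rate’ and ‘probabilityOfZero’ and the model parameters Lambda and P0 come from the description above; their numeric values are made up.

```python
import math


def zero_inflated_poisson_pmf(k, rate, probabilityOfZero):
    """Zero-inflated Poisson PMF; argument names mirror the parameter code names."""
    poisson = math.exp(-rate) * rate ** k / math.factorial(k)
    if k == 0:
        return probabilityOfZero + (1 - probabilityOfZero) * poisson
    return (1 - probabilityOfZero) * poisson


# Hypothetical registry from distribution code name to implementation.
REGISTRY = {"ZeroInflatedPoisson1": zero_inflated_poisson_pmf}

# Declaration: the distribution code name plus a mapping from each
# parameter code name to a model parameter.
declaration = {
    "codename": "ZeroInflatedPoisson1",
    "parameters": {"rate": "Lambda", "probabilityOfZero": "P0"},
}
model_parameters = {"Lambda": 3.2, "P0": 0.15}  # illustrative values


def evaluate(declaration, model_parameters, k):
    """Resolve the declaration against the model parameters and evaluate the PMF at k."""
    pmf = REGISTRY[declaration["codename"]]
    args = {code: model_parameters[name] for code, name in declaration["parameters"].items()}
    return pmf(k, **args)


print([round(evaluate(declaration, model_parameters, k), 4) for k in range(5)])
```

Because the declaration names the parameters by code name rather than by position, a consuming tool cannot confuse, say, the rate with the zero-inflation probability.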


External links

* Official website: http://probonto.org
* Leemis chart
* Ultimate Univariate Probability Distribution Explorer: most likely the largest free collection of univariate distributions and their features.
* UncertML