ProbOnto is a knowledge base and ontology of probability distributions. [Main project website, URL: http://probonto.org] ProbOnto 2.5 (released on January 16, 2017) contains over 150 uni- and multivariate distributions and alternative parameterizations, more than 220 relationships and re-parameterization formulas, and also supports the encoding of empirical and univariate mixture distributions.
Introduction
ProbOnto was initially designed to facilitate the encoding of nonlinear mixed-effects models and their annotation in Pharmacometrics Markup Language (PharmML), developed by DDMoRe, an Innovative Medicines Initiative project. However, due to its generic structure, ProbOnto can be applied in other platforms and modeling tools for the encoding and annotation of diverse models applicable to discrete (e.g. count, categorical and time-to-event) and continuous data.
Knowledge base

The knowledge base stores for each distribution:
* Probability density or mass functions and, where available, cumulative distribution, hazard and survival functions.
* Related quantities such as mean, median, mode and variance.
* Parameter and support/range definitions and distribution type.
* LaTeX and R code for mathematical functions.
* Model definition and references.
Relationships
Version 2.5 of ProbOnto stores over 220 relationships between univariate distributions, with re-parameterizations as a special case (see figure). Although this form of relationship is often neglected in the literature, where authors concentrate on one particular form for each distribution, it is crucial from the interoperability point of view. ProbOnto focuses on this aspect and features more than 15 distributions with alternative parameterizations.
Alternative parameterizations
Many distributions are defined with mathematically equivalent but algebraically different formulas. This leads to issues when exchanging models between software tools. The following examples illustrate that.
Normal distribution
The normal distribution can be defined in at least three ways:
* Normal1(μ,σ) with mean, μ, and standard deviation, σ [Forbes et al. Probability Distributions (2011), John Wiley & Sons, Inc.]
* Normal2(μ,υ) with mean, μ, and variance, υ = σ^2 or
* Normal3(μ,τ) with mean, μ, and precision, τ = 1/υ = 1/σ^2.
Re-parameterization formulas
The following formulas can be used to re-calculate the three different forms of the normal distribution (we use abbreviations, i.e. N1 instead of Normal1 etc.)
* N1(μ,σ) → N2(μ,υ): υ = σ^2
* N2(μ,υ) → N3(μ,τ): τ = 1/υ
* N3(μ,τ) → N1(μ,σ): σ = 1/√τ
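The relationships between the three normal forms follow directly from the definitions υ = σ^2 and τ = 1/υ given above. A minimal Python sketch of the conversions is shown here; the function names are illustrative assumptions and are not part of ProbOnto or PharmML.

```python
import math


def normal1_to_normal2(mu, sigma):
    """Normal1(mu, sigma) -> Normal2(mu, var), using var = sigma^2."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return mu, sigma ** 2


def normal2_to_normal3(mu, var):
    """Normal2(mu, var) -> Normal3(mu, tau), using tau = 1/var."""
    if var <= 0:
        raise ValueError("variance must be positive")
    return mu, 1.0 / var


def normal3_to_normal1(mu, tau):
    """Normal3(mu, tau) -> Normal1(mu, sigma), using sigma = 1/sqrt(tau)."""
    if tau <= 0:
        raise ValueError("precision must be positive")
    return mu, 1.0 / math.sqrt(tau)


if __name__ == "__main__":
    # Walk once around the cycle N1 -> N2 -> N3 -> N1.
    mu, var = normal1_to_normal2(1.5, 2.0)
    mu, tau = normal2_to_normal3(mu, var)
    print(normal3_to_normal1(mu, tau))
```

The mean is passed through unchanged in every case; only the dispersion parameter is re-expressed.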
Log-normal distribution
In the case of the log-normal distribution there are more options. This is due to the fact that it can be parameterized in terms of parameters on the natural and log scale, see figure.

The available forms in ProbOnto 2.0 are
* LogNormal1(μ,σ) with mean, μ, and standard deviation, σ, both on the log-scale
* LogNormal2(μ,υ) with mean, μ, and variance, υ, both on the log-scale
* LogNormal3(m,σ) with median, m, on the natural scale and standard deviation, σ, on the log-scale
* LogNormal4(m,cv) with median, m, and coefficient of variation, cv, both on the natural scale
* LogNormal5(μ,τ) with mean, μ, and precision, τ, both on the log-scale
* LogNormal6(m,σ_g) with median, m, and geometric standard deviation, σ_g, both on the natural scale
* LogNormal7(μ_N,σ_N) with mean, μ_N, and standard deviation, σ_N, both on the natural scale
ProbOnto knowledge base stores such re-parameterization formulas to allow for a correct translation of models between tools.
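As an illustration of how such formulas connect the forms listed above, the following Python sketch maps several of them onto LogNormal1(μ,σ). It relies on standard log-normal identities (median m = exp(μ), coefficient of variation cv = sqrt(exp(σ^2) − 1), geometric standard deviation σ_g = exp(σ)) rather than on formulas copied from the ProbOnto specification, and the function names are hypothetical.

```python
import math


def lognormal2_to_lognormal1(mu, var):
    """Log-scale variance -> log-scale standard deviation."""
    return mu, math.sqrt(var)


def lognormal3_to_lognormal1(m, sigma):
    """Natural-scale median m -> log-scale mean mu = log(m)."""
    return math.log(m), sigma


def lognormal4_to_lognormal1(m, cv):
    """Median and coefficient of variation -> (mu, sigma).

    Inverts cv = sqrt(exp(sigma^2) - 1).
    """
    return math.log(m), math.sqrt(math.log1p(cv ** 2))


def lognormal5_to_lognormal1(mu, tau):
    """Log-scale precision tau = 1/sigma^2 -> sigma."""
    return mu, 1.0 / math.sqrt(tau)


def lognormal6_to_lognormal1(m, sigma_g):
    """Median and geometric standard deviation -> (mu, sigma).

    Inverts sigma_g = exp(sigma).
    """
    return math.log(m), math.log(sigma_g)


if __name__ == "__main__":
    print(lognormal4_to_lognormal1(2.0, 0.5))
    print(lognormal6_to_lognormal1(2.0, 1.5))
```

Converting every form to one common form first keeps the number of conversion functions linear in the number of parameterizations, instead of requiring one function per pair.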
Examples for re-parameterization
Consider the situation when one would like to run a model using two different optimal design tools, e.g. PFIM and PopED. The former supports the LN2 parameterization and the latter LN7. A re-parameterization is therefore required, otherwise the two tools would produce different results.
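For the transition LN2(μ,υ) → LN7(μ_N,σ_N) the following formulas hold: μ_N = exp(μ + υ/2) and σ_N = exp(μ + υ/2) √(exp(υ) − 1).
For the transition LN7(μ_N,σ_N) → LN2(μ,υ) the following formulas hold: μ = log(μ_N / √(1 + σ_N^2/μ_N^2)) and υ = log(1 + σ_N^2/μ_N^2).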
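The translation between LN2 (mean and variance on the log-scale) and LN7 (mean and standard deviation on the natural scale) can be sketched in Python as below. The sketch uses the standard moment identities of the log-normal distribution; the function names are assumptions made for illustration and do not correspond to any PFIM or PopED API.

```python
import math


def ln2_to_ln7(mu, var):
    """LN2(mu, var) on the log-scale -> LN7(mean, sd) on the natural scale."""
    mean_nat = math.exp(mu + var / 2.0)
    # sd = mean * sqrt(exp(var) - 1); expm1 is accurate for small var.
    sd_nat = mean_nat * math.sqrt(math.expm1(var))
    return mean_nat, sd_nat


def ln7_to_ln2(mean_nat, sd_nat):
    """LN7(mean, sd) on the natural scale -> LN2(mu, var) on the log-scale."""
    # var = log(1 + sd^2 / mean^2)
    var = math.log1p((sd_nat / mean_nat) ** 2)
    # mu = log(mean / sqrt(1 + sd^2 / mean^2)) = log(mean) - var / 2
    mu = math.log(mean_nat) - var / 2.0
    return mu, var


if __name__ == "__main__":
    mean_nat, sd_nat = ln2_to_ln7(0.5, 0.2)
    print(mean_nat, sd_nat)
    # Converting back should recover the original values up to rounding.
    print(ln7_to_ln2(mean_nat, sd_nat))
```

A round trip through both functions is a cheap consistency check when a model is moved from one tool to the other.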
All remaining re-parameterization formulas can be found in the specification document on the project website.
Ontology
The knowledge base is built from a simple ontological model. At its core, a probability distribution is an instance of the class thereof, a specialization of the class of mathematical objects. A distribution relates to a number of other individuals, which are instances of various categories in the ontology. For example, these are parameters and related functions associated with a given probability distribution. This strategy allows for the rich representation of attributes and relationships between domain objects. The ontology can be seen as a conceptual schema in the domain of mathematics and has been implemented as a PowerLoom knowledge base. An OWL version is generated programmatically using the Jena API.
Outputs for ProbOnto are provided as supplementary materials and published on or linked from the probonto.org website. The OWL version of ProbOnto is available via the Ontology Lookup Service (OLS) to facilitate simple searching and visualization of the content. In addition, the OLS API provides methods to programmatically access ProbOnto and to integrate it into applications. ProbOnto is also registered on the BioSharing portal.
[ProbOnto on BioSharing, the database of biological databases, URL: https://biosharing.org/biodbcore-000772]
ProbOnto in PharmML
A PharmML interface is provided in the form of a generic XML schema for the definition of the distributions and their parameters. Defining functions, such as the probability density function (PDF), probability mass function (PMF), hazard function (HF) and survival function (SF), can be accessed via methods provided in the PharmML schema.
Use example
In this example, the zero-inflated Poisson distribution is encoded by using its code name and declaring those of its parameters ('rate' and 'probabilityOfZero'). The model parameters Lambda and P0 are assigned to the parameter code names.
To specify any given distribution unambiguously using ProbOnto, it is sufficient to declare its code name and the code names of its parameters.
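The idea that a code name plus parameter code names identifies a distribution can be mirrored in a small data structure. The Python sketch below is not PharmML and does not reproduce its XML schema; the class `DistributionRef`, the helper `describe` and the distribution code name "ZeroInflatedPoisson1" are assumptions made for illustration, while the parameter code names and model symbols are those mentioned above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DistributionRef:
    """A distribution identified only by code names (hypothetical helper)."""
    code_name: str
    # Maps each parameter code name to the model symbol assigned to it.
    parameters: Dict[str, str]


def describe(ref):
    """Render the reference as a compact, human-readable string."""
    args = ", ".join(f"{name}={symbol}" for name, symbol in ref.parameters.items())
    return f"{ref.code_name}({args})"


# The distribution code name is an assumed value, not taken from the text.
zip_ref = DistributionRef(
    code_name="ZeroInflatedPoisson1",
    parameters={"rate": "Lambda", "probabilityOfZero": "P0"},
)

if __name__ == "__main__":
    print(describe(zip_ref))
```

Because the parameters are addressed by code name rather than by position, a consuming tool does not need to know the argument order used by the producing tool.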
More examples and a detailed specification can be found on the project website.
See also
* List of probability distributions
* Ontology (computer science)
* Relationships among probability distributions
* Web Ontology Language
External links
* ProbOnto website
* Ultimate Univariate Probability Distribution Explorer: most likely the largest free collection of univariate distributions and their features.
* UncertML