Sample entropy
Sample entropy (SampEn) is a modification of approximate entropy (ApEn), used for assessing the complexity of physiological time-series signals and for diagnosing diseased states. SampEn has two advantages over ApEn: data-length independence and a relatively trouble-free implementation. There is also a small computational difference: in ApEn, the comparison between the template vector (see below) and the rest of the vectors includes a comparison with itself. This guarantees that the probabilities C_i^m(r) are never zero, so it is always possible to take their logarithm. But because these self-matches lower ApEn values, the signals are interpreted as more regular than they actually are. Self-matches are not included in SampEn. However, since SampEn makes direct use of the correlation integrals, it is not a true measure of information but an approximation. The foundations of SampEn, its differences from ApEn, and a step-by-step tutorial for its application have been published. There is a multiscale version of SampEn as well, suggested by Costa and others. SampEn can be used in biomedical and biomechanical research, for example to evaluate postural control.


Definition

Like approximate entropy (ApEn), sample entropy (SampEn) is a measure of complexity, but it does not count self-matches as ApEn does. For a given embedding dimension m , tolerance r and number of data points N , SampEn is the negative natural logarithm of the probability that if two sets of simultaneous data points of length m have distance < r , then two sets of simultaneous data points of length m+1 also have distance < r . We represent it by SampEn(m,r,N) (or by SampEn(m,r,\tau,N) when the sampling time \tau is included).

Now assume we have a time-series data set of length N , \{x_1, x_2, \dots, x_N\} , with a constant time interval \tau . We define a template vector of length m , such that X_m(i) = \{x_i, x_{i+1}, \dots, x_{i+m-1}\} , and take the distance function d[X_m(i), X_m(j)] (i \neq j) to be the Chebyshev distance (but it could be any distance function, including the Euclidean distance). We define the sample entropy to be

: SampEn = -\ln \frac{A}{B}

where

: A = number of template vector pairs having d[X_{m+1}(i), X_{m+1}(j)] < r
: B = number of template vector pairs having d[X_m(i), X_m(j)] < r

It is clear from the definition that A will always be smaller than or equal to B . Therefore, SampEn(m,r,\tau) will always be either zero or positive. A smaller value of SampEn indicates more self-similarity in the data set, or less noise.

Generally, we take the value of m to be 2 and the value of r to be 0.2 \times \mathrm{std} , where std stands for the standard deviation, which should be taken over a very large dataset. For instance, an r value of 6 ms is appropriate for sample entropy calculations of heart-rate intervals, since this corresponds to 0.2 \times \mathrm{std} for a very large population.
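The counts A and B above can be computed directly from the definition by brute force. The sketch below is for readability, not efficiency; the function name sampen_bruteforce is illustrative, and it assumes the common convention of using the same number of templates, N - m, for both lengths.

```python
import numpy as np

# Definition-following SampEn: count template pairs (i != j) within
# Chebyshev distance r at lengths m and m + 1, then take -ln(A/B).
def sampen_bruteforce(x, m, r):
    N = len(x)

    def count_matches(length):
        # N - m templates for both length m and length m + 1
        templates = [x[i : i + length] for i in range(N - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(len(templates)):
                if i == j:
                    continue  # self-matches are excluded in SampEn
                # Chebyshev distance: maximum absolute coordinate difference
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    B = count_matches(m)      # matches at length m
    A = count_matches(m + 1)  # matches at length m + 1
    return -np.log(A / B)
```

For a perfectly alternating series such as (1, 2, 1, 2, ...), every pair of templates that matches at length m also matches at length m + 1, so A = B and SampEn is exactly zero.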


Multiscale SampEn

The definition mentioned above is a special case of multi scale sampEn with \delta=1 , where \delta is called skipping parameter. In multiscale SampEn template vectors are defined with a certain interval between its elements, specified by the value of \delta . And modified template vector is defined as X_(i)= and sampEn can be written as SampEn \left ( m,r,\delta \right )=-\ln And we calculate A_\delta and B_\delta like before.
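A minimal sketch of SampEn with a skipping parameter might look as follows. The function name sampen_skip and its conventions (such as using N - m\delta templates for both lengths) are illustrative assumptions, not a reference implementation; with delta = 1 it reduces to ordinary SampEn.

```python
import numpy as np

# SampEn with skipping parameter delta: template elements are taken delta
# samples apart, X_{m,delta}(i) = (x_i, x_{i+delta}, ..., x_{i+(m-1)delta}).
def sampen_skip(x, m, r, delta=1):
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = N - m * delta  # number of templates, shared by both lengths

    def count(length):
        span = (length - 1) * delta + 1
        tmpl = np.array([x[i : i + span : delta] for i in range(n)])
        c = 0
        for i in range(n):
            # Chebyshev distance from template i to every template
            d = np.abs(tmpl - tmpl[i]).max(axis=1)
            c += np.sum(d <= r) - 1  # exclude the self-match
        return c

    B_delta = count(m)
    A_delta = count(m + 1)
    return -np.log(A_delta / B_delta)
```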


Implementation

Sample entropy can be implemented easily in many different programming languages. Below is a vectorized example written in Python.

```python
import numpy as np


def sampen(L, m, r):
    """Sample entropy."""
    N = len(L)

    # Split time series and save all templates of length m
    xmi = np.array([L[i : i + m] for i in range(N - m)])
    xmj = np.array([L[i : i + m] for i in range(N - m + 1)])

    # Save all matches minus the self-match, compute B
    B = np.sum([np.sum(np.abs(xmii - xmj).max(axis=1) <= r) - 1 for xmii in xmi])

    # Similar for computing A
    m += 1
    xm = np.array([L[i : i + m] for i in range(N - m + 1)])

    A = np.sum([np.sum(np.abs(xmi - xm).max(axis=1) <= r) - 1 for xmi in xm])

    # Return SampEn
    return -np.log(A / B)
```

Examples written in other languages can be found for:
* Matlab
* R
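As a usage sketch, SampEn should be lower for a regular (periodic) signal than for white noise of comparable length and spread. The snippet below repeats the vectorized function from above and compares the two; the signals and parameter choices (m = 2, r = 0.2 × std) are illustrative.

```python
import numpy as np


def sampen(L, m, r):
    """Sample entropy (vectorized, as in the implementation above)."""
    N = len(L)
    xmi = np.array([L[i : i + m] for i in range(N - m)])
    xmj = np.array([L[i : i + m] for i in range(N - m + 1)])
    B = np.sum([np.sum(np.abs(xmii - xmj).max(axis=1) <= r) - 1 for xmii in xmi])
    m += 1
    xm = np.array([L[i : i + m] for i in range(N - m + 1)])
    A = np.sum([np.sum(np.abs(xmi - xm).max(axis=1) <= r) - 1 for xmi in xm])
    return -np.log(A / B)


# A periodic signal looks more regular (lower SampEn) than white noise.
rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 8 * np.pi, 300))
noise = rng.standard_normal(300)

s_reg = sampen(regular, 2, 0.2 * np.std(regular))
s_noise = sampen(noise, 2, 0.2 * np.std(noise))
print(s_reg, s_noise)
```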


See also

* Kolmogorov complexity
* Approximate entropy

