Self-similar processes are types of
stochastic processes
In probability theory and related fields, a stochastic () or random process is a mathematical object usually defined as a family of random variables. Stochastic processes are widely used as mathematical models of systems and phenomena that appe ...
that exhibit the phenomenon of
self-similarity. A self-similar phenomenon behaves the same when viewed at different degrees of magnification, or different scales on a dimension (space or time). Self-similar processes can sometimes be described using
heavy-tailed distributions, also known as
long-tailed distributions. Examples of such processes include traffic processes, such as packet inter-arrival times and burst lengths. Self-similar processes can exhibit
long-range dependency.
Overview
The design of robust and reliable networks and network services has become an increasingly challenging task in today's
Internet world. To achieve this goal,
understanding the characteristics of Internet traffic plays a more and more critical
role. Empirical studies of measured traffic traces have led to the wide recognition of
self-similarity in network traffic.
Self-similar
Ethernet traffic exhibits dependencies over a long range of time scales. This is to be contrasted with telephone traffic which is
Poisson in its arrival and departure process.
In traditional Poisson traffic, the short-term fluctuations would average out, and a graph covering a large amount of time would approach a constant value.
Heavy-tailed distributions have been observed in many natural phenomena including both physical and sociological phenomena.
Mandelbrot established the use of heavy-tailed distributions to model real-world
fractal
In mathematics, a fractal is a geometric shape containing detailed structure at arbitrarily small scales, usually having a fractal dimension strictly exceeding the topological dimension. Many fractals appear similar at various scales, as illu ...
phenomena, e.g. Stock markets, earthquakes, climate, and the weather.
Ethernet,
WWW,
SS7,
TCP
TCP may refer to:
Science and technology
* Transformer coupled plasma
* Tool Center Point, see Robot end effector
Computing
* Transmission Control Protocol, a fundamental Internet standard
* Telephony control protocol, a Bluetooth communication s ...
,
FTP,
TELNET and
VBR video (digitised video of the type that is transmitted over
ATM networks) traffic is self-similar.
Self-similarity in packetised data networks can be caused by the distribution of file sizes, human interactions and/ or Ethernet dynamics. Self-similar and long-range dependent characteristics in computer networks present a fundamentally different set of problems to people doing analysis and/or design of networks, and many of the previous assumptions upon which systems have been built are no longer valid in the presence of self-similarity.
The Poisson distribution
Before the heavy-tailed distribution is introduced mathematically, the
Poisson process with a
memoryless waiting-time distribution, used to model (among many things) traditional telephony networks, is briefly reviewed below.
Assuming pure-chance arrivals and pure-chance terminations leads to the following:
*The number of call arrivals in a given time has a Poisson distribution, i.e.:
::
where ''a'' is the number of call arrivals in time ''T'', and
is the mean number of call arrivals in time ''T''. For this reason, pure-chance traffic is also known as Poisson traffic.
*The number of call departures in a given time, also has a Poisson distribution, i.e.:
::
where ''d'' is the number of call departures in time ''T'' and
is the mean number of call departures in time ''T''.
*The intervals, ''T'', between call arrivals and departures are intervals between independent, identically distributed random events. It can be shown that these intervals have a negative exponential distribution, i.e.:
::
where ''h'' is the mean holding time (MHT).
The heavy-tail distribution
A distribution is said to have a heavy tail if
:
One simple example of a heavy-tailed distribution is the
Pareto distribution
The Pareto distribution, named after the Italian civil engineer, economist, and sociologist Vilfredo Pareto ( ), is a power-law probability distribution that is used in description of social, quality control, scientific, geophysical, actua ...
.
Modelling self-similar traffic
Since (unlike traditional telephony traffic) packetised traffic exhibits self-similar or fractal characteristics, conventional traffic models do not apply to networks which carry self-similar traffic.
With the convergence of voice and data, the future multi-service network will be based on packetised traffic, and models which accurately reflect the nature of self-similar traffic will be required to develop, design and dimension future multi-service networks.
Previous analytic work done in Internet studies adopted assumptions such as exponentially-distributed packet inter-arrivals, and conclusions reached under such assumptions may be misleading or incorrect in the presence of heavy-tailed distributions.
Deriving mathematical models which accurately represent long-range dependent traffic is a fertile area of research.
Self-similar stochastic processes modeled by Tweedie distributions
Leland ''et al'' have provided a mathematical formalism to describe self-similar stochastic processes.
For the sequence of numbers
:
with mean
:
,
deviations
:
,
variance
:
,
and autocorrelation function
:
with lag ''k'', if the
autocorrelation
Autocorrelation, sometimes known as serial correlation in the discrete time case, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations of a random variable ...
of this sequence has the long range behavior
:
as ''k'' and where ''L(k)'' is a slowly varying function at large values of ''k'', this sequence is called a self-similar process.
The method of expanding bins can be used to analyze self-similar processes. Consider a set of equal-sized non-overlapping bins that divides the original sequence of ''N'' elements into groups of ''m'' equal-sized segments (''N/m'' is integer) so that new reproductive sequences, based on the mean values, can be defined:
:
.
The variance determined from this sequence will scale as the bin size changes such that
:
if and only if the autocorrelation has the limiting form
[Tsybakov B & Georganas ND (1997) On self-similar traffic in ATM queues: definitions, overflow probability bound, and cell delay distribution. ''IEEE/ACM Trans. Netw.'' 5, 397–409]
:
.
One can also construct a set of corresponding additive sequences
:
,
based on the expanding bins,
:
.
Provided the autocorrelation function exhibits the same behavior, the additive sequences will obey the relationship
:
Since
and
are constants this relationship constitutes a variance-to-mean
power law
In statistics, a power law is a Function (mathematics), functional relationship between two quantities, where a Relative change and difference, relative change in one quantity results in a proportional relative change in the other quantity, inde ...
(
Taylor's law), with ''p''=2-''d''.
Tweedie distributions are a special case of
exponential dispersion models, a class of models used to describe error distributions for the
generalized linear model
In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a ''link function'' and b ...
.
These Tweedie distributions are characterized by an inherent
scale invariance and thus for any
random variable
A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the po ...
''Y'' that obeys a Tweedie distribution, the
variance var(''Y'') relates to the
mean E(''Y'') by the power law,
:
where ''a'' and ''p'' are positive constants. The exponent ''p'' for the variance to mean power law associated with certain self-similar stochastic processes ranges between 1 and 2 and thus may be modeled in part by a
Tweedie compound Poisson–gamma distribution.
The additive form of the Tweedie compound Poisson-gamma model has the
cumulant generating function (CGF),
: