Self-similar process

Self-similar processes are stochastic processes that exhibit self-similarity: they behave the same way when viewed at different degrees of magnification, or at different scales on a dimension (space or time). Self-similar processes can sometimes be described using heavy-tailed distributions, also known as long-tailed distributions. Examples include traffic processes, such as packet inter-arrival times and burst lengths. Self-similar processes can also exhibit long-range dependence.


Overview

Designing robust and reliable networks and network services has become increasingly challenging in today's Internet world, and understanding the characteristics of Internet traffic plays an increasingly critical role in meeting that challenge. Empirical studies of measured traffic traces have led to the wide recognition of self-similarity in network traffic. Self-similar Ethernet traffic exhibits dependencies over a long range of time scales. This contrasts with telephone traffic, whose arrival and departure processes are Poisson. In traditional Poisson traffic, short-term fluctuations average out, so a graph covering a long period of time approaches a constant value.

Heavy-tailed distributions have been observed in many natural phenomena, both physical and sociological. Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena such as stock markets, earthquakes, climate, and the weather. Ethernet, WWW, SS7, TCP, FTP, TELNET and VBR video (digitised video of the type that is transmitted over ATM networks) traffic is self-similar.

Self-similarity in packetised data networks can be caused by the distribution of file sizes, human interactions and/or Ethernet dynamics. Self-similar and long-range dependent characteristics in computer networks present a fundamentally different set of problems to those who analyse or design networks, and many of the assumptions on which earlier systems were built are no longer valid in the presence of self-similarity.


The Poisson distribution

Before the heavy-tailed distribution is introduced mathematically, the Poisson process with a memoryless waiting-time distribution, used to model (among many other things) traditional telephony networks, is briefly reviewed below. Assuming pure-chance arrivals and pure-chance terminations leads to the following:

* The number of call arrivals in a given time has a Poisson distribution, i.e.:
:: P(a) = \left( \frac{\mu^a}{a!} \right) e^{-\mu},
where ''a'' is the number of call arrivals in time ''T'', and \mu is the mean number of call arrivals in time ''T''. For this reason, pure-chance traffic is also known as Poisson traffic.
* The number of call departures in a given time also has a Poisson distribution, i.e.:
:: P(d) = \left( \frac{\lambda^d}{d!} \right) e^{-\lambda},
where ''d'' is the number of call departures in time ''T'' and \lambda is the mean number of call departures in time ''T''.
* The intervals, ''T'', between call arrivals and departures are intervals between independent, identically distributed random events. It can be shown that these intervals have a negative exponential distribution, i.e.:
:: P[T \ge t] = e^{-t/h},
where ''h'' is the mean holding time (MHT).
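
The three formulas above can be evaluated directly. The following is a minimal sketch in Python rather than a reference implementation: the function names poisson_pmf and interval_tail and the numeric values (a mean of 3 calls, a mean holding time of 2) are illustrative assumptions that do not come from the article. It checks that the Poisson probabilities sum to one and that the exponential interval distribution is memoryless.

```python
import math


def poisson_pmf(count: int, mean: float) -> float:
    """P(count) = (mean**count / count!) * exp(-mean)."""
    return mean ** count / math.factorial(count) * math.exp(-mean)


def interval_tail(t: float, mean_holding_time: float) -> float:
    """P[T >= t] = exp(-t / h) for the negative exponential distribution."""
    return math.exp(-t / mean_holding_time)


if __name__ == "__main__":
    mu = 3.0  # assumed mean number of call arrivals in time T
    h = 2.0   # assumed mean holding time

    # The probabilities over all possible counts should add up to (almost) 1.
    print("sum of P(a):", sum(poisson_pmf(a, mu) for a in range(60)))

    # Memorylessness: P[T >= s + t | T >= s] equals P[T >= t].
    s, t = 3.0, 2.0
    print("conditional tail:", interval_tail(s + t, h) / interval_tail(s, h))
    print("unconditional tail:", interval_tail(t, h))
```

The memoryless check is the property that heavy-tailed models give up: for an exponential interval, having already waited a time ''s'' does not change the distribution of the remaining wait.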


The heavy-tail distribution

A distribution is said to have a heavy tail if
: \lim_{x \to \infty} e^{\lambda x} \Pr[X > x] = \infty \quad \mbox{for all } \lambda > 0.
One simple example of a heavy-tailed distribution is the Pareto distribution.
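
The definition can be explored numerically. The sketch below is only an illustration under stated assumptions: it uses a Pareto tail Pr[X > x] = (x_min/x)^alpha with an arbitrarily chosen shape alpha = 1.5 and x_min = 1, and compares it with an exponential tail of rate 1. Both function names are hypothetical. The product e^{\lambda x} Pr[X > x] is handled on a logarithmic scale to avoid floating-point overflow.

```python
import math


def log_weighted_tail_pareto(x, lam, alpha=1.5, x_min=1.0):
    """log( e^(lam*x) * Pr[X > x] ) for a Pareto tail (x_min / x)**alpha, x >= x_min."""
    return lam * x + alpha * (math.log(x_min) - math.log(x))


def log_weighted_tail_exponential(x, lam, rate=1.0):
    """log( e^(lam*x) * Pr[X > x] ) for an exponential tail exp(-rate * x)."""
    return lam * x - rate * x


if __name__ == "__main__":
    lam = 0.01  # any lam > 0 must work for a heavy tail; a small value is the hard case
    for x in (1e1, 1e2, 1e3, 1e4, 1e5):
        print(x,
              log_weighted_tail_pareto(x, lam),
              log_weighted_tail_exponential(x, lam))
```

For the Pareto tail the logarithm eventually grows without bound for every positive \lambda, because the linear term outgrows the logarithmic one. For the exponential tail it decreases whenever \lambda is smaller than the rate, so the limit is not infinite for all \lambda > 0 and the distribution is not heavy-tailed.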


Modelling self-similar traffic

Since (unlike traditional telephony traffic) packetised traffic exhibits self-similar or fractal characteristics, conventional traffic models do not apply to networks that carry self-similar traffic. With the convergence of voice and data, the future multi-service network will be based on packetised traffic, and models that accurately reflect the nature of self-similar traffic will be required to develop, design and dimension such networks. Earlier analytic work in Internet studies adopted assumptions such as exponentially distributed packet inter-arrival times, and conclusions reached under such assumptions may be misleading or incorrect in the presence of heavy-tailed distributions. Deriving mathematical models that accurately represent long-range dependent traffic is a fertile area of research.


Self-similar stochastic processes modeled by Tweedie distributions

Leland ''et al.'' have provided a mathematical formalism to describe self-similar stochastic processes. For the sequence of numbers
: Y = (Y_i : i = 0, 1, 2, ..., N)
with mean
: \hat{\mu} = \text{E}(Y_i),
deviations
: y_i = Y_i - \hat{\mu},
variance
: \hat{\sigma}^2 = \text{E}(y_i^2),
and autocorrelation function
: r(k) = \text{E}(y_i y_{i+k}) / \text{E}(y_i^2)
with lag ''k'', if the autocorrelation of this sequence has the long-range behavior
: r(k) \sim k^{-d} L(k)
as ''k'' → ∞, where ''L''(''k'') is a slowly varying function at large values of ''k'', this sequence is called a self-similar process.

The method of expanding bins can be used to analyze self-similar processes. Consider a set of equal-sized non-overlapping bins that divides the original sequence of ''N'' elements into ''N''/''m'' bins of ''m'' consecutive elements each (with ''N''/''m'' an integer), so that new reproductive sequences, based on the mean values, can be defined:
: Y_i^{(m)} = (Y_{im-m+1} + ... + Y_{im}) / m.
The variance determined from this sequence will scale as the bin size changes such that
: \text{var}[Y^{(m)}] = \hat{\sigma}^2 m^{-d}
if and only if the autocorrelation has the limiting form (Tsybakov B & Georganas ND (1997) On self-similar traffic in ATM queues: definitions, overflow probability bound, and cell delay distribution. ''IEEE/ACM Trans. Netw.'' 5, 397–409)
: \lim_{k \to \infty} r(k) / k^{-d} = (2-d)(1-d)/2.
One can also construct a set of corresponding additive sequences
: Z_i^{(m)} = m Y_i^{(m)},
based on the expanding bins,
: Z_i^{(m)} = (Y_{im-m+1} + ... + Y_{im}).
Provided the autocorrelation function exhibits the same behavior, the additive sequences will obey the relationship
: \text{var}[Z_i^{(m)}] = m^2 \text{var}[Y^{(m)}] = (\hat{\sigma}^2 / \hat{\mu}^{2-d}) \text{E}[Z_i^{(m)}]^{2-d}.
Since \hat{\mu} and \hat{\sigma}^2 are constants, this relationship constitutes a variance-to-mean power law (Taylor's law), with ''p'' = 2 - ''d''.

Tweedie distributions are a special case of exponential dispersion models, a class of models used to describe error distributions for the generalized linear model. These Tweedie distributions are characterized by an inherent scale invariance, and thus for any random variable ''Y'' that obeys a Tweedie distribution, the variance var(''Y'') relates to the mean E(''Y'') by the power law
: \text{var}(Y) = a [\text{E}(Y)]^p,
where ''a'' and ''p'' are positive constants. The exponent ''p'' for the variance-to-mean power law associated with certain self-similar stochastic processes ranges between 1 and 2, and thus such processes may be modeled in part by a Tweedie compound Poisson–gamma distribution.

The additive form of the Tweedie compound Poisson–gamma model has the cumulant generating function (CGF)
: K^*_p(s; \theta, \lambda) = \lambda \kappa_p(\theta) \left[ (1 + s/\theta)^\alpha - 1 \right],
where
: \kappa_p(\theta) = \dfrac{\alpha - 1}{\alpha} \left( \dfrac{\theta}{\alpha - 1} \right)^\alpha
is the cumulant function, ''α'' is the Tweedie exponent
: \alpha = \dfrac{p - 2}{p - 1},
''s'' is the generating function variable, ''θ'' is the canonical parameter and ''λ'' is the index parameter. The first and second derivatives of the CGF, evaluated at ''s'' = 0, yield the mean and variance, respectively. One can thus confirm that for the additive models the variance relates to the mean by the power law
: \mathrm{var}(Z) \propto \mathrm{E}(Z)^p.

Whereas this Tweedie compound Poisson–gamma CGF characterizes the probability distribution of certain self-similar stochastic processes, it does not return information regarding the long-range correlations inherent to the sequence ''Y''. Nonetheless, the Tweedie distributions provide a means to understand the possible origins of self-similar stochastic processes because of their role as foci for a central limit-like convergence effect known as the Tweedie convergence theorem. In nontechnical terms, this theorem tells us that any exponential dispersion model that asymptotically manifests a variance-to-mean power law is required to have a variance function that comes within the domain of attraction of a Tweedie model. The Tweedie convergence theorem can be used to explain the origin of the variance-to-mean power law, ''1/f'' noise and multifractality, features associated with self-similar processes.
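
The method of expanding bins described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by the cited authors: the function names bin_sums, bin_means and estimate_d, the choice of bin sizes, and the use of an ordinary least-squares fit on a log-log scale are all assumptions made for the example. The test data are independent exponential samples, for which the variance of the bin means falls as 1/''m'', so the estimate of ''d'' should come out close to 1. A long-range dependent sequence would instead give a value between 0 and 1.

```python
import math
import random
import statistics


def bin_sums(y, m):
    """Additive sequence Z^(m): sums over consecutive non-overlapping bins of size m.

    A trailing partial bin is dropped, which matches the requirement that N/m be an integer.
    """
    return [sum(y[i * m:(i + 1) * m]) for i in range(len(y) // m)]


def bin_means(y, m):
    """Reproductive sequence Y^(m): the bin sums divided by the bin size m."""
    return [total / m for total in bin_sums(y, m)]


def estimate_d(y, bin_sizes):
    """Estimate d from var[Y^(m)] ~ sigma^2 * m**(-d) by a least-squares line fit
    of log(variance) against log(m); d is the negated slope."""
    xs = [math.log(m) for m in bin_sizes]
    ys = [math.log(statistics.pvariance(bin_means(y, m))) for m in bin_sizes]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (v - y_bar) for x, v in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return -slope


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that the run is repeatable
    samples = [rng.expovariate(1.0) for _ in range(2 ** 14)]
    d = estimate_d(samples, [1, 2, 4, 8, 16, 32])
    print("estimated d:", d, "implied Taylor exponent p = 2 - d:", 2 - d)
```

The largest bin size is kept well below the sequence length so that each variance is computed from several hundred bins. With too few bins the variance estimates become noisy and the fitted slope unreliable.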


Network performance

Network performance degrades gradually with increasing self-similarity. The more self-similar the traffic, the longer the queues: the queue length distribution of self-similar traffic decays more slowly than with Poisson sources. However, long-range dependence implies nothing about the short-term correlations that affect performance in small buffers. Additionally, aggregating streams of self-similar traffic typically intensifies the self-similarity ("burstiness") rather than smoothing it, compounding the problem.

Self-similar traffic exhibits persistent clustering, which has a negative impact on network performance.
* With Poisson traffic (found in conventional telephony networks), clustering occurs in the short term but smooths out over the long term.
* With self-similar traffic, the bursty behaviour may itself be bursty, which exacerbates the clustering phenomenon and degrades network performance.

Many aspects of network quality of service depend on coping with traffic peaks that might cause network failures, such as
* Cell/packet loss and queue overflow
* Violation of delay bounds, e.g. in video
* Worst cases in statistical multiplexing

Poisson processes are well-behaved because they are stateless, and peak loading is not sustained, so queues do not fill. With long-range order, peaks last longer and have greater impact: the equilibrium shifts for a while.
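
The effect of sustained peaks on a queue can be illustrated with a small simulation. The sketch below is a simplified stand-in rather than a model of self-similar traffic itself: it assumes a single-server first-come, first-served queue with Poisson arrivals and compares exponentially distributed service demands with Pareto-distributed (heavy-tailed) demands of the same mean. The function names, the load of 0.7, the Pareto shape of 1.5 and the seed are all arbitrary assumptions. Waiting times follow the Lindley recursion W_{n+1} = max(0, W_n + S_n - A_n), where S_n is the service demand of customer ''n'' and A_n is the gap until the next arrival.

```python
import random


def waiting_times(gaps, services):
    """Lindley recursion: waiting time of each customer in a single-server FIFO queue.

    gaps[i] is the time between the arrivals of customers i and i+1,
    services[i] is the service demand of customer i. The first customer never waits.
    """
    waits = [0.0]
    for gap, service in zip(gaps, services):
        waits.append(max(0.0, waits[-1] + service - gap))
    return waits


def simulate(n, mean_service, heavy_tailed, alpha=1.5, seed=1):
    """Poisson arrivals at rate 1 with exponential or Pareto service demands of equal mean."""
    rng = random.Random(seed)
    gaps = [rng.expovariate(1.0) for _ in range(n)]
    if heavy_tailed:
        # random.paretovariate(alpha) has minimum 1 and mean alpha / (alpha - 1) for alpha > 1.
        scale = mean_service * (alpha - 1.0) / alpha
        services = [scale * rng.paretovariate(alpha) for _ in range(n)]
    else:
        services = [rng.expovariate(1.0 / mean_service) for _ in range(n)]
    return waiting_times(gaps, services)


if __name__ == "__main__":
    for heavy in (False, True):
        waits = simulate(100_000, mean_service=0.7, heavy_tailed=heavy)
        label = "Pareto" if heavy else "exponential"
        print(label, "mean wait:", sum(waits) / len(waits), "max wait:", max(waits))
```

Both runs offer the same average load, so any difference in the printed waiting times comes from the shape of the distribution rather than from the amount of work. Results from a single seed are only indicative, since heavy-tailed averages converge slowly.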


See also

* Long-tail traffic


External links


* A site offering numerous links to articles written on the effect of self-similar traffic on network performance.