In probability theory, an event is said to happen almost surely (sometimes abbreviated as a.s.) if it happens with probability 1 (or Lebesgue measure 1). In other words, the set of possible exceptions may be non-empty, but it has probability 0. The concept is analogous to the concept of "almost everywhere" in measure theory.

In probability experiments on a finite sample space in which every outcome has non-zero probability, there is no difference between ''almost surely'' and ''surely'' (since having a probability of 1 then entails including all the sample points). However, this distinction becomes important when the sample space is an infinite set, because an infinite set can have non-empty subsets of probability 0.

Some examples of the use of this concept include the strong and uniform versions of the law of large numbers, and the continuity of the paths of Brownian motion.

The terms almost certainly (a.c.) and almost always (a.a.) are also used. Almost never describes the opposite of ''almost surely'': an event that happens with probability zero happens ''almost never''.

Formal definition

Let

(\Omega,\mathcal{F},P)

be a probability space. An event

E \in \mathcal{F}

happens ''almost surely'' if

P(E)=1

. Equivalently,

E

happens almost surely if the probability of

E

not occurring is zero:

P(E^C) = 0

. More generally, any event

E \subseteq \Omega

(not necessarily in

\mathcal{F}

) happens almost surely if

E^C

is contained in a

null set: a subset

N

in

\mathcal{F}

such that

P(N)=0

. The notion of almost sureness depends on the probability measure

P

. If it is necessary to emphasize this dependence, it is customary to say that the event

E

occurs ''P''-almost surely, or almost surely

(P)

.
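The difference between "almost surely" and "surely" in the definition above can be sketched on a small discrete space. This is only an illustration: the three-outcome space, the outcome names, and the helper names `prob`, `happens_almost_surely`, and `happens_surely` are assumptions made for the example, and the space is chosen so that one possible outcome has probability 0.

```python
from fractions import Fraction

# Hypothetical three-outcome space; outcome "c" is possible but has probability 0.
P = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
OMEGA = set(P)

def prob(event):
    """P(E) for an event E given as a set of outcomes."""
    return sum((P[w] for w in event), Fraction(0))

def happens_almost_surely(event):
    # The two equivalent formulations: P(E) = 1 and P(E^C) = 0.
    return prob(event) == 1 and prob(OMEGA - event) == 0

def happens_surely(event):
    # "Surely" means that no outcome at all is excluded.
    return event == OMEGA

E = {"a", "b"}
print(happens_almost_surely(E), happens_surely(E))  # prints: True False
```

Here the complement of `E` is the non-empty set containing only `"c"`, which is a null set under this `P`, so `E` happens almost surely without happening surely.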

Illustrative examples

In general, an event can happen "almost surely" even if the probability space in question includes outcomes that do not belong to the event, as the following examples illustrate.

Throwing a dart

Imagine throwing a dart at a unit square (a square with an area of 1) so that the dart always hits an exact point in the square, in such a way that each point in the square is equally likely to be hit. Since the square has area 1, the probability that the dart will hit any particular subregion of the square is equal to the area of that subregion. For example, the probability that the dart will hit the right half of the square is 0.5, since the right half has area 0.5.

Next, consider the event that the dart lands exactly on one of the diagonals of the unit square. Since the diagonals have area 0, the probability of this event is 0. That is, the dart will ''almost never'' land on a diagonal (equivalently, it will ''almost surely'' not land on a diagonal), even though the set of points on the diagonals is not empty, and a point on a diagonal is no less possible than any other point.
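One way to make "the diagonals have area 0" concrete is to thicken them into thin bands and watch the hit probability shrink with the band width. The following is a rough Monte Carlo sketch under that assumption; the function names, the band half-width `eps`, the sample count, and the seed are all choices made for this illustration. Floating-point sampling only approximates a uniform point in the square.

```python
import random

def diagonal_band_fraction(eps, n_samples=100_000, seed=0):
    """Estimate the probability that a uniform point in the unit square
    lies within vertical distance eps of either diagonal."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        # Main diagonal: y = x; anti-diagonal: y = 1 - x.
        if abs(y - x) < eps or abs(y - (1 - x)) < eps:
            hits += 1
    return hits / n_samples

def band_area_upper_bound(eps):
    # Each band {|y - x| < eps} has area 1 - (1 - eps)^2 = 2*eps - eps^2
    # (for 0 <= eps <= 1); the union of the two bands is at most twice that.
    return 2 * (2 * eps - eps ** 2)

for eps in (0.1, 0.01, 0.001):
    print(eps, diagonal_band_fraction(eps), band_area_upper_bound(eps))
```

As `eps` shrinks, both the estimate and the upper bound tend to 0, which is consistent with the diagonals themselves having probability 0.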

Tossing a coin repeatedly

Consider the case where a (possibly biased) coin is tossed, corresponding to the probability space

(\{H,T\}, 2^{\{H,T\}}, P)

, where the event

\{H\}

occurs if a head is flipped, and

\{T\}

if a tail is flipped. For this particular coin, it is assumed that the probability of flipping a head is

P(H) = p\in (0,1)

, from which it follows that the complement event, that of flipping a tail, has probability

P(T) = 1 - p

. Now, suppose an experiment were conducted where the coin is tossed repeatedly, with outcomes

\omega_1,\omega_2,\ldots

and the assumption that each flip's outcome is independent of all the others (i.e., they are independent and identically distributed; ''i.i.d.''). Define the sequence of random variables on the coin toss space,

(X_i)_{i\in\mathbb{N}}

where

X_i(\omega)=\omega_i

, i.e., each

X_i

records the outcome of the

i

th flip. In this case, any infinite sequence of heads and tails is a possible outcome of the experiment. However, any particular infinite sequence of heads and tails has probability 0 of being the exact outcome of the (infinite) experiment. This is because the ''i.i.d.'' assumption implies that the probability of flipping all heads over

n

flips is simply

P(X_i = H, \ i=1,2,\dots,n)=\left(P(X_1 = H)\right)^n = p^n

. Letting

n\rightarrow\infty

yields 0, since

p\in (0,1)

by assumption. The result is the same no matter how much we bias the coin towards heads, so long as we constrain

p

to be strictly between 0 and 1. In fact, the same result even holds in non-standard analysis, where infinitesimal probabilities are allowed. Moreover, the event "the sequence of tosses contains at least one

T

" will also happen almost surely (i.e., with probability 1). But if instead of an infinite number of flips, flipping stops after some finite time, say 1,000,000 flips, then the probability of getting an all-heads sequence,

p^{1,000,000}

, would no longer be 0, while the probability of getting at least one tails,

1 - p^{1,000,000}

, would no longer be 1 (i.e., the event is no longer almost sure).
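The contrast between finitely many and infinitely many flips can be checked numerically. The sketch below simply evaluates the all-heads probability for increasing numbers of flips; the bias value `p = 0.999999` and the function names are assumptions chosen for illustration.

```python
import math

def all_heads_probability(p, n):
    """Probability p**n that n independent flips all come up heads."""
    return p ** n

def log_all_heads_probability(p, n):
    """Natural log of p**n, which stays finite even when p**n underflows to 0.0."""
    return n * math.log(p)

p = 0.999999  # hypothetical, heavily biased coin
for n in (10, 1_000, 1_000_000, 100_000_000):
    q = all_heads_probability(p, n)
    # q is the chance of all heads; 1 - q is the chance of at least one tail.
    print(n, q, 1 - q)
```

With this bias, the all-heads probability after 1,000,000 flips is still roughly 0.37, so "at least one tail" is far from almost sure at that point; only in the limit does the all-heads probability reach 0. For less extreme biases the floating-point value of `p ** n` can underflow to `0.0` at finite `n`, which is a limitation of the arithmetic and not of the argument, and is why the logarithmic form is included.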

Asymptotically almost surely

In asymptotic analysis

, a property is said to hold ''asymptotically almost surely'' (a.a.s.) if over a sequence of sets, the probability converges to 1. For instance, in number theory, a large number is asymptotically almost surely composite, by the

prime number theorem

; and in

random graph theory

, the statement "

G(n,p_n)

is connected" (where

G(n,p)

denotes the graphs on

n

vertices with edge probability

p

) is true a.a.s. when, for some

\varepsilon > 0

,

p_n > \frac{(1+\varepsilon)\ln n}{n}

.
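The connectivity statement for random graphs can be explored with a small simulation. This is a sketch under stated assumptions, not a proof: the graph size, trial count, seed, and function names are chosen for illustration, and the negative value of `eps` is included only for contrast and is not part of the statement above.

```python
import math
import random

def is_connected_gnp(n, p, rng):
    """Sample one G(n, p) graph and report whether it is connected."""
    parent = list(range(n))  # union-find forest over the n vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components = n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # include edge {u, v} with probability p
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    components -= 1
    return components == 1

def connected_fraction(n, eps, trials=20, seed=0):
    """Fraction of sampled graphs that are connected when
    p_n = (1 + eps) * ln(n) / n, capped at 1."""
    rng = random.Random(seed)
    p = min(1.0, (1 + eps) * math.log(n) / n)
    return sum(is_connected_gnp(n, p, rng) for _ in range(trials)) / trials

print(connected_fraction(200, 1.0))   # edge probability above the threshold
print(connected_fraction(200, -0.5))  # edge probability below it, for contrast
```

An a.a.s. statement concerns the limit as the number of vertices grows, so a simulation at a single fixed size can only suggest the trend.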

In number theory, this is referred to as "

almost all

", as in "almost all numbers are composite". Similarly, in graph theory, this is sometimes referred to as "almost surely".

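The number-theoretic claim that almost all numbers are composite can be illustrated by counting primes. The sketch below uses a sieve of Eratosthenes to compare the fraction of primes up to n with 1/ln n, the density suggested by the prime number theorem; the function name `prime_counts` and the cutoff of one million are assumptions made for this example.

```python
import math

def prime_counts(limit):
    """Sieve of Eratosthenes (assumes limit >= 2).
    Returns a list c with c[k] = number of primes <= k."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            # Cross out the multiples of i, starting at i*i.
            is_prime[i * i :: i] = bytes(len(range(i * i, limit + 1, i)))
    counts, running = [], 0
    for flag in is_prime:
        running += flag
        counts.append(running)
    return counts

counts = prime_counts(10**6)
for k in range(2, 7):
    n = 10**k
    # Fraction of primes among 1..n, next to the 1/ln(n) approximation.
    print(n, counts[n] / n, 1 / math.log(n))
```

The printed fraction of primes falls as n grows, so the fraction of composites among the integers up to n moves toward 1.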