A rank–size distribution is the distribution of size by rank, in decreasing order of size. For example, if a data set consists of items of sizes 5, 100, 5, and 8, the rank–size distribution is 100, 8, 5, 5 (ranks 1 through 4). It is also known as the rank–frequency distribution when the source data come from a frequency distribution. Such distributions are of particular interest when the data vary significantly in scale, such as city sizes or word frequencies. They frequently follow a power law distribution, or less well-known ones such as a stretched exponential function or parabolic fractal distribution, at least approximately for certain ranges of ranks; see below.

A rank–size distribution is not a probability distribution or cumulative distribution function. Rather, it is a discrete form of a quantile function (inverse cumulative distribution) in reverse order, giving the size of the element at a given rank.
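As a minimal sketch of the definition above, the ranking can be computed by sorting sizes in decreasing order and numbering them from 1. The function name `rank_size` is made up for this illustration; tied sizes simply receive consecutive ranks, which is an assumption rather than something the text specifies.

```python
def rank_size(sizes):
    """Return (rank, size) pairs, largest size first; ties get consecutive ranks."""
    ordered = sorted(sizes, reverse=True)
    return [(rank, size) for rank, size in enumerate(ordered, start=1)]


# The example data set from the text: sizes 5, 100, 5, and 8.
print(rank_size([5, 100, 5, 8]))
# [(1, 100), (2, 8), (3, 5), (4, 5)]
```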


Simple rank–size distributions

In the case of city populations, the resulting distribution in a country, a region, or the world is characterized by its largest city, with other cities decreasing in size relative to it, initially at a rapid rate and then more slowly. This results in a few large cities and a much larger number of cities that are orders of magnitude smaller. For example, a rank 3 city would have one-third the population of a country's largest city, a rank 4 city would have one-fourth the population of the largest city, and so on.

When any log-linear factor is ranked, the ranks follow the Lucas numbers, which consist of the sequentially additive numbers 1, 3, 4, 7, 11, 18, 29, 47, 76, 123, 199, etc. As in the more famous Fibonacci sequence, each number is approximately 1.618 (the golden ratio) times the preceding number. For example, the third term in the sequence above, 4, is approximately 1.618³, or 4.236; the fourth term, 7, is approximately 1.618⁴, or 6.854; the eighth term, 47, is approximately 1.618⁸, or 46.979. With higher values, the figures converge. An equiangular spiral is sometimes used to visualize such sequences.
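The arithmetic in the preceding paragraph can be checked with a short sketch that lists the Lucas numbers next to the corresponding powers of the golden ratio. The names `PHI` and `lucas` are chosen for this illustration, and the indexing (first term 1, second term 3) follows the sequence as quoted in the text.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, approximately 1.618


def lucas(count):
    """Return the first `count` Lucas numbers as listed in the text: 1, 3, 4, 7, ..."""
    terms = [1, 3]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])  # each term is the sum of the two before it
    return terms[:count]


# Print each term beside PHI raised to the term's position in the sequence.
for n, term in enumerate(lucas(11), start=1):
    print(n, term, round(PHI ** n, 3))
```

The third, fourth, and eighth rows correspond to the values 4.236, 6.854, and 46.979 quoted above, and the gap between each term and the matching power shrinks as the position grows.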


Segmentation

A rank–size (or rank–frequency) distribution is often segmented into ranges. This is frequently done somewhat arbitrarily or due to external factors, particularly for market segmentation, but can also be due to distinct behavior as rank varies.

Most simply and commonly, a distribution may be split into two pieces, termed the head and tail. If a distribution is broken into three pieces, the third (middle) piece goes by several names: generically middle (Rand Fishkin, "Illustrating the Long Tail", November 24th, 2009), also belly (Robert Young, "Digg that Fat Belly!", Sep. 4, 2006), torso, and body ("The Small Head, the Medium Body, and the Long Tail .. so, where's Microsoft?", Lawrence Liu's Report from the Inside, 12 Mar 2005). These frequently have adjectives added, most significantly ''long tail'', also ''fat belly'', ''chunky middle'', etc. In more traditional terms, these may be called ''top-tier'', ''mid-tier'', and ''bottom-tier''.

The relative sizes and weights of these segments (how many ranks are in each segment, and what proportion of the total population is in a given segment) qualitatively characterize a distribution, analogously to the skewness or kurtosis of a probability distribution. Namely: is it dominated by a few top members (head-heavy, like profits in the recorded music industry), or is it dominated by many small members (tail-heavy, like internet search queries), or distributed in some other way? Practically, this determines strategy: where should attention be focused?

These distinctions may be made for various reasons. For example, they may arise from differing properties of the population, as in the 90–9–1 principle, which posits that in an internet community, 90% of the participants only view content, 9% edit content, and 1% actively create new content. As another example, in marketing, one may pragmatically consider the head as all members that receive personalized attention, such as personal phone calls, while the tail is everything else, which does not receive personalized attention, for example receiving
form letters; and the line is simply set at a point that resources allow, or where it makes business sense to stop.

Purely quantitatively, a conventional way of splitting a distribution into head and tail is to consider the head to be the first ''p'' portion of ranks, which account for 1 - ''p'' of the overall population, as in the 80:20 Pareto principle, where the top 20% (head) comprises 80% of the overall population. The exact cutoff depends on the distribution (each distribution has a single such cutoff point), and for power laws it can be computed from the Pareto index.

Segments may arise naturally due to actual changes in the behavior of the distribution as rank varies. Most common is the
king effect, where the behavior of the top handful of items does not fit the pattern of the rest, as illustrated at the top for country populations, and above for the most common words in English Wikipedia. For higher ranks, behavior may change at some point and be well modeled by different relations in different regions; on the whole, by a piecewise function. For example, if two different power laws fit better in different regions, one can use a broken power law for the overall relation; the word frequency in English Wikipedia (above) also demonstrates this.

The Yule–Simon distribution that results from preferential attachment (intuitively, "the rich get richer" and "success breeds success") simulates a broken power law and has been shown to "very well capture" word frequency versus rank distributions. It originated from trying to explain the population versus rank in different species. It has also been shown to fit city population versus rank better.
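The quantitative head/tail split described in this section (the first ''p'' portion of ranks accounting for 1 - ''p'' of the total) can be sketched for a finite data set as follows. This is only an illustration under stated assumptions: the function name `head_tail_cutoff` is hypothetical, and because ranks are discrete it returns the smallest head whose share reaches at least 1 - ''p'' rather than an exact crossing point.

```python
def head_tail_cutoff(sizes):
    """Return (p, share) for the smallest head fraction p of ranks whose
    combined size is at least 1 - p of the total."""
    ordered = sorted(sizes, reverse=True)
    total = sum(ordered)
    n = len(ordered)
    if n == 0 or total <= 0:
        raise ValueError("sizes must be non-empty with a positive total")
    running = 0
    for k, size in enumerate(ordered, start=1):
        running += size
        # Same test as running / total >= 1 - k / n, written without division.
        if running * n >= (n - k) * total:
            return k / n, running / total


# Hypothetical data: one dominant member and four small ones.
print(head_tail_cutoff([80, 5, 5, 5, 5]))
# (0.2, 0.8)
```

With equal sizes the cutoff falls at one half, which matches the intuition that such a distribution has no distinguishable head.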


Rank–size rule

The rank–size rule (or law) describes the remarkable regularity in many phenomena, including the distribution of city sizes, the sizes of businesses, the sizes of particles (such as sand), the lengths of rivers, the frequencies of word usage, and wealth among individuals. All are real-world observations that follow power laws, such as Zipf's law, the Yule distribution, or the Pareto distribution. If one ranks the population size of cities in a given country or in the entire world and calculates the natural logarithm of the rank and of the city population, the resulting graph will show a log-linear pattern: the logarithm of population falls roughly on a straight line against the logarithm of rank. This is the rank–size distribution.
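The log-log relationship described above can be sketched as an ordinary least-squares fit of ln(size) against ln(rank). The function name `fit_rank_size` and the synthetic city system are assumptions made for this example; real city data would only approximate a straight line, and the sketch requires at least two strictly positive sizes.

```python
import math


def fit_rank_size(sizes):
    """Least-squares fit of ln(size) = intercept + slope * ln(rank).

    Requires at least two sizes, all strictly positive.
    """
    ordered = sorted(sizes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(size) for size in ordered]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical city system obeying the rule exactly: size = 1,000,000 / rank.
cities = [1_000_000 / rank for rank in range(1, 51)]
slope, intercept = fit_rank_size(cities)
print(slope, math.exp(intercept))
```

For this idealized input the slope comes out very close to -1 and the exponentiated intercept very close to the size of the largest city; a fitted slope far from -1 on real data would indicate a different power-law exponent or a poor fit.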


Theoretical rationale

One study claims that the rank–size rule "works" because it is a "shadow" or coincidental measure of the true phenomenon. The true value of the rank–size rule is thus not as an accurate mathematical measure (since other power-law formulas are more accurate, especially at ranks lower than 10) but rather as a handy measure or "rule of thumb" to spot power laws. When presented with a ranking of data, is the third-ranked variable approximately one-third the value of the highest-ranked one? Or, conversely, is the highest-ranked variable approximately ten times the value of the tenth-ranked one? If so, the rank–size rule has possibly helped spot another power-law relationship.
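The rule-of-thumb check described above can be sketched as a comparison of each ranked value, multiplied by its rank, against the top value. The function name `rank_size_ratios` and the sample figures are hypothetical; a ratio near 1 at ranks 3 and 10 is only a hint of a power law, not a test of one.

```python
def rank_size_ratios(sizes, ranks=(3, 10)):
    """For each requested rank r, return (size at rank r) * r / (largest size).

    Values near 1 are consistent with the rank-size rule of thumb.
    Ranks beyond the length of the data are skipped.
    """
    ordered = sorted(sizes, reverse=True)
    top = ordered[0]
    return {r: ordered[r - 1] * r / top for r in ranks if r <= len(ordered)}


# Hypothetical ranking in which the rank-3 value is one-third of the top value
# and the rank-10 value is one-tenth of it.
data = [1200, 600, 400, 300, 240, 200, 171, 150, 133, 120]
print(rank_size_ratios(data))
# {3: 1.0, 10: 1.0}
```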


Known exceptions to simple rank–size distributions

While Zipf's law works well in many cases, it tends not to fit the largest cities in many countries; one type of deviation is known as the king effect. A 2002 study found that Zipf's law was rejected in 53 of 73 countries, far more than would be expected based on random chance. The study also found that variations of the Pareto exponent are better explained by political variables than by economic geography variables like proxies for economies of scale or transportation costs. A 2004 study showed that Zipf's law did not work well for the five largest cities in six countries (Cuberes, David, The Rise and Decline of Cities, University of Chicago, September 29, 2004).

In the richer countries, the distribution was flatter than predicted. For instance, in the United States, although the largest city, New York City, has more than twice the population of second-place Los Angeles, the two cities' metropolitan areas (also the two largest in the country) are much closer in population: the New York metropolitan area is only 1.3 times as large as that of Los Angeles. In other countries, the largest city dominates much more than expected. For instance, in the Democratic Republic of the Congo, the capital, Kinshasa, is more than eight times larger than the second-largest city, Lubumbashi.

When considering the entire distribution of cities, including the smallest ones, the rank–size rule does not hold. Instead, the distribution is
log-normal. This follows from Gibrat's law of proportionate growth.

Because exceptions are so easy to find, the function of the rule for analyzing cities today is to compare the city systems in different countries. The rank–size rule is a common standard by which urban primacy is established. A distribution such as that in the United States or China does not exhibit a pattern of primacy, but countries with a dominant "primate city" clearly vary from the rank–size rule in the opposite manner. Therefore, the rule helps to classify national (or regional) city systems according to the degree of dominance exhibited by the largest city. Countries with a primate city, for example, have typically had a colonial history that accounts for that city pattern.

If a normal city distribution pattern is expected to follow the rank–size rule (i.e. if the rank–size principle correlates with central place theory), then it suggests that those countries or regions with distributions that do not follow the rule have experienced some conditions that have altered the normal distribution pattern. For example, the presence of multiple regions within large nations such as China and the United States tends to favor a pattern in which more large cities appear than would be predicted by the rule. By contrast, small countries that had been connected (e.g. colonially or economically) to much larger areas will exhibit a distribution in which the largest city is much larger than would fit the rule, compared with the other cities; the excessive size of the city theoretically stems from its connection with a larger system rather than from the natural hierarchy that central place theory would predict within that one country or region alone.

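One simple way to compare city systems along the lines described in this section is the ratio of the largest city to the second largest, which the rank–size rule would put at about 2. The sketch below is illustrative only: `primacy_ratio` is a hypothetical helper, and the population figures are invented to mimic the ratios quoted in the text (about eight for a primate-city system, about 1.3 for a flatter one), not real census data.

```python
def primacy_ratio(sizes):
    """Ratio of the largest to the second-largest size.

    The rank-size rule predicts about 2; much larger values suggest a
    primate city, smaller values a flatter system.
    """
    ordered = sorted(sizes, reverse=True)
    if len(ordered) < 2 or ordered[1] <= 0:
        raise ValueError("need at least two positive sizes")
    return ordered[0] / ordered[1]


# Hypothetical populations in millions, chosen only to reproduce the quoted ratios.
primate_system = [8.0, 1.0, 0.8, 0.5]  # largest city eight times the second
flat_system = [13.0, 10.0, 6.0, 5.0]   # largest city 1.3 times the second

print(primacy_ratio(primate_system))  # 8.0
print(primacy_ratio(flat_system))     # 1.3
```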

See also

* Pareto principle
* Long tail


Further reading

* Douglas R. White, Laurent Tambayong, and Nataša Kejžar. 2008. Oscillatory dynamics of city-size distributions in world-historical systems. ''Globalization as an Evolutionary Process: Modeling Global Change''. Ed. by George Modelski, Tessaleno Devezas, and William R. Thompson. London: Routledge.
* The Use of Agent-Based Models in Regional Science, an agent-based simulation study that explains rank–size distribution.

