A single theory can explain how power-law distributions emerge in areas as wildly different as economics, cultural geography, ecology, linguistics, sociology, and biological chemistry.
Sometimes as a physicist you find yourself in the wilderness, completely outside the traditional, well-mapped areas of physics. Despite the unfamiliar territory, you might still find your physics backpack handy—as I did when I looked into the question of fairness in physical distributions.
Most of you already know, or would guess, that the heights of Swedish men at a given age follow a normal distribution, the bell curve, as do a lot of things in everyday life (figure 1). A characteristic feature of the height distribution is that the average is much larger than the width. In daily life we tend to think that the smaller the ratio between the width and the average, the more just the distribution is. Of course, that tendency is a bit ridiculous. In itself, the normal distribution has nothing to do with human envy. Nevertheless, the expectation that narrow distributions are fairer has, for more than a century, confounded efforts to understand broad distributions.
Swedish men
When I wanted to understand why the height of Swedish men is normally distributed, I thought in the following way. Suppose that the group consists of N men and that their total height is H. The average height is then H/N. I divide H into M very small pieces and then randomly ration out those pieces to the men. That assignment corresponds to the assumption that each man has an equal chance to grow by a small piece every day. The number of small pieces, k, that a person has received when the rationing is over corresponds to his height.
Mathematically, the outcome is the same as randomly placing M balls in N boxes. The resulting distribution is the probability that a person has a certain height k. The entropy of this probability distribution is given by
S = −⟨lnP(k)⟩,
where P(k) is the height distribution and ⟨ ⟩ denotes the average over the distribution.
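To make the balls-in-boxes picture concrete, here is a minimal simulation sketch in Python; the values of N and M are hypothetical rather than taken from the data in figure 1. It rations M pieces among N men at random, builds the empirical distribution P(k), and evaluates the entropy S = −⟨ln P(k)⟩.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1_000        # number of men (hypothetical value)
M = 100 * N      # number of small height pieces to ration out (hypothetical)

# Rationing M pieces uniformly at random is the same as dropping M balls into N boxes.
owners = rng.integers(0, N, size=M)
k = np.bincount(owners, minlength=N)     # pieces received by each man

# Empirical distribution P(k) and its entropy S = -<ln P(k)>.
_, counts = np.unique(k, return_counts=True)
P = counts / N
S = -np.sum(P * np.log(P))

print(f"average k = {k.mean():.1f}, spread = {k.std():.1f}, entropy S = {S:.2f}")
```

With M/N ≫ 1, the histogram of k comes out narrow and bell shaped, in line with the maximum-entropy argument that follows.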
The shape of the distribution can be predicted by invoking the condition of maximum entropy and by using variational calculus, a piece of mathematics that Joseph-Louis Lagrange and others developed about 250 years ago. Lagrange showed how to find the maxima (or minima) of functionals while incorporating a priori knowledge in the form of constraints.
Let us assume that in the above example you know the average height ⟨k⟩ = H/N and the spread of the data around the average ⟨(k − H/N)²⟩, and you know that the resulting distribution should be a probability distribution. By maximizing the entropy and incorporating your a priori knowledge, you obtain the most probable distribution. If you also know that the spread is small (that is, ⟨(k − H/N)²⟩ ≪ ⟨k⟩²), then the result is a normal distribution in the form of a bell curve.
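For readers who want the variational step spelled out, here is one way to write it; the Lagrange multipliers λ₀, λ₁, and λ₂ enforce, respectively, normalization, the known average, and the known spread.

```latex
% Maximize S subject to normalization, a known average, and a known spread:
\mathcal{L} = -\sum_k P(k)\ln P(k)
              - \lambda_0 \Bigl(\sum_k P(k) - 1\Bigr)
              - \lambda_1 \Bigl(\sum_k k\,P(k) - \langle k \rangle\Bigr)
              - \lambda_2 \Bigl(\sum_k \bigl(k - \langle k \rangle\bigr)^2 P(k) - \sigma^2\Bigr).

% Setting \partial \mathcal{L}/\partial P(k) = 0 gives
P(k) \propto \exp\bigl[-\lambda_1 k - \lambda_2 (k - \langle k \rangle)^2\bigr],

% and fixing the multipliers by the constraints leaves the familiar bell curve
P(k) = \frac{1}{\sqrt{2\pi}\,\sigma}
       \exp\!\left[-\frac{(k - \langle k \rangle)^2}{2\sigma^2}\right].
```

The small-spread condition guarantees that the boundary at k = 0 plays no role, so the Gaussian normalization applies.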
So if I know the average and the spread, then I can use the physicist's maximum entropy tool to predict the shape of the distribution. And, as figure 1 shows, that is a very good prediction—much better than the weather forecast.
Personal wealth
At the end of the 19th century, economist Vilfredo Pareto discovered that wealth among a population is not normally distributed. Rather, it seemed to have the form of a power law P(k) ∼ k^−γ.
Figure 2a shows the distribution of wealth in 2011 among the richest people in the US in a log–log plot. The distribution is quite skewed and somewhat resembles a power law. It also illustrates that the richest man in the US, Bill Gates (rightmost data point), is about 30 times richer than the average of the next 400 fabulously rich Americans. It’s natural to view such a skewed distribution of wealth as both unethical and unjust.
The difference between the normal distribution of height and the power-law distribution of wealth is precisely the skewness: If Bill Gates's wealth were a height, he would measure more than 300 miles from head to toe. How can the occurrence of that extraordinarily broad distribution be explained?
In the case of wealth, one can seek an explanation in terms of human economic actions and greed. That approach seems fine at first. But the same broad distribution was discovered early in the 20th century to represent town populations in a country (figure 2b). In the case of towns, population growth, commerce, and migration are among the factors likely to shape the distribution. Power-law distributions appear in other diverse settings and, as is the case with wealth and town size, have seemingly disparate causes.
The distribution of species among genera (figure 2c) should have something to do with biological evolution. The distribution of word frequencies in a novel (figure 2d) should have something to do with how a book is written or with how human language arose and developed. The distribution of surnames in a country (figure 2e) should have something to do with that country’s society and history. And the distribution of substances in the reaction network formed by metabolism in a human cell (figure 2f) should have something to do with chemistry and with the emergence and evolution of life.
Attempts to come up with explanations for those and other distributions have suggestive names like “The rich get richer,” “The Matthew effect” (for whosoever hath, to him shall be given), “The principle of least effort,” “Zipf's law,” and “Proportional growth.” All of the attempts are, more or less, system-specific: The underlying assumption is that the skewed distribution in each case reflects some specific property of the system in question. If you applied a similar approach to the height distribution of Swedish men, you would look for explanations in terms of the specific genes and the specific way of life of Swedish male individuals.
Herbert Simon, who won the 1978 Nobel Prize in Economics, argued in the 1950s that if so many entirely different systems share the same skewed distribution, then the explanation ought to be more general and less system-specific. His own suggestion for a general explanation, a stochastic growth model, became known as the Simon model. Although it is indeed a general explanation, it does not describe actual data particularly well.
A group approach
Some of my collaborators at IceLab in Umeå and I stumbled into this fascinating area as a spin-off from an attempt to understand complex networks. Being physicists, we focused on the entropy concept and tried to find a general description akin to the one for the normal distribution that I outlined above.
Our starting point was that the only really common feature shared by all the specific examples above is that they arise where a set of elements has formed groups. The elements can be people, words, species, or substances. The groups can be wealth, towns, surnames, genera, word frequency in a text, and number of connections in a chemical network. But instead of associating an element directly with a group (as was the case for Swedish men in a pool of recruits), we argued that in many cases it is more natural to associate elements with one another.
To see what we mean, consider the distribution of town populations. One cause of a change in the number of inhabitants could be that person A moves to the same town as person B, regardless of which town B happens to live in. If a town has k inhabitants, then there are k different individuals who could potentially be the reason for a certain person to move to that particular town. In the language of physics, the fundamental interactions are among elements rather than between elements and the groups they belong to.
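To see how such element-to-element moves can broaden a distribution, here is a toy simulation sketch in Python. It is only an illustration of the "A follows B" idea, not the optimization described below, and the population size, the number of towns, and the small probability of a random move are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

N_PEOPLE = 1_000     # elements (hypothetical)
N_TOWNS = 100        # groups (hypothetical)
P_RANDOM = 0.01      # chance that a move ignores everyone else (hypothetical)
STEPS = 1_000_000

# Everyone starts in a uniformly chosen town.
town_of = rng.integers(0, N_TOWNS, size=N_PEOPLE)

# Pre-draw the random numbers so the update loop stays cheap.
movers = rng.integers(N_PEOPLE, size=STEPS)
others = rng.integers(N_PEOPLE, size=STEPS)
coins = rng.random(STEPS)
stray = rng.integers(N_TOWNS, size=STEPS)

for t in range(STEPS):
    if coins[t] < P_RANDOM:
        town_of[movers[t]] = stray[t]          # an occasional unrelated move
    else:
        # Person A moves to wherever a randomly chosen person B lives; a town
        # with k inhabitants offers k possible Bs, so big towns attract more movers.
        town_of[movers[t]] = town_of[others[t]]

sizes = np.bincount(town_of, minlength=N_TOWNS)
print("largest town:", sizes.max(),
      " average occupied town:", sizes[sizes > 0].mean().round(1))
```

Even though no town is intrinsically more attractive than any other, the town sizes drift far apart: after enough moves, a handful of towns hold a large share of the people while most remain small.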
It turns out that the condition for maximal entropy in this case is analogous to what I call the postman's nightmare. In common with his real-life counterparts, this postman delivers letters. However, in this dystopian society the address on a letter consists of just the addressee's name and a single number k. The postman has to find the rest of the address on his own.
The number k tells him that the addressee is to be found in one of the N(k) towns of size k. In binary, the minimum information needed to specify the town is a number log₂ N(k) digits long. In natural units, that corresponds to an amount of information ln N(k). Once the postman knows the town, he also needs the additional information ln k to find which of the k addresses in that town is the addressee's. To deliver the letter, he must therefore obtain a total amount of information ln N(k) + ln k. And for each letter, on average, the postman has to supply by himself the additional information I = ⟨ln N(k) + ln k⟩. The best he can hope for is that the town sizes are such that this information is very small.
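The average information the postman must supply is easy to evaluate for any list of town sizes. The sketch below uses a short made-up list; since each letter is addressed to a uniformly chosen person, a size class k carries a weight N(k)·k/M in the average.

```python
from collections import Counter
import math

# A made-up list of town sizes (inhabitants per town).
sizes = [1, 1, 2, 3, 3, 5, 8, 20, 120]

M = sum(sizes)              # total number of people
N_of_k = Counter(sizes)     # N(k): number of towns that have exactly k inhabitants

# Average over letters of ln N(k) + ln k, weighting each size class by the
# fraction N(k) * k / M of letters addressed to its inhabitants.
I = sum(N_of_k[k] * k / M * (math.log(N_of_k[k]) + math.log(k)) for k in N_of_k)

print(f"information the postman must supply per letter: I = {I:.3f} nats")
```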
It turns out that the town-size distribution that gives the smallest headache to the postman is the maximum-entropy distribution. Thus the optimization problem corresponds to minimizing I subject to relevant constraints. Two of the constraints we know explicitly: the number of elements M and the number of groups N. However, for each specific problem there might be additional constraints that we have no knowledge about. Nevertheless, we know that if such constraints are important, they will also constrain the entropy, which means you can take the unknown constraints into account through the change in the entropy that they cause, even if you do not know what they are!
Usually, additional imposed constraints lower entropy, but not always. Consider our postman. In a less nightmarish dream, the understanding inhabitants of each town accept that all he is expected to do is deliver the incompletely addressed mail from the sorting center to their town’s post office. The post office constraint would correspond to an increase in entropy.
Random group formation
From the variational calculus point of view, the task is to minimize I subject to the constraint of an a priori known entropy. Of course, you do not really know the entropy a priori, but you do know that it must have some definite value. And you can obtain that value self-consistently by requiring that the solution of the optimization predict the correct value of the number k_max of elements in the largest group. In that way, one obtains a prediction for the probability distribution P(k) based on the a priori knowledge of just three numbers: M, N, and k_max. The resulting prediction is what my colleagues and I call the random group formation (RGF) distribution.
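As a concrete sketch, suppose the minimization yields a distribution of the power-law-with-cutoff form P(k) = A exp(−bk) k^−γ; the three numbers M, N, and k_max then pin down A, b, and γ. The Python code below solves for them numerically. Both the assumed functional form and the way the k_max condition is encoded (on average, one of the N groups is at least that large) are one plausible reading of the scheme, and the input numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

def rgf_fit(M, N, k_max, k_cut=1_000_000):
    """Fit P(k) = A * exp(-b*k) * k**(-gamma) to the three numbers M, N, k_max.

    Assumed constraints: the average group size is M/N, and on average
    one of the N groups is at least as large as k_max.
    """
    k = np.arange(1, k_cut + 1, dtype=float)

    def distribution(gamma, b):
        w = np.exp(-b * k) * k ** (-gamma)
        return w / w.sum()                  # normalization fixes A

    def residuals(params):
        gamma, b = params
        P = distribution(gamma, b)
        mean_k = (k * P).sum()
        tail = P[int(k_max) - 1:].sum()     # probability that a group has size >= k_max
        return [mean_k - M / N, N * tail - 1.0]

    fit = least_squares(residuals, x0=[2.0, 1.0 / k_max],
                        bounds=([0.0, 0.0], [10.0, 1.0]))
    gamma, b = fit.x
    A = 1.0 / (np.exp(-b * k) * k ** (-gamma)).sum()
    return A, gamma, b

# Hypothetical system: M elements in N groups, largest group of size k_max.
A, gamma, b = rgf_fit(M=100_000, N=5_000, k_max=2_000)
print(f"gamma = {gamma:.2f}, b = {b:.2g}, A = {A:.3g}")
```

For real data, one would feed in the measured M, N, and k_max and compare the resulting P(k) with the observed group sizes.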
As in the case of the normal distribution, RGF gives a prediction that can be directly tested against data. The dotted curves in figure 2 are all RGF predictions. For all six of the systems, the RGF predictions are amazingly good. Such an agreement cheers me up!
A single theory describes data from such wildly different areas as economics, cultural geography, ecology, linguistics, sociology, and biological chemistry. How is it possible? In each case a system-specific causal chain leads to the resulting distribution. But in forming that distribution, those characteristic causes wipe themselves out, barely leaving a trace. Imagine finding a well-shuffled pack of cards. By studying the order of the cards, you can’t deduce how it was shuffled. A professional poker player could have deliberately shuffled the cards. But equally likely is that they were unintentionally shuffled by a bored person who spent a whole evening building card houses.
Petter Minnhagen is an emeritus professor of physics at Umeå University in Umeå, Sweden.