Baek, SK; Bernhardsson, S; Minnhagen, P. 2011. Zipf's law unzipped. NEW JOURNAL OF PHYSICS 13: art. no.-043004.

Eugene Garfield garfield at CODEX.CIS.UPENN.EDU
Fri Jun 10 18:24:35 EDT 2011


Author Full Name(s): Baek, Seung Ki; Bernhardsson, Sebastian; Minnhagen, 
Petter
Language: English
Document Type: Article
KeyWords Plus: DISTRIBUTIONS

Abstract: Why does Zipf's law give a good description of data from seemingly 
completely unrelated phenomena? Here it is argued that the reason is that they 
can all be described as outcomes of a ubiquitous random group division: the 
elements can be citizens of a country and the groups family names, or the 
elements can be all the words making up a novel and the groups the unique 
words, or the elements could be inhabitants and the groups the cities in a 
country and so on. A random group formation (RGF) is presented from which a 
Bayesian estimate is obtained based on minimal information: it provides the 
best prediction for the number of groups with k elements, given the total 
number of elements, groups and the number of elements in the largest group. 
For each specification of these three values, the RGF predicts a unique group 
distribution N(k) proportional to exp(-bk)/k^gamma, where the power-law index 
gamma is a unique function of the same three values. The universality of the 
result is made possible by the fact that no system-specific assumptions are 
made about the mechanism responsible for the group division. The direct 
relation between gamma and the total number of elements, groups and the 
number of elements in the largest group is calculated. The predictive power of 
the RGF model is demonstrated by direct comparison with data from a variety 
of systems. It is shown that gamma usually takes values in the interval 1 <= 
gamma <= 2 and that the value for a given phenomenon depends in a 
systematic way on the total size of the dataset. The results are put in the 
context of earlier discussions on Zipf's and Gibrat's laws, N(k) proportional to 
k^(-2), and the connection between growth models and RGF is elucidated.
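
As a rough illustration of the functional form quoted in the abstract, the 
Python sketch below evaluates N(k) = A exp(-bk)/k^gamma for k = 1..k_max and 
picks A so that the N(k) sum to the number of groups. The function name 
rgf_form and all parameter values are hypothetical, chosen only for 
illustration. The sketch does not reproduce the paper's procedure for deriving 
b and gamma from the total number of elements, groups and largest group size; 
here both are simply supplied by hand.

```python
import math


def rgf_form(n_groups, k_max, b, gamma):
    """Evaluate N(k) = A * exp(-b*k) / k**gamma for k = 1..k_max.

    A is chosen so that the N(k) sum to n_groups. The values of b and
    gamma are taken as given (an assumption of this sketch); the paper
    derives them from the dataset, which is not attempted here.
    """
    weights = [math.exp(-b * k) / k ** gamma for k in range(1, k_max + 1)]
    scale = n_groups / sum(weights)
    return {k: scale * w for k, w in enumerate(weights, start=1)}


# Hypothetical inputs, not taken from the paper.
dist = rgf_form(n_groups=1000, k_max=200, b=0.01, gamma=1.5)

total_groups = sum(dist.values())                      # equals n_groups
total_elements = sum(k * n for k, n in dist.items())   # implied element count
```

With b and gamma fixed, the implied total number of elements follows from the 
sum over k N(k), which gives a quick consistency check against a real dataset.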

Addresses: [Baek, Seung Ki; Minnhagen, Petter] Umea Univ, Dept Phys, 
Integrated Sci Lab, S-90187 Umea, Sweden; [Bernhardsson, Sebastian] Niels 
Bohr Inst, Ctr Models Life, DK-2100 Copenhagen O, Denmark
Reprint Address: Minnhagen, P, Umea Univ, Dept Phys, Integrated Sci Lab, 
S-90187 Umea, Sweden.

E-mail Address: Petter.Minnhagen at physics.umu.se
ISSN: 1367-2630
DOI: 10.1088/1367-2630/13/4/043004
URL: http://iopscience.iop.org/1367-2630/13/4/043004


