Plato's Cave

Stephen J Bensman notsjb at LSU.EDU
Thu Dec 16 15:52:12 EST 2010


I like writing this provocative stuff, and it is good that I do.  I
certainly do not make a lot of money doing this and could be more
gainfully employed.

 

Stephen J. Bensman

LSU Libraries

Louisiana State University

Baton Rouge, LA   70803

USA

notsjb at lsu.edu

 

Fisher's revolutionary reconceptualization of statistics derived from
the necessity imposed upon him by his position as statistician at the
Rothamsted Experiment Station to develop methods applicable to practical
research in agriculture.  According to his daughter, J. F. Box (1987, p.
51), it was after he assumed this post that he really, as he put it,
found his feet in research.  Much of the research at Rothamsted involved
the analysis of small samples.  According to Yates and Mather (1963, p.
98), a major weakness of the Pearsonian school was their failure to
consider the need of experimenters for methods appropriate to small
samples involving quantitative observations.  As a matter of fact,
according to Egon Pearson (1939, p. 225), his father considered all
small sample work dangerous and something to be avoided.  Fisher (1925a)
set out to rectify this fault with his textbook Statistical Methods for
Research Workers, whose author's preface to the first edition reads as
follows:

   For several years the author has been working in somewhat intimate
   co-operation with a number of biological research departments; the
   present book is in every sense the product of this circumstance. Daily
   contact with the statistical problems which present themselves to the
   laboratory worker has stimulated the purely mathematical researches
   upon which are based the methods here presented. Little experience is
   sufficient to show that the traditional machinery of statistical
   processes is wholly unsuited to the needs of practical research. Not
   only does it take a cannon to shoot a sparrow, but it misses the
   sparrow! The elaborate mechanism built on the theory of infinitely
   large samples is not accurate enough for simple laboratory data. Only
   by systematically tackling small sample problems on their merits does
   it seem possible to apply accurate tests to practical data. Such at
   least has been the aim of this book.  p. vii.

 

This deliberate abandonment of the law of large numbers caused Fisher to
elucidate clearly the philosophical principles on which statistics are
based as well as the informational nature of quantitative data.
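
Fisher's complaint is easy to verify by simulation.  The following
Python sketch is my own illustration, not anything from Fisher's text
(the sample size of five and all other numbers are arbitrary
assumptions): 95 per cent confidence intervals built on large-sample
normal theory fall noticeably short of their nominal coverage when
samples are small, while intervals based on Student's t, which Fisher
championed, hold the nominal level.

    # Illustration only: compare large-sample (normal) and small-sample
    # (Student's t) 95% confidence intervals for the mean when n = 5.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, trials, mu, sigma = 5, 20000, 10.0, 2.0
    z = stats.norm.ppf(0.975)         # large-sample critical value, ~1.96
    t = stats.t.ppf(0.975, df=n - 1)  # small-sample critical value, ~2.78

    hit_z = hit_t = 0
    for _ in range(trials):
        x = rng.normal(mu, sigma, n)
        m, se = x.mean(), x.std(ddof=1) / np.sqrt(n)
        hit_z += abs(m - mu) <= z * se
        hit_t += abs(m - mu) <= t * se

    print(f"normal-theory coverage: {hit_z / trials:.3f}")  # ~0.88
    print(f"t-based coverage:       {hit_t / trials:.3f}")  # ~0.95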

            Fisher developed his reconceptualization of statistics in
two key papers entitled "On the Mathematical Foundations of Theoretical
Statistics" (1922a) and "Theory of Statistical Estimation" (1925b).  The
first was described by Fisher (1950) as "the first large-scale attack on
the problem of estimation" (p. 10.308a), and here Fisher (1922a) stated
that the theoretical bases of statistical methods were being obscured by
imprecision in defining concepts.  He identified the main culprit in
the following passage:

   ...it has happened that in statistics a purely verbal confusion has
   hindered the distinct formulation of statistical problems; for it is
   customary to apply the same name, mean, standard deviation,
   correlation coefficient, etc., both to the true value which we should
   like to know, but can only estimate, and to the particular value at
   which we happen to arrive by our methods of estimation; so also in
   applying the term probable error, writers sometimes would appear to
   suggest that the former quantity, and not merely the latter, is
   subject to error.  p. 311.

Fisher undertook to remedy this confusion in this paper and in the 1925
paper that followed, summarizing his main conclusions in a succinct,
simplified manner in the introductory pages of his textbook Statistical
Methods for Research Workers (1925a).  The discussion here will be based
primarily on the simplified presentation in the textbook.

            In the introductory pages of his textbook Fisher (1925a)
defined statistics as the study of populations, variation, and the
reduction of data.  By populations he meant not just people but
measurements, stating:

   ...If an observation, such as a simple measurement, be repeated a
   number of times, the aggregate of the results is a population of
   measurements.... Just as a single observation may be regarded as an
   individual, and its repetition as generating a population, so the
   entire result of an extensive experiment may be regarded as but one
   of a population of such experiments. The salutary habit of repeating
   important experiments, or of carrying out original observations in
   replicate, shows a tacit appreciation of the fact that the object of
   our study is not the individual result, but the population of
   possibilities of which we do our best to make our experiments
   representative. The calculation of means and probable errors shows a
   deliberate attempt to find out something about that population.  p. 3.
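
Fisher's "population of experiments" can be made concrete with a short
simulation.  The Python sketch below is my own illustration (the
measurement scale and sample sizes are arbitrary assumptions): each run
of the experiment is reduced to one summary value, and repetition
generates the population of which any single result is but one
individual.

    # Illustration only: an 'experiment' is n noisy measurements
    # reduced to a mean; repeating it generates a population of results.
    import numpy as np

    rng = np.random.default_rng(1)

    def experiment(n=10):
        return rng.normal(100.0, 15.0, n).mean()  # one experimental result

    results = np.array([experiment() for _ in range(5000)])
    print(f"one individual result: {results[0]:.2f}")
    print(f"population of results: mean={results.mean():.2f}, "
          f"sd={results.std():.2f}")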

As for the study of variation, Fisher stated that populations subject
to statistical analysis always display variation in one or more
aspects, and it is the study of variation that distinguishes modern
statistics from what preceded it.  Thus, he wrote:

   ...until comparatively recent times, the vast majority of workers in
   this field appear to have had no other aim than to ascertain
   aggregate, or average, values. The variation itself was not an object
   of study, but was recognised rather as a troublesome circumstance
   which detracted from the value of the average. The error curve of the
   mean of a normal sample has been familiar for a century, but that of
   the standard deviation has scarcely been securely established for a
   decade.  pp. 3-4.

He then noted that the study of variation leads immediately to the
concept of a frequency distribution.
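
The step from variation to a frequency distribution takes only a few
lines.  The Python sketch below is my own illustration, with an
arbitrary normal population of measurements assumed:

    # Illustration only: tabulate how often measurements fall into each
    # class interval, i.e. form a frequency distribution.
    import numpy as np

    rng = np.random.default_rng(2)
    measurements = rng.normal(170.0, 7.0, 1000)  # a varying population

    counts, edges = np.histogram(measurements, bins=8)
    for lo, hi, c in zip(edges[:-1], edges[1:], counts):
        print(f"{lo:6.1f} - {hi:6.1f}: {c:4d} {'*' * (c // 10)}")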

            It is in his explication of the methods for the reduction
of data that Bartlett (1978, p. 354) credits Fisher with prefiguring
modern information theory.  According to Fisher (1925a), any data set
contains not only relevant but also irrelevant information, and it is
desirable to express all the relevant information by means of
comparatively few numerical values.
purpose of the statistical processes involved in data reduction as "to
exclude this irrelevant information, and to isolate the whole of the
relevant information contained in the data" (p. 7).  At this point
Fisher introduced the linguistic distinction he pointed out in his 1922
paper as necessary for statistical theory to advance.  He began by
stating, "Even in the simplest cases the values (or sets of values)
before us are interpreted as a random sample of a hypothetical infinite
population of such values as might have arisen in the same
circumstances" (p. 7).  In response to the query of a referee, Fisher
(1925b) succinctly defined the "hypothetical population" as "the
conceptual resultant of the conditions we are studying" (p. 700),
thereby linking the population to the hypothesis being tested.  In the
textbook Fisher defined the term "parameters" as the constants of the
mathematical formula specifying the distribution of the "hypothetical
infinite population" and "statistics" as estimators of these parameters
calculated from the observations.  He thereby made a clear distinction
between the estimator and the quantity being estimated.  Fisher then
defined the qualifications of satisfactory statistics on the basis of
their behavior in large samples.  These statistics had to be
"consistent" in that they tend more and more nearly to the correct
value of the parameter as the sample becomes larger.  They should all
be "efficient" in that, as the sample size increases, their error
distributions should tend to the normal distribution with the least
possible variance.  And, most important of all, the statistics should
be "sufficient," a property whose characteristics Fisher (1925a)
described in the following passage:

   ...There is, however, one class of statistics, including some of the
   most frequently recurring examples, which is of theoretical interest
   for possessing the remarkable property that, even in small samples,
   a statistic of this class alone includes the whole of the relevant
   information which the observations contain. Such statistics are
   distinguished by the term sufficient, and, in the use of small
   samples, sufficient statistics, when they exist, are definitely
   superior to other efficient statistics. Examples of sufficient
   statistics are the arithmetic mean of samples from the normal
   distribution, or from the Poisson Series; it is the fact of providing
   sufficient statistics for these two important types of distribution
   which gives to the arithmetic mean its theoretical importance....
   p. 15.

Thus, Fisher laid the theoretical bases for small sample work with the
concepts of the infinite hypothetical population, parameters, and
sufficient statistics.
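
Two of Fisher's qualifications, consistency and efficiency, can be
checked by simulation (sufficiency is harder to exhibit in a few
lines).  The Python sketch below is my own illustration, assuming a
normal population and arbitrary sample sizes: the error of the sample
mean shrinks as the sample grows, and for normal data the mean is more
efficient than the sample median, whose sampling variance is roughly
pi/2 times larger.

    # Illustration only: consistency and relative efficiency of the
    # sample mean for data drawn from a normal population.
    import numpy as np

    rng = np.random.default_rng(3)
    mu, sigma, trials = 0.0, 1.0, 10000

    # Consistency: the sampling error of the mean shrinks as n grows.
    for n in (10, 100, 1000):
        means = rng.normal(mu, sigma, (trials, n)).mean(axis=1)
        print(f"n={n:5d}  sd of sample mean = {means.std():.4f}")

    # Efficiency: for normal data the mean beats the median
    # (the variance ratio tends to 2/pi, about 0.64).
    samples = rng.normal(mu, sigma, (trials, 100))
    ratio = samples.mean(axis=1).var() / np.median(samples, axis=1).var()
    print(f"var(mean) / var(median) = {ratio:.2f}")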

            These same concepts caused Hogben (195-) to accuse Fisher of
the same crime of which he had accused Quetelet: Platonism.  Hogben (195-)
considered the concept of "a random sample of a hypothetical infinite
population" as "the kingpin of the theory of statistical inference
expounded by R. A. Fisher" (p. 98).  For him it was emblematic of the
Platonic underpinnings of the inferential statistics developed by the
British biometric school.  Linking Fisher's hypothetical universe with
Quetelet's average man and the normal paradigm, Hogben identified "the
angelic choir in the Platonic empyrean of universals with an infinite
population of the Normal Man" (p. 180), and he denounced as "Platonic
constructs" the concepts of "the infinite hypothetical population, the
normal man and the normal environment" (p. 476).  There is truth in
Hogben's charge, for in Fisher's world we do appear to be prisoners in
Plato's cave, trying to divine the nature of our infinite conceptual or
hypothetical populations from the shadows (statistics) cast upon the
wall by the Ideas or Forms (parameters) of these populations.  It is
also ironic, as Hogben (195-, p. 98) pointed out, that Fisher based
himself on the same philosophical principles as did his archenemy, Karl
Pearson, who in his Grammar grounded science and statistics in the
precepts of Bishop Berkeley and Immanuel Kant.  This was much to the
chagrin of that ultimate materialist, Vladimir Lenin, who was then
imposing his Bolshevik values on Russia, one of the leaders in the
development of Continental statistics, and making that country
statistically moribund.

 

 
