Descriptive statistics, inferential statistics, rhetorical statistics

Wed Aug 6 03:23:12 EDT 2003

Rejoinder to Van den Besselaar's Letter entitled "Descriptive statistics,
inferential statistics, rhetorical statistics."

Van den Besselaar (2003) illustrates his argument by quoting our
conclusion that "the network of words does not significantly correlate with
the geographical division" (Leydesdorff & Heimeriks, 2001, p. 1266).
However, this conclusion was based entirely on replicating the simulations
suggested by Van den Besselaar in previous exchanges, i.e., on random
samples. Van den Besselaar & Heimeriks (2000, pp. 89-93) was provided as a
reference for that reason. Indeed, the inference cannot be based on
descriptive statistics.

Loet Leydesdorff

----- Original Message -----
From: "Peter van.den.Besselaar" <Peter.van.den.Besselaar at NIWI.KNAW.NL>
To: <SIGMETRICS at LISTSERV.UTK.EDU>
Sent: Wednesday, August 06, 2003 8:54 AM
Subject: [SIGMETRICS] Descriptive statistics, inferential statistics,
rhetorical statistics

> In a contribution to this list, Loet Leydesdorff replied to my brief
communication in JASIST (2003-1) "Empirical evidence for
self-organization?". My reply, as a letter to the editor, is now published
in JASIST 2003-9:
>
>
> "Descriptive statistics, inferential statistics, rhetorical statistics"
>
> Loet Leydesdorff (2003) argues that my analysis (Van den Besselaar, 2003)
is not correct and not relevant. In his argument, however, he mixes up
samples and populations, and he incorrectly uses concepts such as
'significance' and 'eigenstructures'.
>
> Leydesdorff's data are attributes of the papers in "a carefully selected
set" of biotechnology journals. In other words, it is not a sample from a
larger set of journals, and therefore he analyzes at the level of the
population. Applying statistical techniques to a population is descriptive
statistics. Of course, statistical packages like SPSS calculate 'significance
levels', but these belong to the realm of inferential statistics, that is,
generalizing from random samples to populations. In his claim that my
"simulation results usually did not pass the significance tests provided by
SPSS" and that his "results using bibliometric data did pass these tests",
he is confusing samples and populations. As there is no sample whatsoever,
using the qualification 'significant' is irrelevant and misplaced. The same
holds when he uses the results of the simulations to conclude that "the
network of words does also not significantly correlate with the geographical
division."
>
> Samples come into play when testing the quality of the discriminant
analysis. I drew random samples and used the sample statistics (the
discriminant functions) to predict the population parameters. As every
random sample fails to do this, one has to conclude that using discriminant
analysis for describing the relation between 'title words' and 'region of
origin' is wrong. This can be explained by the large number of unique
observations in the data, and this also explains the results of the
simulations (Van den Besselaar & Heimeriks 1998, pp. 98-100). Leydesdorff
argues that this test of the DA is not relevant because "one cannot expect
any significant correlation between the eigenstructures of highly specific
samples." Of course, one does not expect this in the case of highly specific
samples, but my test shows that the eigenstructures of random samples are
completely different.
>
> Leydesdorff states that I misread and selectively quote his paper, as he
is not doing first order data analysis. He tries to develop a 'new
methodology for second order theorizing' to answer 'what-if questions' about
the interaction between the global knowledge production system and regional
institutionalization. I have no problem with this type of question, but the
'new methodology' needs clarification: what can we conclude from the
'significant' correlations between the 'regional word sets' with the word
sets representing the 'intellectual space' (Leydesdorff & Heimeriks 2001,
p. 1268)? First, the mapping of the intellectual space is based on a very
weak factor structure (Leydesdorff & Heimeriks 2001, p. 1266). Second, the
regional word sets are highly questionable (Van den Besselaar 2003).
Additionally, I showed that the positions of the three regions within the
intellectual space change from year to year (E-mail communication, November
1998). This change is so implausible that one should seriously doubt the
adequacy of the method used to measure these positions: the discriminant
analysis. The conclusion is that,
despite the 'significant' results, the 'new methodology' is not convincing.
What remains is an example of rhetorical statistics.