loet at LEYDESDORFF.NET
Thu Feb 20 02:07:40 EST 2003
About a month ago, Eugene drew attention to a debate between Peter van
den Besselaar and me in JASIST entitled "Empirical Evidence of
Self-Organization." In an off-line conversation Peter and I agreed that
using chi-square as a non-parametric measure can perhaps provide a means
to evaluate the effects of the discriminant analysis.
One can compare the matrices for the EU/USA/Japan (N = 3) versus the
(245) title words on the basis of the geographical addresses and/or on
the basis of the classification of the cases by discriminant analysis,
respectively. Similarly, one can compare the matrices of 14 EU nations
versus these title words with the results based on classification of
these records in 14 groups. (Fourteen because there were no records
included with an address in Luxembourg.)
The results are as follows:
1. The matrix of geographical addresses in the EU/USA/Japan (N = 3)
versus title words. In this case the zero-hypothesis is not rejected
when using the strongest test of chi-square with so-called Yates
correction (p <= 0.30). This means that the word distributions are not
significantly different among these three groups.
2. The matrix of the discriminant classifications in three groups
versus title words. In this case the zero-hypothesis is rejected: p <=
0.00 (same test).
3. In the case of the EU-matrices I had to use the log-likelihood
chi-square (G2) because the matrix is sparse (N = 14). In both cases
(geographical addresses and results of the discriminant analysis), the
zero-hypothesis cannot be rejected (p <= 1.00 and p <= 0.25,
respectively).
4. In all cases and using all tests (Pearson chi-square, Yates
correction, and log-likelihood G2), the summation of the chi-square is
considerably higher using the discriminant classification when compared
to using the geographical addresses (400 to 500 points added to the
summation). Thus, the discriminant analysis improves on the distinction
among the groupings (as expected).
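For readers who want to reproduce this kind of comparison, the three statistics named above can be sketched as follows. This is a minimal illustration, not the procedure actually run for the debate: the function names and the small `toy` matrix are hypothetical, and the continuity correction is applied cell by cell as one possible reading of "Yates correction" for a table larger than 2 x 2. Only the test statistics are computed; the p-value would follow from the chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def pearson_chi2(table, yates=False):
    """Pearson chi-square; with yates=True each |O - E| is reduced by 0.5."""
    stat = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                diff = max(diff - 0.5, 0.0)  # never correct past zero
            stat += diff * diff / e
    return stat


def g2(table):
    """Log-likelihood chi-square: G2 = 2 * sum(O * ln(O / E))."""
    stat = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for o, e in zip(obs_row, exp_row):
            if o > 0:  # empty cells contribute nothing to the sum
                stat += o * math.log(o / e)
    return 2.0 * stat


def degrees_of_freedom(table):
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical toy matrix: 3 groups (rows) versus 4 title words (columns).
toy = [
    [12, 5, 9, 0],
    [10, 7, 8, 2],
    [3, 14, 1, 6],
]

print("Pearson:", pearson_chi2(toy))
print("Yates  :", pearson_chi2(toy, yates=True))
print("G2     :", g2(toy))
print("df     :", degrees_of_freedom(toy))
```

With 3 groups versus 245 title words the same formula gives 2 x 244 = 488 degrees of freedom, and with 14 groups 13 x 244 = 3172, which is why a sparse 14-row matrix is a harder case for these tests.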
What does this mean?
1. The discriminant analysis considerably improves on the
distinction in the case of N = 3 (EU/US/Japan) to such an extent that the
grouping passes the threshold of a severe significance test (chi-square
with Yates correction).
2. The geographical addresses are not a sufficient basis for
distinguishing between the records in terms of word-occurrences. There
is also interaction among the repertoires, that is, an interactive
(next-order) repertoire. Remember that this interaction between local
and global repertoires was our initial research question.
3. At the European level, there is interaction among the
repertoires to such an extent that it was no longer possible to sort the
records apart using discriminant analysis so that the results pass the
log-likelihood chi-square test. (Remember that the discriminant
functions were significant in all cases. The latter operate on
individual records, while the chi-square is tested between the
groupings. The within-group variation is then no longer used.)
4. In my opinion, these results further legitimate our decision in
the article to discard the records initially flagged by the discriminant
analysis as misplaced from the second-order analysis.
Note that these results do not affect the simulation results. One cannot
expect that a randomized attribution of the cases leads to groupings
that pass this significance test.
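The point about randomized attribution can be illustrated with a small sketch. It is not the simulation from the article: the records, the words, and the group labels below are invented for illustration, and the function name is hypothetical. The sketch shuffles the group labels over the records and recomputes the Pearson chi-square of the groups-versus-words matrix each time.

```python
import random


def chi2_groups_by_words(records, labels):
    """Pearson chi-square of the groups x words matrix built from records."""
    groups = sorted(set(labels))
    words = sorted({w for rec in records for w in rec})
    table = [[0] * len(words) for _ in groups]
    for rec, lab in zip(records, labels):
        for w in rec:
            table[groups.index(lab)][words.index(w)] += 1
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total
            stat += (o - e) ** 2 / e
    return stat


# Hypothetical records: each is the list of title words of one document.
records = [["laser", "optics"]] * 6 + [["gene", "protein"]] * 6
labels = ["A"] * 6 + ["B"] * 6  # a grouping that matches word use exactly

observed = chi2_groups_by_words(records, labels)

rng = random.Random(0)  # fixed seed so the run is repeatable
shuffled_stats = []
for _ in range(200):
    perm = labels[:]
    rng.shuffle(perm)  # randomized attribution of cases to groups
    shuffled_stats.append(chi2_groups_by_words(records, perm))

print("observed chi-square:", observed)
print("mean under shuffling:", sum(shuffled_stats) / len(shuffled_stats))
```

In this toy case the original grouping separates the word use completely, so its chi-square is the largest value the matrix can reach, and shuffled groupings fall at or below it.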
With kind regards,
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam
Tel.: +31-20- 525 6598; fax: +31-20- 525 3681
loet at leydesdorff.net
The Challenge of Scientometrics: <http://www.upublish.com/books/leydesdorff-sci.htm>
The Self-Organization of the Knowledge-Based Society: <http://www.upublish.com/books/leydesdorff.htm>