software routine for the triple-helix indicator (freeware)

Loet Leydesdorff loet at LEYDESDORFF.NET
Sun Oct 7 04:03:56 EDT 2012

The Triple-Helix Indicator and its Extension to Four Dimensions:
The Measurement of Configurational Information in More than Two Dimensions

The program th4.exe reads an input file “data.txt” and generates the file
th4.dbf (or appends a record to an existing th4.dbf) containing probabilistic
entropy values and mutual information values for three and/or four nominal
variables. (The source code can be found here.) In a number
of studies (see the reference list) we used the mutual information in three
dimensions as a Triple Helix indicator; for example, to measure the reduction
of uncertainty (e.g., Yeung, 2008:59f.; cf. McGill, 1954) in the
interactions between distributions in the geographical dimensions
(addresses), organizational size, and technological capacities of firms
(Lengyel & Leydesdorff, 2011; Leydesdorff et al., 2006; Leydesdorff &
Fritsch, 2006; Leydesdorff & Strand, in press).
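
For readers who want the quantities spelled out, the following is a sketch of the standard definitions (after McGill, 1954), on the assumption that th4.exe follows them. The base-2 logarithm (values in bits) is also an assumption; the symbols x, y, z stand for any three of the dimensions.

```latex
\begin{align}
H_x     &= -\sum_{x} p_x \log_2 p_x \\
H_{xy}  &= -\sum_{x,y} p_{xy} \log_2 p_{xy} \\
T_{xy}  &= H_x + H_y - H_{xy} \\
T_{xyz} &= H_x + H_y + H_z - H_{xy} - H_{xz} - H_{yz} + H_{xyz}
\end{align}
```

Unlike T_xy, which cannot be negative, T_xyz can take either sign; under this definition a negative value corresponds to a reduction of uncertainty.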

Using publications as units of analysis, the focus can be on university,
industry and/or government addresses in co-authorship relations (Kwon et
al., 2012; Leydesdorff, 2003; Leydesdorff & Sun, 2009; Park et al., 2005; Ye
et al., in preparation). A program for examining TH relations on a
case-by-case basis is also available.
(The program th.exe also computes Krippendorff’s (2009a) IABC→AB, AC,
BC and the redundancy R; T = I - R (Krippendorff, 2009b; Leydesdorff, 2009,

In a number of studies (and in the literature) questions have been raised
about extending the Triple Helix to more than three helices (e.g.,
Carayannis & Campbell, 2009 and 2010; Leydesdorff, 2012). The issue is
pressing because the international-versus-national dimension was found to be
important as an additional dimension in a number of recent studies (Ye et
al., in preparation). One may wish to include international coauthorship
as a fourth variable (Leydesdorff & Sun, 2009; Kwon et al., 2010) or
“foreign driven investment” in the case of firm data (Lengyel &
Leydesdorff, 2011; Strand & Leydesdorff, in press).

This routine (th4.exe) is meant to facilitate the computation of these
values in the case of large sets. This version (unlike th.exe) operates on
nominal values; for example, industry codes, the names of regions, or
classifications. The older routine th.exe uses numerical values. In the
case of numerical values, one may wish to bin or dichotomize these. For
example, if three addresses are provided of which two are from universities
and one from industry, this U-I relation should be counted as “1”. In
other words, numbers are read as character strings by this (!) program.
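
As an illustration of the dichotomization described above, here is a minimal Python sketch. It is not part of th4.exe; the function names and the sector labels "U", "I", "G" are hypothetical and chosen only for this example.

```python
def dichotomize(count):
    """Return "1" if at least one address of a sector is present, else "0"."""
    return "1" if count > 0 else "0"


def encode_addresses(sectors):
    """Map a list of sector labels to presence/absence strings.

    The labels "U", "I", "G" (university, industry, government) are an
    assumption made for this sketch.
    """
    return {s: dichotomize(sectors.count(s)) for s in ("U", "I", "G")}


# Two university addresses and one industry address on a single paper.
record = encode_addresses(["U", "U", "I"])
print(record)
```

The two university addresses collapse to a single "1", so the paper is counted once as a U-I relation; the values are strings because the program treats them as nominal labels.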

Input file

The input file is a text file with one case (firm, publication, patent,
etc.) on each line, and five variables. The first variable is a
case-identifier; for example, “firm1” or “id0001”. The second to fifth
variables are read as four nominal variables (values such as “0” and “1”
are treated as labels). If the fifth variable is missing, all its values are
set to zero, and the corresponding dimension (“z”) is not computed. The four
dimensions are indicated as w, x, y, and z, respectively. Each variable in
the input file has to be embedded in double quotation marks, and the
variables are delimited with commas. For example:

"id1", "1", "b", "region1", "2"
"id2", "2", "a", "region2", "1"
"id3", "1", "a", "region2", "2"
"id4", "1", "b", "region5", "1"
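
To check that a file matches this layout before running th4.exe, it can be read with a few lines of Python. This is a sketch under assumptions: it expects plain ASCII double quotes, and the function name read_cases is hypothetical. It mirrors the rule that a missing fifth variable is set to zero.

```python
import csv
import io

SAMPLE = '''"id1", "1", "b", "region1", "2"
"id2", "2", "a", "region2", "1"
"id3", "1", "a", "region2", "2"
"id4", "1", "b", "region5", "1"
'''


def read_cases(handle):
    """Yield (case_id, w, x, y, z) tuples of strings, one per input line."""
    # skipinitialspace lets the reader accept a blank after each comma.
    for row in csv.reader(handle, skipinitialspace=True):
        if not row:
            continue  # ignore empty lines
        fields = [f.strip() for f in row]
        if len(fields) == 4:
            fields.append("0")  # missing fifth variable: set z to "0"
        if len(fields) != 5:
            raise ValueError(f"expected 4 or 5 fields, got {len(fields)}: {row}")
        yield tuple(fields)


cases = list(read_cases(io.StringIO(SAMPLE)))
print(len(cases), cases[0])
```

For a real file, the same function can be called as `read_cases(open("data.txt", newline=""))`.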

For example, in the case of address information, the second variable may
indicate the presence of a university address (Y/N), the third an industrial
address, etc. In the case of firm data, the second variable may be a size
category (e.g., zero for firms without employees to six for firms with more
than 500 employees), the third variable a technology code (e.g., OECD’s
NACE codes), the fourth an indication of the region, and the fifth whether
the firm is domestically owned or a subsidiary of a foreign company.

The number of cases is not limited, but the file must be smaller than 2
GByte. The input file should be named “data.txt”. Do not place a header with
variable names on the first line (because these names would be counted as
separate categories). Note that typos may lead to the declaration of an
additional class because the program indexes on the strings. The program
and the input file have to be placed in the same folder.
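
Because every distinct string becomes its own class, it can help to list the labels per dimension before running the routine. The sketch below is one possible check, not a feature of th4.exe; the function name category_counts is hypothetical and the typo in the sample data is deliberate.

```python
from collections import Counter


def category_counts(rows):
    """Count how often each label occurs in each dimension (w, x, y, z)."""
    counts = {name: Counter() for name in "wxyz"}
    for _case_id, *values in rows:
        for name, value in zip("wxyz", values):
            counts[name][value] += 1
    return counts


rows = [
    ("id1", "1", "b", "region1", "2"),
    ("id2", "2", "a", "region2", "1"),
    ("id3", "1", "a", "region2", "2"),
    ("id4", "1", "b", "regoin2", "1"),  # deliberate typo: creates an extra class
]

counts = category_counts(rows)
# Labels that occur only once are candidates for a closer look.
singletons = {
    name: sorted(label for label, n in c.items() if n == 1)
    for name, c in counts.items()
}
print(singletons["y"])
```

A label that occurs only once is not necessarily an error, but a misspelled category will usually show up in such a list.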


The program generates the file th4.dbf if it is not present in this folder;
if it is present, a new record is appended to th4.dbf. This file can be read
using Excel or a similar program. As said, the variables are denoted “w”,
“x”, “y”, and “z”, and the new record contains the uncertainties in these
four dimensions (Hw, Hx, Hy, Hz), the joint entropies (Hwx, Hwxy, Hwxyz,
etc.), and all possible transmissions (Twx, Twxy, Twxyz, etc.) among
these dimensions.
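
The output values can be cross-checked on a small sample with the Python sketch below. It is not the source code of th4.exe. It assumes entropies in bits and the alternating-sign definition of the transmissions (single entropies added, pairwise joint entropies subtracted, and so on); the function names are hypothetical. The dictionary keys "w", "wx", "wxyz" correspond to the fields Hw, Hwx, Hwxyz and Twx, Twxyz.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def th_values(rows, names="wxyz"):
    """Return (H, T): entropies and transmissions for all subsets of names.

    rows holds one tuple of nominal values per case, without the case id.
    """
    columns = dict(zip(names, zip(*rows)))
    H, T = {}, {}
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            joint = list(zip(*(columns[d] for d in subset)))
            H["".join(subset)] = entropy(joint)
    for key in H:
        if len(key) < 2:
            continue
        # Assumed definition: alternating sum over all non-empty subsets,
        # e.g. Twx = Hw + Hx - Hwx.
        T[key] = sum(
            (-1) ** (r + 1) * H["".join(sub)]
            for r in range(1, len(key) + 1)
            for sub in combinations(key, r)
        )
    return H, T


rows = [
    ("1", "b", "region1", "2"),
    ("2", "a", "region2", "1"),
    ("1", "a", "region2", "2"),
    ("1", "b", "region5", "1"),
]
H, T = th_values(rows)
for key in ("wx", "wxy", "wxyz"):
    print(f"T{key} = {T[key]:.4f}")
```

With three columns (names="wxy") the same function returns only the three-dimensional values, which corresponds to the case where the fifth variable is missing.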


The current version is very much a beta version. Please provide feedback
for further improvements if bugs are encountered. Carefully check the output
for errors!

I acknowledge Balazs Lengyel for helping to develop this routine.

** apologies for cross-postings


Loet Leydesdorff

Professor, University of Amsterdam
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam.
Tel. +31-20-525 6598; fax: +31-842239111

Visiting Professor, ISTIC, Beijing; Honorary Fellow, SPRU, University
of Sussex


More information about the SIGMETRICS mailing list