[Sigia-l] Common word compilation app suggestion
Andrew McNaughton
andrew at scoop.co.nz
Thu Jun 13 15:01:10 EDT 2002
On Thu, 13 Jun 2002, Sean Lawrence wrote:
> Does anyone know of a utility or application that can go through a document
> and find commonly used words? I'd imagine there is a way to do it with grep
> but I'm not quite talented enough to create the proper string to do that.
This Perl script might be all you need:
----- wordcount.pl -------
#!/usr/bin/perl
use strict;
use warnings;

# count words from stdin (or from files named on the command line)
my %count;
while (<>) {
    # a word is a run of letters that may contain internal apostrophes
    my @words = m/([a-z][a-z']*[a-z]|[a-z])/ig;
    foreach my $word (@words) {
        $count{lc $word}++;
    }
}

# list frequencies to stdout, most frequent first
foreach my $word (sort { $count{$b} <=> $count{$a} } keys %count) {
    print "$word: $count{$word}\n";
}
--------------------------
This script defines a word as any sequence of Latin letters that may be
broken by an apostrophe (e.g. they're) but by no other character (e.g.
pseudo-science is treated as two words). Words are lowercased before
counting, so "The" and "the" are counted together.
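
To see what that definition does in practice, here is a small sketch that
applies the same pattern to a fixed sample string instead of reading input.
The helper name extract_words and the sample text are made up for
illustration; they are not part of wordcount.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in wordcount.pl): returns the lowercased words
# found in a string, using the same pattern as the script.
sub extract_words {
    my ($text) = @_;
    return map { lc $_ } $text =~ m/([a-z][a-z']*[a-z]|[a-z])/ig;
}

# Sample text chosen to exercise apostrophes, hyphens and quote marks.
my $sample = "They're testing pseudo-science, aren't they? 'Quoted'";
print join(", ", extract_words($sample)), "\n";
# prints: they're, testing, pseudo, science, aren't, they, quoted
```

Apostrophes inside a word are kept, but because a word has to start and
end with a letter, the quote marks around 'Quoted' are dropped, and the
hyphen splits pseudo-science in two.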
Andrew McNaughton