Friday, June 11, 2010

Counting Words with Perl

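Counting words in a language such as C takes a fairly involved program. In Perl, however, it is almost unbelievably easy. The approach is:
  1. Use a regular expression to match the words
  2. Store the matches in an array
  3. Increment a hash entry, using each matched word as the key
  4. Print each word and how often it was used

See the program below for the details.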
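Steps 1 to 3 rely on the fact that a match with the `/g` flag, evaluated in list context, returns every matched substring. The following is a minimal sketch of that idea only; the sample string and the variable names (`$text`, `@tokens`, `%count`) are made up for illustration and are not part of the program in this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for illustration
my $text = "the cat and the dog and the bird";

# In list context, m//g returns all matches as a list
my @tokens = $text =~ m/[^ \n\r\t.,;]+/g;

# Count each token by using it as a hash key
my %count;
$count{$_}++ for @tokens;

# Sort the keys so the output order is stable
print "$_ ($count{$_})\n" for sort keys %count;
```

Run as written, this prints `and (2)`, `bird (1)`, `cat (1)`, `dog (1)` and `the (3)`, one per line.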

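#!/usr/bin/perl

use utf8;
use warnings;

# The sample text contains "…", so encode the output as UTF-8
binmode(STDOUT, ':encoding(UTF-8)');

# Sample text: Lincoln's Gettysburg Address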
my $data = <<EO_TEXT;
Fourscore and seven years ago our fathers brought forth
 on this continent a new nation, conceived in liberty, and
 dedicated to the proposition that all men are created equal.
 Now we are engaged in a great civil war, testing
 whether that nation, or any nation so conceived and so
 dedicated, can long endure. We are met on a great battle-
 field of that war. We have come to dedicate a portion of
 that field as a final resting-place for those who here gave
 their lives that this nation might live. It is altogether
 fitting and proper that we should do this.
 But, in a larger sense, we cannot dedicate…we cannot
 consecrate…we cannot hallow…this ground. The brave men,
 living and dead, who struggled here, have consecrated it
 far above our poor power to add or detract. The world
 will little note nor long remember what we say here, but
 it can never forget what they did here. It is for us, the
 living, rather, to be dedicated here to the unfinished
 work which they who fought here have thus far so nobly
 advanced. It is rather for us to be here dedicated to the
 great task remaining before us that from these honored
 dead we take increased devotion to that cause for which
 they gave the last full measure of devotion; that we here
 highly resolve that these dead shall not have died in vain;
 that this nation, under God, shall have a new birth of
 freedom; and that government of the people, by the people,
 for the people, shall not perish from the earth.
EO_TEXT

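# Regular expression that extracts one word
# (a run of characters other than whitespace and . , ;)
my $regex = "[^ \n\r\t.,;]+";

# In list context, m//g returns every match; store them in an array
my @temp = $data =~ m/$regex/g;

# Initialize the hash
my %words = ();

# Scan every word and count occurrences, using the word as the hash key
foreach (@temp) {
    $words{$_}++;
}

# Print each word and how often it was used
foreach my $key ( keys(%words) ) {
    print "$key ($words{$key})\n";
}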

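Output

The order of the lines is not significant, because Perl returns hash keys in no particular order. The pattern treats only whitespace and . , ; as separators, so tokens such as hallow…this and battle- are counted exactly as they appear, and capitalized forms such as The are counted separately from the.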

birth (1)
these (2)
forth (1)
which (2)
civil (1)
come (1)
far (2)
power (1)
what (2)
honored (1)
freedom (1)
consecrated (1)
poor (1)
living (2)
final (1)
continent (1)
increased (1)
but (1)
and (6)
of (5)
is (3)
all (1)
men (2)
nor (1)
will (1)
have (5)
it (2)
last (1)
can (2)
ground (1)
a (7)
thus (1)
might (1)
in (4)
cause (1)
perish (1)
liberty (1)
full (1)
by (1)
died (1)
gave (2)
brave (1)
The (2)
they (3)
add (1)
engaged (1)
war (2)
remaining (1)
us (3)
Now (1)
under (1)
task (1)
note (1)
created (1)
brought (1)
hallow…this (1)
as (1)
highly (1)
portion (1)
testing (1)
It (3)
battle- (1)
larger (1)
cannot (3)
not (2)
that (12)
take (1)
struggled (1)
on (2)
our (2)
shall (3)
years (1)
who (3)
unfinished (1)
altogether (1)
Fourscore (1)
fitting (1)
people (3)
consecrate…we (1)
proposition (1)
conceived (2)
here (8)
did (1)
do (1)
we (6)
seven (1)
advanced (1)
nobly (1)
to (8)
But (1)
from (2)
God (1)
proper (1)
dedicated (4)
devotion (2)
dedicate…we (1)
endure (1)
any (1)
remember (1)
work (1)
ago (1)
little (1)
live (1)
sense (1)
the (9)
met (1)
resolve (1)
vain (1)
earth (1)
or (2)
new (2)
field (2)
this (4)
before (1)
government (1)
so (3)
for (5)
fought (1)
those (1)
We (2)
rather (2)
their (1)
be (2)
whether (1)
nation (5)
detract (1)
world (1)
are (3)
forget (1)
never (1)
dead (3)
resting-place (1)
above (1)
long (2)
great (3)
should (1)
say (1)
equal (1)
lives (1)
dedicate (1)
measure (1)
fathers (1)
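The output above counts `The` and `the` separately and keeps tokens such as `hallow…this` and `battle-` intact. One possible variation, shown here only as a sketch, folds case, matches runs of ASCII letters (optionally joined by inner hyphens) instead of excluding separators, and sorts by frequency. The helper name `count_words`, the pattern, and the short excerpt are assumptions made for this example, not part of the original program. The sketch does not rejoin a word split across lines: `battle-` followed by `field` is counted as `battle` and `field`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: returns a hash reference of word => frequency
sub count_words {
    my ($text) = @_;
    my %count;
    # Letters, optionally joined by inner hyphens; everything else
    # (including "…" and a trailing "-") acts as a separator
    for my $word ($text =~ m/[A-Za-z]+(?:-[A-Za-z]+)*/g) {
        $count{ lc $word }++;    # fold case so "The" and "the" merge
    }
    return \%count;
}

# Short excerpt based on the sample text, used here only for illustration
my $sample = "But, in a larger sense, we cannot dedicate…we cannot "
           . "consecrate…we cannot hallow…this ground. The brave men, "
           . "the final resting-place, a great battle-";

my $count = count_words($sample);

# Most frequent first; ties are broken alphabetically
for my $word (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
    print "$word ($count->{$word})\n";
}
```

For this excerpt the first lines printed are `cannot (3)`, `we (3)`, `a (2)` and `the (2)`, followed by the words that occur once in alphabetical order.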
