How can this be done in a more Perl-like way?

I am a Perl newbie, and for one of my homework assignments I came up with this solution:

#wordcount.pl FILE 
    # 

    #if no filename is given, print help and exit 
    if (length($ARGV[0]) < 1) 
    { 
      print "Usage is : words.pl word filename\n"; 
      exit; 
    } 

    my $file = $ARGV[0];   #filename given in commandline 

    open(FILE, $file);   #open the mentioned filename 
    while(<FILE>)     #continue reading until the file ends 
    { 
      chomp; 
      tr/A-Z/a-z/;   #convert all upper case words to lower case 
      tr/.,:;!?"(){}//d;   #remove some common punctuation symbols 
      #We are creating a hash with the word as the key. 
      #Each time a word is encountered, its hash is incremented by 1. 
      #If the count for a word is 1, it is a new distinct word. 
      #We keep track of the number of words parsed so far. 
      #We also keep track of the no. of words of a particular length. 

      foreach $wd (split) 
      { 
       $count{$wd}++; 
       if ($count{$wd} == 1) 
       { 
         $dcount++; 
       } 
       $wcount++; 
       $lcount{length($wd)}++; 
      } 
    } 

    #To print the distinct words and their frequency, 
    #we iterate over the hash containing the words and their count. 
    print "\nThe words and their frequency in the text is:\n"; 
    foreach $w (sort keys%count) 
    { 
     print "$w : $count{$w}\n"; 
    } 

    #For the word length and frequency we use the word length hash 
    print "The word length and frequency in the given text is:\n"; 
    foreach $w (sort keys%lcount) 
    { 
     print "$w : $lcount{$w}\n"; 
    } 

    print "There are $wcount words in the file.\n"; 
    print "There are $dcount distinct words in the file.\n"; 

    $ttratio = ($dcount/$wcount)*100;  #Calculating the type-token ratio. 

    print "The type-token ratio of the file is $ttratio.\n";

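I have included comments describing what it does. What I actually have to do is find the word counts in a given text file. The output of the program above looks like this: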

The words and their frequency in the text is: 
1949 : 1 
a : 1 
adopt : 1 
all : 2 
among : 1 
and : 8 
assembly : 1 
assuring : 1 
belief : 1 
citizens : 1 
constituent : 1 
constitute : 1 
. 
. 
. 
The word length and frequency in the given text is: 
1 : 1 
10 : 5 
11 : 2 
12 : 2 
2 : 15 
3 : 18 
There are 85 words in the file. 
There are 61 distinct words in the file. 
The type-token ratio of the file is 71.7647058823529.

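With the help of Google I was able to find a solution for my homework. However, I think that using the real power of Perl would give small, concise code. Can anyone give me a Perl solution with fewer lines of code?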


2011-10-09 sriram

According to your usage message, the filename is the second argument. That contradicts your code.

One suggestion: don't call open explicitly. Just use <>. Perl will interpret each argument in ARGV as a filename, and <> will read from them.

@WilliamPursell: Yes, the filename is the second argument.. – sriram
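
The comment about `<>` can be illustrated with a small sketch. The temporary file and its two lines of text are made up so that the example runs on its own; in a real script `@ARGV` would simply come from the command line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical setup: write a small sample file so the sketch is self-contained.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "We the People\nof India\n";
close $out;

# Normally @ARGV is filled from the command line; it is set by hand here.
@ARGV = ($path);

my $lines = 0;
while (my $line = <>) {    # <> opens and reads each file named in @ARGV in turn
    $lines++;
}
print "Read $lines lines\n";
```

No explicit open is needed: the `<>` operator takes care of opening the files named in `@ARGV`.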

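Here are a few suggestions:

Include use strict and use warnings in your Perl scripts.
Your argument validation is not testing what it should test: (1) whether there is exactly 1 item in @ARGV, and (2) whether that item is a valid file name.
Although every rule has exceptions, it is generally good practice to assign the return value of <> to a named variable rather than relying on $_. This is especially true if the code inside the loop might need to use one of Perl's constructs that also rely on $_ (for example, map, grep, or post-fix for loops).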
```
while (my $line = <>){ 
    ... 
} 
```
Perl provides a built-in function (lc) for lowercasing strings.
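You are performing unnecessary calculations inside the line-reading loop. If you simply build up a tally of words, you will have all the information you need. Also note that Perl provides a one-line form for most of its control structures (for, while, if, etc.), as shown below.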
```
while (my $line = <>){ 
    ... 
    $words{$_} ++ for split /\s+/, $line; 
} 
```
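You can then use the word tallies to compute the other information you need. For example, the number of unique words is simply the number of keys in the hash, and the total number of words is the sum of the hash values.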
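
As a rough sketch of that last point, the two totals could be derived from the tally like this. The contents of `%words` here are made-up sample data standing in for the hash built in the loop.

```perl
use strict;
use warnings;

# Hypothetical sample tally, standing in for the %words hash built while reading.
my %words = (and => 8, all => 2, adopt => 1);

# A hash's keys evaluated in scalar context give the number of distinct words.
my $distinct = keys %words;

# The total number of words is the sum of the individual counts.
my $total = 0;
$total += $_ for values %words;

print "There are $total words in the file.\n";
print "There are $distinct distinct words in the file.\n";
```

With this in place, the separate `$wcount` and `$dcount` counters from the question are no longer needed inside the reading loop.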

The distribution of word lengths can be computed as follows:

my %lengths; 
$lengths{length $_} += $words{$_} for keys %words;
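
Putting the suggestions together, a minimal sketch might look like the following. The helper names `args_ok` and `tally_words` and the sample text are assumptions made for illustration; the punctuation list is the one from the question. An in-memory filehandle is used so the sketch runs without an input file.

```perl
use strict;
use warnings;

# Hypothetical helper: true only for exactly one argument naming a plain file.
sub args_ok {
    my @args = @_;
    return @args == 1 && -f $args[0];
}

# Hypothetical helper: tally lowercased words read from a filehandle.
sub tally_words {
    my ($fh) = @_;
    my %words;
    while (my $line = <$fh>) {
        $line = lc $line;
        $line =~ tr/.,:;!?"(){}//d;          # same punctuation set as the question
        $words{$_}++ for split ' ', $line;   # split ' ' ignores leading whitespace
    }
    return %words;
}

# In a real script one would start with something like:
#   die "Usage: $0 FILE\n" unless args_ok(@ARGV);

# Made-up sample text, read through an in-memory filehandle.
my $text = "We the People, and all the people.\nAll of us!\n";
open my $fh, '<', \$text or die "Cannot open in-memory text: $!";
my %words = tally_words($fh);
close $fh;

my %lengths;
$lengths{length $_} += $words{$_} for keys %words;

print "$_ : $words{$_}\n"   for sort keys %words;
print "$_ : $lengths{$_}\n" for sort { $a <=> $b } keys %lengths;
```

Two small design choices differ from the snippets above. `split ' '` is used instead of `split /\s+/` because the latter yields an empty leading field when a line starts with whitespace. The lengths are sorted numerically with `<=>`, since a plain `sort` compares them as strings, which is why the question's output lists 10, 11 and 12 before 2.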


2011-10-09 14:54:26 FMc

The postfix loop is syntactic abomination #1 in the language. – Nemo

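Ohhh! Looks like a Perl Cookbook kind of example :) I have some doubts about `$words{$_} ++ for split /\s+/, $line;`. What exactly does this do? I can't figure out why `$words{$_}` is used in this way, and what exactly is `$_`? – sriram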

@GroovyUser It is just a short form of `for (split /\s+/, $line){ $words{$_} ++ }`, where `$_` is an individual word. – FMc

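Using a hash like you did is a good way to go about it. A more Perl-ish way to parse the file is to use a regular expression with the /g flag to read the words from each line. \w+ means one or more word characters (letters, digits, or underscore).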

while(<FILE>) 
{ 
    while(/(\w+)/g) 
    { 
     my $wd = lc($1); 
     ... 

    } 
}
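
A runnable variant of the same idea, as a sketch only. The two input lines are made up, and an array stands in for the file so the example needs no input.

```perl
use strict;
use warnings;

# Hypothetical input lines standing in for the contents of a file.
my @lines = ("Hello, world!", "hello (again), World.");

my %count;
for my $line (@lines) {
    # In scalar context, /g resumes after the previous match on each iteration.
    while ($line =~ /(\w+)/g) {
        $count{ lc $1 }++;
    }
}

print "$_ : $count{$_}\n" for sort keys %count;
```

Because only runs of word characters are matched, punctuation never needs to be stripped separately. One consequence worth knowing is that a word such as "don't" would be counted as two words, "don" and "t".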


2011-10-09 14:38:59 Sodved
