2012-06-27 157 views

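Perl script problem

The purpose of the script is to process all the words in a file and output every word that occurs the most times. So if three words each occur 10 times, the program should output all three of them.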

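The script runs now, thanks to some tips I got here. However, it does not handle large text files (e.g. the New Testament). I'm not sure whether that is my fault or just a limitation of the code. I'm sure the program has other problems as well, so any help would be greatly appreciated.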

#!/usr/bin/perl -w 
require 5.10.0; 

print "Your file: " . $ARGV[0] . "\n"; 
#Make sure there is only one argument 
if ($#ARGV == 0){ 

    #Make sure the argument is actually a file 
    if (-f $ARGV[0]){ 

     %wordHash =();  #New hash to match words with word counts 
     $file=$ARGV[0];  #Stores value of argument 
     open(FILE, $file) or die "File not opened correctly."; 

     #Process through each line of the file 
     while (<FILE>){ 
      chomp; 
      #Delimits on any non-alphanumeric 
      @words=split(/[^a-zA-Z0-9]/,$_); 
      $wordSize = @words; 

      #Put all words to lowercase, removes case sensitivity 
      for($x=0; $x<$wordSize; $x++){ 
       $words[$x]=lc($words[$x]); 
      } 

      #Puts each occurrence of word into hash 
      foreach $word(@words){ 
       $wordHash{$word}++; 
      } 
     } 
     close FILE; 

     #$wordHash{$b} <=> $wordHash{$a}; 
     $wordList=""; 
     $max=0; 

     while (($key, $value) = each(%wordHash)){ 
      if($value>$max){ 
       $max=$value; 
      } 
      } 

     while (($key, $value) = each(%wordHash)){ 
      if($value==$max && $key ne "s"){ 
       $wordList.=" " . $key; 
      } 
      }  

     #Print solution 
     print "The following words occur the most (" . $max . " times): " . $wordList . "\n"; 
    } 
    else { 
     print "Error. Your argument is not a file.\n"; 
    } 
} 
else { 
    print "Error. Use exactly one argument.\n"; 
} 

Please use the compile-time pragmas in your scripts. –


Consider "use strict": http://www.66clouds.com/new_testament.html ;) –

Answers

6

Your problem lies in two lines that are missing from the top of your script:

use strict; 
use warnings; 

If they had been there, they would have reported a lot of lines like this:

Argument "make" isn't numeric in array element at ...

It comes from this line:

$list[$_] = $wordHash{$_} for keys %wordHash; 

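Array indices can only be numbers, and since your keys are words, this does not work. What happens here is that each string is coerced to a number, and for any string that does not begin with a digit, that number is 0. Nearly every count therefore ends up being written to $list[0].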
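To make the coercion visible, here is a small sketch. The three counts are made up for illustration and are not taken from your file, and the warning is switched off locally only so that the demonstration runs quietly:

```perl
use strict;
use warnings;

# Hypothetical counts, standing in for %wordHash from the question.
my %wordHash = (make => 3, love => 7, '2nd' => 1);

my @list;
{
    # Silence the "isn't numeric" warnings for this demonstration only.
    no warnings 'numeric';
    # Keys are visited in sorted order so the result is repeatable.
    $list[$_] = $wordHash{$_} for sort keys %wordHash;
}

# "love" and "make" both numify to 0, so they share slot 0 and the
# later write wins. "2nd" numifies to 2, so slot 1 is never assigned.
printf "%d slots\n", scalar @list;
for my $i (0 .. $#list) {
    printf "%d: %s\n", $i, defined $list[$i] ? $list[$i] : '(undef)';
}
```

With real input the effect is the same but larger: almost every word collapses into index 0, so the array cannot be used to look up a count by word.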

Your code reads the data in fine, although I would write it differently. It is only after that point that your code becomes unwieldy.

As near as I can tell, you want to print out the words that occur most often, in which case you should consider the following code:

use strict; 
use warnings; 

my %wordHash; 
#Make sure there is only one argument 
die "Only one argument allowed." unless @ARGV == 1; 
while (<>) { # Use the diamond operator to implicitly open ARGV files 
    chomp; 
    my @words = grep length, # disallow empty strings, but keep the word "0" 
     map lc,     # make everything lower case 
      split /[^a-zA-Z0-9]/; # your original split 
    foreach my $word (@words) { 
     $wordHash{$word}++; 
    } 
} 

for my $word (sort { $wordHash{$b} <=> $wordHash{$a} } keys %wordHash) { 
    printf "%-6s %s\n", $wordHash{$word}, $word; 
} 

As you will notice, you can sort based on the hash values.
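If you only want the words tied for first place rather than the whole sorted list, one possible approach is to sort by count and keep the entries that match the first one. This is a sketch only: `top_words` and the `%demo` data are hypothetical names introduced for the example.

```perl
use strict;
use warnings;

# Return every word whose count equals the highest count.
# Takes a reference to a hash of word => count.
sub top_words {
    my ($counts) = @_;

    # Highest count first; ties are broken alphabetically.
    my @sorted = sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b }
                 keys %$counts;
    return () unless @sorted;

    # The first entry carries the maximum; keep everything that ties it.
    my $max = $counts->{ $sorted[0] };
    return grep { $counts->{$_} == $max } @sorted;
}

# Hypothetical data: three words tied at 10 occurrences.
my %demo = (the => 10, and => 10, of => 10, lamb => 2);
print join(' ', top_words(\%demo)), "\n";
```

Sorting is more work than a single pass to find the maximum, but it keeps the code short, and for a word count of a book-sized file the difference is unlikely to matter.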

1

Here is a completely different way of writing it (I might also say: "Perl is not C"):

#!/usr/bin/env perl 

use 5.010; 
use strict; use warnings; 
use autodie; 

use List::Util qw(max); 

my ($input_file) = @ARGV; 
die "Need an input file\n" unless defined $input_file; 

say "Input file = '$input_file'"; 

open my $input, '<', $input_file; 

my %words; 

while (my $line = <$input>) { 
    chomp $line; 

    my @tokens = map lc, grep length, split /[^A-Za-z0-9]+/, $line; 
    $words{ $_ } += 1 for @tokens; 
} 

close $input; 

my $max = max values %words; 
my @argmax = sort grep { $words{$_} == $max } keys %words; 

for my $word (@argmax) { 
    printf "%s: %d\n", $word, $max; 
}