Count occurrences of each character at each line position in Perl

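Similar to the question "unix - count occurrences of character per line/field", but for every character at every position on the line.

Given a file of 1E7 lines with ~500 characters per line, I want a two-dimensional summary structure like $summary{'a','b','c','0','1','2'}[pos 0..499] = count_integer that shows how many times each character is used at each position in the line. Either order of the dimensions is fine.

My first method did ++$summary{char}[pos] while reading, but because many lines are identical, it is much faster to count identical lines first and then do $summary{char}[pos] += n once per distinct line.

Is there a more idiomatic or faster way than the following C-style 2D loop?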

#!perl 
my (%summary, %counthash); # perl 5.8.9 

sub method1 { 
    print "method1\n"; 
    while (<DATA>) { 
        my @c = split(//, $_);
        ++$summary{ $c[$_] }[$_] foreach (0 .. $#c);
    } # wend 
} ## end sub method1 

sub method2 { 
    print "method2\n"; 
    ++$counthash{$_} while (<DATA>); # slurpsum the whole file 

    foreach my $str (keys %counthash) { 
        my $n = $counthash{$str};
        my @c = split(//, $str);
        $summary{ $c[$_] }[$_] += $n foreach (0 .. $#c);
    } #rof my $str 
} ## end sub method2 

# MAINLINE 
if (rand() > 0.5) { &method1 } else { &method2 } 
print "char $_ : @{$summary{$_}} \n" foreach ('a', 'b'); 
# both methods have this output summary 
# char a : 3 3 2 2 3 
# char b : 2 2 3 3 2 
__DATA__ 
aaaaa 
bbbbb 
aabba 
bbbbb 
aaaaa

Source

2015-12-07 jgraber

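It's hard to see intuitively what you are looking for with this sample data - I assume your real scenario isn't as trivial as lines of repeated characters? Also: `use strict; use warnings;` is a very good idea. – Sobrique

The only inefficiency/unidiomaticness (?) I see is that you are also counting all the line-termination characters (newline and/or CR). (Unless you do something about it, Perl includes them in `$_`.) Stick a `chomp;` in after reading each line. –

@JeffY: *unidiomaticity*, I believe – Borodin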
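A minimal sketch of that suggestion, applied to the per-line loop from method1. The sub name `count_chars` is hypothetical, and a substitution is used in place of a bare `chomp` so that a stray CR before the newline is dropped as well:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the question): per-position character
# counts, with line terminators stripped before counting.
sub count_chars {
    my @lines = @_;                # copy, so the caller's strings are untouched
    my %summary;
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;      # like chomp, but also removes a CR before the LF
        my @c = split //, $line;
        ++$summary{ $c[$_] }[$_] for 0 .. $#c;
    }
    return \%summary;
}

my $s = count_chars("aaaaa\n", "bbbbb\n", "aabba\r\n", "bbbbb\n", "aaaaa\n");
print "char $_ : @{ $s->{$_} }\n" for sort keys %$s;
```

Without the stripping step, the newline (and any CR) would show up as extra keys in the summary.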

Depending on the data, method 2 may be somewhat faster or slower than method 1.

But a big difference comes from using unpack instead of split.

use strict; 
use warnings; 
my (%summary, %counthash); # perl 5.8.9 

sub method1 {
    print "method1\n";
    my @l = <DATA>;
    for my $t (1 .. 1000000) {    # repeat the work for timing purposes
        foreach (@l) {
            my @c = split(//, $_);
            ++$summary{ $c[$_] }[$_] foreach (0 .. $#c);
        }
    }
} ## end sub method1

sub method2 {
    print "method2\n";
    ++$counthash{$_} while (<DATA>); # slurpsum the whole file
    for my $t (1 .. 1000000) {    # repeat the work for timing purposes
        foreach my $str (keys %counthash) {
            my $n = $counthash{$str};
            my $i = 0;
            $summary{ $_ }[$i++] += $n foreach (unpack("c*", $str));
        }
    }
} ## end sub method2

# MAINLINE 
#method1(); 
method2(); 
print "char $_ : ". join (" ", @{$summary{ord($_)}}). " \n" 
    foreach ('a', 'b'); 
# without the 1..1000000 timing loop, both methods have this output summary
# char a : 3 3 2 2 3 
# char b : 2 2 3 3 2 
__DATA__ 
aaaaa 
bbbbb 
aabba 
bbbbb 
aaaaa

This runs faster (compared with 7.x seconds on my PC).

Source

2015-12-08 09:26:54

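Did you test it? `unpack("c*", $str)` produces the wrong summary keys 98 and 97 instead of 'a' and 'b'; 'a*' does not work; this works: `$summary{$_}[$i++] += $n foreach (unpack('a' x length($str), $str));` and this also works: `$summary{chr($_)}[$i++] += $n foreach (unpack('c*', $str));` – jgraber

`$summary{substr($str, $_, 1)}[$_] += $n foreach (0 .. (length($str) - 1));` # equally fast – jgraber

@jgrabber Yes, I did, and it works. unpack with "c*" just returns the character codes instead of the characters, which is why my code prints `$summary{ord($_)}`, as you may have noticed... But the length-and-substr solution is faster. The original code (executed a million times) takes 7.177 seconds on my PC, the unpack version takes 5.879 seconds, and the length-and-substr solution takes only 4.286 seconds. –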
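To make the key mismatch concrete, the sketch below compares two templates on one string. `unpack('c*', ...)` yields numeric character codes, so the hash ends up keyed by 97 and 98 unless the codes are converted back with `chr`. The grouped template `'(a)*'` is offered as an alternative to `'a' x length($str)`; it is not from the thread, and its availability on Perl 5.8.9 is an assumption worth checking.

```perl
use strict;
use warnings;

my $str = "aabba";

my @codes = unpack('c*', $str);      # signed char values, one per byte
my @chars = unpack('(a)*', $str);    # one-character strings, one per byte

print "codes: @codes\n";             # codes: 97 97 98 98 97
print "chars: @chars\n";             # chars: a a b b a
print "back : ", join(' ', map { chr($_) } @codes), "\n";   # back : a a b b a
```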
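Putting the thread's conclusions together, here is a sketch of the variant reported as fastest above: count identical lines first, then walk each distinct line with `substr`. The sub name `summarize`, the filehandle argument, and the in-memory sample data are illustrative assumptions, and no timings are claimed for this version.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from a filehandle and return
# { char => [ count at pos 0, count at pos 1, ... ] }.
sub summarize {
    my ($fh) = @_;

    # Pass 1: count how often each distinct line occurs.
    my %counthash;
    while (my $line = <$fh>) {
        chomp $line;                 # do not count the newline as a character
        ++$counthash{$line};
    }

    # Pass 2: add each distinct line's count once per position.
    my %summary;
    while (my ($str, $n) = each %counthash) {
        $summary{ substr($str, $_, 1) }[$_] += $n for 0 .. length($str) - 1;
    }
    return \%summary;
}

# Same sample lines as in the question, read from an in-memory filehandle.
my $data = "aaaaa\nbbbbb\naabba\nbbbbb\naaaaa\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $s = summarize($fh);
close $fh;

print "char $_ : @{ $s->{$_} }\n" for sort keys %$s;
# char a : 3 3 2 2 3
# char b : 2 2 3 3 2
```

For a real file, the in-memory handle would be replaced by a handle opened on the file itself.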
