Perl: search a string for occurrences of items from an array

For a file filter, I want to use an array of words and check each line to see whether it matches any of them.

I already have a fairly straightforward approach (only the basic matching part is shown):

# check if any of the @words is found in $term 

@words= qw/one 
two 
three/; 
$term= "too for the show"; 

# the following looks very C like 

$size= @words; 
$found= 0; 

for ($i= 0; $i<$size && !$found; $i++) { 
    $found|= $term=~ /$words[$i]/; 
} 

printf "found= %d\n", $found;

Having seen a lot of arcane syntax and solutions in Perl, I wonder whether there is a more compact way to write this, and if so, which one.


2016-10-09 Terminality

Build a single regular expression from all the words and do just one match:

#!/usr/bin/perl 
use warnings; 
use strict; 

my @words = qw(one two three); 

my $regex = join '|', map quotemeta, @words; 

for my $term ('too for the show', 'five four three', 'bones') { 
    my $found = $term =~ $regex; 
    printf "found = %d\n", $found; 
}

Matching against /\b(?:$regex)\b/ instead would prevent bones from matching one.
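As a minimal sketch of that word-boundary variant: the alternation is compiled once with qr// and wrapped in \b. The helper name matches_any_word is an assumption made for illustration and is not part of the answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = qw(one two three);

# Escape each word, join the words into one alternation,
# and anchor the whole group on word boundaries.
my $alternation = join '|', map { quotemeta($_) } @words;
my $regex       = qr/\b(?:$alternation)\b/;

# Hypothetical helper: returns 1 if any whole word occurs in $term, else 0.
sub matches_any_word {
    my ($term) = @_;
    return $term =~ $regex ? 1 : 0;
}

for my $term ('too for the show', 'five four three', 'bones') {
    printf "%-18s found = %d\n", $term, matches_any_word($term);
}
```

Only 'five four three' reports a match here. Note that the \b anchors assume every word in the list starts and ends with a word character; a list entry such as 'c++' would need different anchoring.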


2016-10-09 22:26:18 choroba

[`list2re` from Data::Munge](https://metacpan.org/pod/Data::Munge#list2re-LIST) does something very similar, but also handles some edge cases. – melpomene

+1. I like this one (it shows the kind of arcane Perl way I was expecting). But the Assemble one suits my needs better. Thanks for the answer. – Terminality

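Use Regexp::Assemble to turn the search into a single regular expression. That way each string only needs to be scanned once, which makes it more efficient for a large number of lines.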

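Regexp::Assemble is better than doing it by hand. It has a full API for whatever you might want to do with such a regex, it handles edge cases, and it can intelligently compile the words into a more efficient regex.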

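For example, this program produces (?^:\b(?:t(?:hree|wo)|one)\b), which results in less backtracking. As your word list grows, this becomes very important. Recent versions of Perl, around 5.14 and later, will do this for you.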

use strict; 
use warnings; 
use v5.10; 

use Regexp::Assemble; 

# Wrap each word in \b (word break) so only the full word is 
# matched. 'one' will match 'money' but '\bone\b' won't. 
my @words= qw(
    \bone\b 
    \btwo\b 
    \bthree\b 
); 

# These lines simulate reading from a file. 
my @lines = (
    "won for the money\n", 
    "two for the show\n", 
    "three to get ready\n", 
    "now go cat go!\n" 
); 

# Assemble all the words into one regex. 
my $ra = Regexp::Assemble->new; 
$ra->add(@words); 

for my $line (@lines) { 
    print $line if $line =~ $ra; 
}

Also note the foreach-style loop used to iterate over an array, and the use of a statement modifier.

Finally, I used \b to make sure only the actual words match, and not substrings such as the "one" inside money.
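The two idioms mentioned above can also be sketched with core Perl only, applied to the index-based loop from the question. This is an illustration under assumptions: the helper name first_word_in is hypothetical, and \Q...\E is added so each word is treated as literal text.

```perl
use strict;
use warnings;

my @words = qw(one two three);

# Hypothetical helper: returns the first word found in $term, or undef.
sub first_word_in {
    my ($term, @candidates) = @_;
    for my $word (@candidates) {                   # foreach-style loop, no index
        return $word if $term =~ /\b\Q$word\E\b/;  # statement modifier
    }
    return;                                        # nothing matched
}

for my $term ('two for the show', 'won for the money', 'now go cat go!') {
    my $hit = first_word_in($term, @words);
    print "$term: ", (defined $hit ? "found '$hit'" : 'no match'), "\n";
}
```

The early return replaces the `!$found` condition of the C-style loop: the search stops at the first word that matches.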


2016-10-09 22:27:14 Schwern

Modern versions of perl compile 'one|two|three' into an internal trie structure that does not backtrack. – melpomene

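@melpomene Yes, that's worth mentioning. Thanks. It's a very nice optimization, but now your efficiency depends on your Perl version (I want to say [this stabilized around 5.14](http://perldoc.perl.org/5.14.0/perldelta.html#Regular-Expression-Bug-Fixes)?), and I'm not sure how complex a regex it can handle. I wouldn't rely on it, because performance then depends heavily on the regex optimizer. And Regexp::Assemble solves a lot of other problems, so it's still worth using. – Schwern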

I already require 5.10+ for other things, so it's not a big deal for me, but point taken. – melpomene

This may be an overly simple "translation" of the C-like code into Perl.

Pro: it's compact.
Con: it's not very efficient (the other answers are a ton better in that respect).

@words= qw/one 
two 
three/; 
$term= "too for the show"; 

my @found = grep { $term =~ /$_/; } @words; 

printf "found= %d\n", scalar @found;
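As a small sketch of how grep behaves in the two contexts: in list context it returns the matching words, and in scalar context it returns how many matched. The sample $term and the \Q...\E quoting are assumptions added for illustration. Either way, grep tests every word and does not stop at the first hit, which is where the inefficiency comes from.

```perl
use strict;
use warnings;

my @words = qw(one two three);
my $term  = 'one, two, buckle my shoe';

# List context: the words that occur in $term.
my @found = grep { $term =~ /\Q$_\E/ } @words;

# Scalar context: grep returns the number of words that matched.
my $count = grep { $term =~ /\Q$_\E/ } @words;

printf "found = %d (%s)\n", $count, join(', ', @found);
```

With this input the script prints `found = 2 (one, two)`.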


2016-10-09 22:30:52 Tibrogargan

If you only need a count, `my $count = grep { $term =~ /$_/ } @words` also works. – melpomene

+1. I like this one (it shows the kind of arcane Perl way I was expecting). But the Assemble one suits my needs better. Thanks for the answer. – Terminality
