2014-09-26 30 views
1

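How do I match missing array elements in Perl?

I want to use the elements of an array to find the words that are absent from my data. My code is: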

use warnings; 
use strict; 
my @ar = qw(one two three four five six seven eight nine ten); 
my @data = <DATA>; 
print "Absence word in the data\n"; 
foreach my $mat(@ar){ 
    my $nonmatch; 
    foreach my $dat (@data){ 
     $nonmatch = grep{m/(?!$mat)/} $dat; 
    } 
    print "$nonmatch\n"; 
} 
__DATA__ 
eight two four one two three four seven eight ten one two seven 

It should look up each array element in the data and print only the elements that do not appear there.

My expected output is:

Absence word in the data 
five 
six 
nine 

How can I do this?

+2

Use a hash for the words in `@data`, so that you can check whether `$mat` exists in the hash. – 2014-09-26 17:12:16

Answers

2

Use a `%seen`-style hash, modeled on perlfaq4 - How can I tell whether a certain element is contained in a list or array?

use warnings; 
use strict; 

my %seen = map { $_ => 1 } map { split ' ' } <DATA>; 

my @ar = qw(one two three four five six seven eight nine ten); 

print "Absence word in the data\n"; 
print "$_\n" for grep { !$seen{$_} } @ar; 

__DATA__ 
eight two four one two three four seven eight ten one two seven 

Output:

Absence word in the data 
five 
six 
nine 
+1

Don't you need to `chomp` your `<DATA>`? Or does the split remove the NL at the end? – 2014-09-29 04:08:09

+0

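No need to `chomp`. A bare `split` or `split ' '` is a special case that is treated like `split /\s+/`, except that it also removes any leading whitespace. There could be a trailing `''` after the split, but only if we used a negative limit. – Miller 2014-09-29 04:46:00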
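The behaviour described in this comment can be checked with a small sketch. The sample line below is made up for illustration; it has leading spaces and a trailing newline so that all three cases differ.

```perl
use strict;
use warnings;

# A line as it might come from <DATA>, with leading spaces and a trailing newline.
my $line = "  eight two four\n";

# split ' ' skips leading whitespace and drops trailing empty fields.
my @words = split ' ', $line;

# split /\s+/ keeps the empty leading field produced by the leading spaces.
my @with_leading = split /\s+/, $line;

# A negative limit preserves the trailing empty field left by the newline.
my @with_trailing = split ' ', $line, -1;

print scalar(@words), "\n";          # 3
print scalar(@with_leading), "\n";   # 4
print scalar(@with_trailing), "\n";  # 4
```

With the default limit, the words come out clean without a prior `chomp`, which is why the answer above can feed `<DATA>` straight into `split ' '`.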

1

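You can use a hash slice, `@seen{@r}`, to store every word seen in `@r` as a key of the `%seen` hash, and afterwards check the elements of `@ar` against those hash keys: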

use warnings; 
use strict; 

my @ar = qw(one two three four five six seven eight nine ten); 
my %seen; 
while (my $mat = <DATA>) { 
    my @r = split (' ', $mat); 
    @seen{@r} =(); 
} 
print "Absence word in the data\n"; 
print "$_\n" for grep { not exists $seen{$_} } @ar; 

__DATA__ 
eight two four one two three four seven eight ten one two seven 

Output:

Absence word in the data 
five 
six 
nine 
+0

Please leave a comment when downvoting. – 2014-09-26 18:21:41

+0

Thanks @mpapec, this works. Could you please explain your code? I hadn't thought of declaring a hash. – mkHun 2014-09-26 19:02:43
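As a possible answer to the request for an explanation, here is a minimal sketch of what the hash slice assignment does. The contents of `@r` and the name `%seen_loop` are made up for illustration and are not part of the answer.

```perl
use strict;
use warnings;

my %seen;
my @r = qw(eight two four two);

# Hash slice: creates one key per distinct element of @r, each with the value undef.
@seen{@r} = ();

# The same effect written as an explicit loop.
my %seen_loop;
$seen_loop{$_} = undef for @r;

# The values are undef, so test with exists rather than for truth.
print exists $seen{two}  ? "two: seen\n"  : "two: not seen\n";
print exists $seen{five} ? "five: seen\n" : "five: not seen\n";
```

Because the stored values are `undef`, the answer filters with `not exists $seen{$_}`; a plain truth test such as `!$seen{$_}` would report every word as missing.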

1

This sounds like a problem I had at one point. The code I came up with was based on the following code from this page:

https://www.safaribooksonline.com/library/view/perl-cookbook/1565922433/ch04s08.html

# assume @A and @B are already loaded 
%seen =();      # lookup table to test membership of B 
@aonly =();      # answer 

# build lookup table 
foreach $item (@B) { $seen{$item} = 1 } 

# find only elements in @A and not in @B 
foreach $item (@A) { 
    unless ($seen{$item}) { 
     # it's not in %seen, so add to @aonly 
     push(@aonly, $item); 
    } 
} 
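The Cookbook snippet assumes `@A` and `@B` already exist and does not run under `use strict`. A strict-safe sketch of the same idea, applied to the question's data, might look like this; the helper name `only_in_first` is hypothetical and does not come from the Cookbook.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the elements of @$a_ref that are not in @$b_ref.
sub only_in_first {
    my ($a_ref, $b_ref) = @_;
    my %seen = map { $_ => 1 } @$b_ref;   # lookup table for membership in B
    return grep { !$seen{$_} } @$a_ref;   # keeps the order of A
}

my @A = qw(one two three four five six seven eight nine ten);
my @B = split ' ', "eight two four one two three four seven eight ten one two seven\n";

my @aonly = only_in_first(\@A, \@B);
print "$_\n" for @aonly;   # five, six, nine (one per line)
```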
1

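Create a hash that contains all the words from `__DATA__` as keys (this can be done in one line with a hash slice), then filter out the words that are not in the hash (this can also be done in one line, with grep).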

use warnings; 
use strict; 
my @ar = qw(one two three four five six seven eight nine ten); 

my $data = join '', (<DATA>); 
my @data_words = split ' ', $data; # get a list of words 

my %data; 
@data{@data_words} = @data_words; # fill a hash with the words from __DATA__ 

my @missing = grep { !exists $data{$_}; } @ar; # filter words 

print "Absence word in the data: @missing\n"; 

__DATA__ 
eight two four one two three four seven eight ten one two seven 
0

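This solution starts with the list of items you are looking for, discards any item it sees along the way, and then prints whatever is left.

You could optimize this for large data by checking, at the bottom of the while loop, whether there are still any keys left in the `%unseen` hash. I added another line to the test data containing the word "sixteen", to make sure it handles multiple lines and that we don't get a false positive on "six" there.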

use warnings; 
use strict; 

my @to_match = qw/ one two three four five six seven eight nine ten /; 
my %unseen; 
$unseen{$_} = 1 for @to_match; 
while (my $line = <DATA>) { 
    foreach my $match_this (@to_match) { 
     delete $unseen{$match_this} if $line =~/\b$match_this\b/; 
    } 
} 
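# keys %unseen comes back in no particular order, so filter @to_match to keep the original order
print "Words absent from the data:\n" . join "\n", grep { $unseen{$_} } @to_match; 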
print "\n"; 
__DATA__ 
eight two four one two three four seven eight ten one two seven 
sixteen 
+0

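Please leave a comment when downvoting - this solution offers a potentially useful optimization - I haven't tested it extensively, but I believe it does work. – msouth 2014-09-26 21:48:19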
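The early-exit optimization mentioned in this answer could be sketched as follows. The `@lines` array is a stand-in for a large input file and its contents are made up for illustration; `\Q...\E` is an added precaution that the answer's code does not use.

```perl
use strict;
use warnings;

my @to_match = qw/ one two three four five six seven eight nine ten /;
my %unseen = map { $_ => 1 } @to_match;

# Stand-in for the lines of a large input; the contents are made up.
my @lines = (
    "eight two four one two three four seven eight ten one two seven\n",
    "sixteen\n",
    "five six nine\n",
    "this line is never examined\n",
);

my $lines_read = 0;
for my $line (@lines) {
    $lines_read++;
    # Only test the words that have not been found yet.
    for my $word (keys %unseen) {
        # \Q...\E guards against regex metacharacters in the word.
        delete $unseen{$word} if $line =~ /\b\Q$word\E\b/;
    }
    last unless %unseen;   # nothing left to look for, so stop reading
}

print "Lines read: $lines_read\n";
print "Still unseen: ", join(' ', grep { $unseen{$_} } @to_match), "\n";
```

Looping over `keys %unseen` rather than the full word list also means each line is tested against fewer patterns as words are found.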

1

Two things:

Always chomp what you read in. This includes `__DATA__`:

my @data = <DATA>; # The NL is in each element 
chomp @data;   # Now it isn't! 

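If you don't chomp, you'll be checking whether `one` matches `one\n`. Also, because you put the whole of `__DATA__` on a single line, it is read in as a single line of input. You will have to use split to separate it into an array.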
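A minimal sketch of that second point, using the question's single line of data as a literal string instead of `__DATA__`; the hash names `%by_line` and `%by_word` are made up for illustration.

```perl
use strict;
use warnings;

# One line, as <DATA> would return it for the question's data.
my @data = ("eight two four one two three four seven eight ten one two seven\n");
chomp @data;

# Without splitting, the only hash key is the whole line.
my %by_line = map { $_ => 1 } @data;

# Splitting each line on whitespace gives one key per word.
my %by_word = map { $_ => 1 } map { split ' ' } @data;

print exists $by_line{one} ? "yes\n" : "no\n";   # no
print exists $by_word{one} ? "yes\n" : "no\n";   # yes
```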

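The second thing: in general, when you ask an "is this in there?" type of question, you should immediately think of hashes. A hash lets you look items up very quickly. In this case, you would build a hash from your data and then check whether each item in your list is in that hash: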

#! /usr/bin/env perl 
# 

use strict; 
use warnings; 
use feature qw(say); 

my @list = qw(one two three four five six seven eight nine ten); 
my @data = <DATA>; 
chomp @data;  # Don't forget! 

# 
# Translate your input as a hash 
# 

my %data_hash; 
for my $element (@data) { 
    $data_hash{$element} = 1; 
} 

for my $element (@list) { 
    if (not exists $data_hash{$element}) { 
     say "$element isn't in the list"; 
    } 
} 
__DATA__ 
eight 
two 
four 
one 
two 
three 
four 
seven 
eight 
ten 
one 
two 
seven 

Note that the map command gives you a shorter way to write this loop:

# 
# Translate your input as a hash 
# 

my %data_hash; 
for my $element (@data) { 
    $data_hash{$element} = 1; 
} 

This can now be shortened to a single line:

# 
# Translate your input as a hash 
# 

my %data_hash = map { $_ => 1 } @data; 

This is a common way of turning an array into a hash, so most developers will simply use it.