How do I remove elements from a Perl array after I've processed them?

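I'm reading a postfix mail log file into an array and then looping through it to extract messages. On the first pass, I check for a match on the "to=" line and grab the message ID. After building the array of MSGIDs, I loop back through the array to pull information from the to=, from= and client= lines. How do I remove elements from a Perl array after I've processed them?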

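What I'd like to do is remove each line from the array once I've extracted its data, so that later passes are faster (i.e., there is one fewer line to check).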

Any suggestions? This is in Perl.

Edit: gbacon's answer below was enough to get my solution rolling. Here are its guts:

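use strict;
use warnings;

my %msg;
while (<>) {
    my $line = $_;    # keep an untouched copy; the first s/// below edits $_
    # Strip the "... postfix/xxx[pid]: QUEUEID: " prefix, then collect the fields
    if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
        my $key = $1;
        push @{ $msg{$key}{$1} } => $2
            while /\b(to|from|client|size|nrcpt)=<?(.+?)(?:>|,|\[|$)/g;
    }
    # On the "removed" line, also record the timestamp and the server name
    if ($line =~ s!^(\w+ \d+ \d+:\d+:\d+)\s(\w+.*)\s+postfix/\w+\[.+?\]: (\w+):\s*removed!!) {
        my $key = $3;
        push @{ $msg{$key}{date} } => $1;
        push @{ $msg{$key}{server} } => $2;
    }
}

use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%msg;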

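I'm sure the second regular expression could be more elegant, but it gets the job done. I can now pull all of the messages into a hash and then pick out the ones I'm interested in.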

Thanks to everyone who answered.


2010-02-03 Justin ᚅᚔᚈᚄᚒᚔ

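It seems to me that a hash might be a better way to handle this. That way you don't have to explicitly check for matches while iterating; you can simply use the "to=" line as the key. – 2010-02-03 19:38:02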

Do it in a single pass:

#! /usr/bin/perl

use warnings;
use strict;

# for demo only
*ARGV = *DATA;

my %msg;
while (<>) {
    if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
        my $key = $1;
        push @{ $msg{$key}{$1} } => $2
            while /\b(to|from|client)=(.+?)(?:,|$)/g;
    }
}

use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%msg;
__DATA__ 
Apr 8 14:22:02 MailSecure03 postfix/smtpd[32388]: BA1CE38965: client=mail.example.com[x.x.x.x] 
Apr 8 14:22:03 MailSecure03 postfix/cleanup[32070]: BA1CE38965: message-id=<[email protected]> 
Apr 8 14:22:03 MailSecure03 postfix/qmgr[19685]: BA1CE38965: from=<[email protected]>, size=1087, nrcpt=2 (queue active) 
Apr 8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973) 
Apr 8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973) 
Apr 8 14:22:04 MailSecure03 postfix/qmgr[19685]: BA1CE38965: removed 
Apr 8 14:22:04 MailSecure03 postfix/smtpd[32589]: 62D8438973: client=localhost.localdomain[127.0.0.1] 
Apr 8 14:22:04 MailSecure03 postfix/cleanup[32080]: 62D8438973: message-id=<[email protected]> 
Apr 8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: from=<[email protected]>, size=1636, nrcpt=2 (queue active) 
Apr 8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<[email protected]>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0 <[email protected]om> Queued mail for delivery) 
Apr 8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<[email protected]>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0 <[email protected]> Queued mail for delivery) 
Apr 8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: removed

The code works by first looking for a queue ID (e.g., BA1CE38965 and 62D8438973 above), which we store in $key.

Next, we find all matches on the current line (thanks to the /g switch) that look like to=<...>, client=mail.example.com, and so on, with and without a separating comma.

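Of note in the pattern are

\b - match only at a word boundary (prevents matching xxxto=<...>)
(to|from|client) - match to, from, or client
(.+?) - match the field's value with a non-greedy quantifier
(?:,|$) - match either a comma or the end of the string, without capturing into $3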
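To see the \b and /g behavior in isolation, here is a minimal sketch that runs the same pattern over one made-up string (the input and the @pairs name are only for illustration). The fake xxxto= field is skipped because there is no word boundary before its "to", and the loop picks up every real field on the line:

```perl
use strict;
use warnings;

# Made-up input: "xxxto=" must be skipped thanks to \b,
# and the /g loop should find every real field on the line.
my $rest = 'xxxto=<skip@example.com>, to=<bob@example.com>, client=host[10.0.0.1]';

my @pairs;
# In scalar context, //g resumes where the previous match ended,
# so each pass of the while loop finds the next field=value pair.
while ($rest =~ /\b(to|from|client)=(.+?)(?:,|$)/g) {
    push @pairs, "$1 => $2";
}

print "$_\n" for @pairs;
# to => <bob@example.com>
# client => host[10.0.0.1]
```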

The non-greedy (.+?) forces the match to stop at the first comma it encounters rather than the last. Otherwise, on a line such as

to=<[email protected]>, other=123

you would get <[email protected]>, other=123 as the recipient!
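A quick way to check this is to run both forms of the pattern against that kind of line. This sketch uses a made-up address in place of the real one:

```perl
use strict;
use warnings;

# Made-up address standing in for the one in the example line.
my $line = 'to=<carol@example.com>, other=123';

# Non-greedy: (.+?) stops at the first comma.
my ($lazy)   = $line =~ /\bto=(.+?)(?:,|$)/;
# Greedy: (.+) grabs as much as it can, and since $ can match at the
# end of the string, the capture keeps everything after "to=".
my ($greedy) = $line =~ /\bto=(.+)(?:,|$)/;

print "lazy:   $lazy\n";    # lazy:   <carol@example.com>
print "greedy: $greedy\n";  # greedy: <carol@example.com>, other=123
```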

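Then, for each field that matches, we push its value onto the end of an array (because there may be several recipients) keyed by both the queue ID and the field name. Look at the result: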

$VAR1 = { 
    '62D8438973' => { 
    'client' => [ 
     'localhost.localdomain[127.0.0.1]' 
    ], 
    'to' => [ 
     '<[email protected]>', 
     '<[email protected]>' 
    ], 
    'from' => [ 
     '<[email protected]>' 
    ] 
    }, 
    'BA1CE38965' => { 
    'client' => [ 
     'mail.example.com[x.x.x.x]' 
    ], 
    'to' => [ 
     '<[email protected]>', 
     '<[email protected]>' 
    ], 
    'from' => [ 
     '<[email protected]>' 
    ] 
    } 
};

Now say you want to print all the recipients of the message whose queue ID is BA1CE38965:

my $queueid = "BA1CE38965";
foreach my $recip (@{ $msg{$queueid}{to} }) {
    print $recip, "\n";
}

Maybe you only want to know how many recipients there are:

print scalar @{ $msg{$queueid}{to} }, "\n";

If you're willing to assume that each message has exactly one client, access it with

$msg{$queueid}{client}[0]
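Putting those pieces together, a short loop can report on every message at once. This sketch uses a hand-built %msg shaped like the Dumper output above (the addresses are placeholders, and @report is just an illustrative name) and, like the sentence above, assumes exactly one client per message:

```perl
use strict;
use warnings;

# Hand-built %msg with the same shape as the Dumper output (placeholder addresses).
my %msg = (
    '62D8438973' => {
        client => ['localhost.localdomain[127.0.0.1]'],
        from   => ['<sender@example.com>'],
        to     => ['<a@example.com>', '<b@example.com>'],
    },
    'BA1CE38965' => {
        client => ['mail.example.com[x.x.x.x]'],
        from   => ['<sender@example.com>'],
        to     => ['<c@example.com>', '<d@example.com>'],
    },
);

my @report;
for my $queueid (sort keys %msg) {
    my $m = $msg{$queueid};
    # Assumes exactly one client per message, hence the [0].
    push @report, sprintf "%s client=%s recipients=%d",
        $queueid, $m->{client}[0], scalar @{ $m->{to} };
}
print "$_\n" for @report;
```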


2010-02-03 20:00:50

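This is awesome, thanks... I was focused only on pulling out the messages I'm interested in (the ones matching [0-9-]@ACertainDomain.com) and hadn't considered just loading all the relevant information from the file into a hash and then pulling the messages out of that. I'm going to use your code as a base and see what I can build from there. I'm sure I'll have more questions (I'm still trying to parse that 'while' regex; I'm rusty at this). – 2010-02-03 21:04:06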

@Justin You're welcome! See the updated explanation. – 2010-02-03 21:33:55

Thanks again. My parsing now takes about 3 minutes per file instead of 3 hours. This community is awesome. – 2010-02-03 23:59:03

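It won't actually make processing any faster, because removing elements from an array is an expensive operation: other elements have to be moved to close the gap.

A better option:

When you create the array of IDs, store alongside each ID pointers (indices, really) into the main array of log lines, so that you can quickly get to the elements for a given ID.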
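A rough sketch of that idea, assuming the log lines are already in @lines (the sample lines and the %lines_for name are made up for illustration). Instead of deleting anything, it records each queue ID's line indices once and then jumps straight to them:

```perl
use strict;
use warnings;

# Hypothetical log lines already read into memory.
my @lines = (
    'Apr 8 14:22:02 host postfix/smtpd[1]: BA1CE38965: client=mail.example.com[x.x.x.x]',
    'Apr 8 14:22:03 host postfix/qmgr[2]: BA1CE38965: from=<s@example.com>, size=1087',
    'Apr 8 14:22:04 host postfix/smtpd[3]: 62D8438973: client=localhost.localdomain[127.0.0.1]',
    'Apr 8 14:22:04 host postfix/smtp[4]: BA1CE38965: to=<r@example.com>, relay=127.0.0.1',
);

# One pass: remember the index of every line that mentions each queue ID.
my %lines_for;    # hypothetical name: queue ID => [indices into @lines]
for my $i (0 .. $#lines) {
    push @{ $lines_for{$1} }, $i
        if $lines[$i] =~ m!postfix/\w+\[\d+\]: (\w+):!;
}

# Later, go straight to one message's lines instead of rescanning @lines.
my @ba1 = map { $lines[$_] } @{ $lines_for{BA1CE38965} };
print "$_\n" for @ba1;
```

Because nothing is ever removed from @lines, the stored indices never go stale.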


2010-02-03 19:31:35

In Perl, you can use the splice() function to remove elements from an array.

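As always, be careful when removing elements from an array while looping over it, because the indices of the remaining elements will change.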
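One common way around the shifting indices, sketched here with made-up data, is to walk the indices from the end toward the start: removing element $i only renumbers the elements after it, which have already been visited.

```perl
use strict;
use warnings;

# Made-up data: remove every 'drop' entry in place.
my @queue = qw(keep drop keep drop drop keep);

# The index list is built once, then visited from last to first.
# Splicing out $i only renumbers later elements, which are already done,
# so every index still to be visited stays valid.
for my $i (reverse 0 .. $#queue) {
    splice(@queue, $i, 1) if $queue[$i] eq 'drop';
}

print "@queue\n";   # keep keep keep
```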


2010-02-03 19:32:50

Assuming you already have the index in hand, use splice:

splice(@array, $indextoremove, 1)

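But be careful: after you remove an element, every element after it moves down by one position, so any indices you saved earlier are no longer valid.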


2010-02-03 19:34:29

Common ways to manipulate the contents of an array:

# start over with this list for each example: 
my @list = qw(a b c d);

splice:

splice @list, 2, 1, qw(e); 
# @list now contains: qw(a b e d)

pop and shift:

pop @list;
# @list now contains: qw(a b c)

shift @list;
# @list now contains: qw(b c d)
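For the original question's "remove it once it's processed" idea, calling shift inside a loop does exactly that. A minimal sketch with placeholder items (@pending and @done are illustrative names), where uc stands in for the real processing:

```perl
use strict;
use warnings;

my @pending = qw(msg1 msg2 msg3);   # placeholder items
my @done;

# shift removes and returns the first element, so each item leaves
# @pending as soon as it is taken for processing.
while (@pending) {
    my $item = shift @pending;
    push @done, uc $item;    # stand-in for the real processing
}

print scalar(@pending), " left; done: @done\n";   # 0 left; done: MSG1 MSG2 MSG3
```

Testing while (@pending) rather than while (my $item = shift @pending) avoids stopping early on a false element such as 0 or an empty string.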

map:

@list = map { $_ eq 'b' ? () : $_ } @list;
# @list now contains: qw(a c d)
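For a plain filter like this one, grep says the same thing more directly. A small sketch starting from the same list:

```perl
use strict;
use warnings;

my @list = qw(a b c d);

# grep keeps only the elements for which the block returns true,
# so this drops 'b' without the map { ... ? () : $_ } trick.
@list = grep { $_ ne 'b' } @list;

print "@list\n";   # a c d
```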

Array slices:

@list[3..4] = qw(e f); 
# @list now contains: qw(a b c e f)

for and foreach loops:

foreach (@list) 
{ 
    # $_ is aliased to each element of the list in turn; 
    # assignments will be propagated back to the original structure
    $_ = uc if m/[a-c]/;
}
# @list now contains: qw(A B C d)

Read about all of these functions in perldoc perlfunc, about slices in perldoc perldata, and about loops in perldoc perlsyn.


2010-02-03 19:55:03 Ether

Why not do this:

my @extracted = map extract_data($_), 
       grep msg_rcpt_to($rcpt, $_), @log_data;

When you're done, you have an array of extracted data in the same order it appeared in the log.
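The answer leaves extract_data and msg_rcpt_to undefined. Here is a self-contained sketch with hypothetical versions of both, run over three made-up log lines; this msg_rcpt_to only checks the to=<...> field and this extract_data only pulls out the queue ID and status, which may not be what your real helpers need:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line is a to= line for $rcpt.
sub msg_rcpt_to {
    my ($rcpt, $line) = @_;
    return $line =~ /\bto=<\Q$rcpt\E>/;
}

# Hypothetical helper: pull the queue ID and delivery status out of a to= line.
sub extract_data {
    my ($line) = @_;
    my ($id)     = $line =~ /: (\w+): to=/;
    my ($status) = $line =~ /\bstatus=(\w+)/;
    return +{ id => $id, status => $status };
}

# Made-up log lines and recipient.
my @log_data = (
    'postfix/smtp[1]: BA1CE38965: to=<r@example.com>, status=sent (250 OK)',
    'postfix/qmgr[2]: BA1CE38965: removed',
    'postfix/smtp[3]: 62D8438973: to=<r@example.com>, status=deferred (timeout)',
);
my $rcpt = 'r@example.com';

# grep keeps the matching lines in log order; map turns each one into a record.
my @extracted = map extract_data($_),
                grep msg_rcpt_to($rcpt, $_), @log_data;

printf "%s %s\n", $_->{id}, $_->{status} for @extracted;
```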


2010-02-03 20:00:18 daotoad
