Shell: get the lines of FILE1 that contain content from FILE2

I have a file (maillog) that looks like this:

Feb 22 23:53:39 info postfix[102]: connect from APVLDPDF01[...
Feb 22 23:53:39 info postfix[101]: BA1D7805A1: client=APVLDPDF01[...
Feb 22 23:53:39 info postfix[103]: BA1D7805A1: message-id
Feb 22 23:53:39 info opendkim[139]: BA1D7805A1: DKIM-Signature field added
Feb 22 23:53:39 info postfix[763]: ED6F3805B9: to=<[email protected]>, relay...
Feb 22 23:53:39 info postfix[348]: ED6F3805B9: removed
Feb 22 23:53:39 info postfix[348]: BA1D7805A1: from=<[email protected]>,...
Feb 22 23:53:39 info postfix[102]: disconnect from APVLDPDF01...
Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<[email protected]>,status=sent
Feb 22 23:53:39 info postfix[348]: 59AE0805B4: removed
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<[email protected]>, status=sent
Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed

and a second file (mailids) that looks like this:

6DBDD8039F:
3B15BC803B:
BA1D7805A1:
2BD19803B4:

I want to get an output file with the following content:

Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<[email protected]>, status=sent

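That is, only the lines of file one whose ID appears in the second file; in this example, only the ID BA1D7805A1:. There is one more condition: the line must be of the form "ID: to=<". In other words, a line is output only if it contains both "to=<" and an ID from file two.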

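I have found several solutions, but performance is a huge problem. The maillog file is 2 GB, about 100,000 lines, and the mailids file has about 32,000 lines.

The process takes so long that I have never seen it finish. I tried awk and grep commands, but I have not found the best way to do it.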

2014-04-16 Helparod

What exactly have you tried so far? – Mark

grep -F -f mailids maillog | grep 'to=<'

From the grep man page:

-F, --fixed-strings 
      Interpret PATTERN as a list of fixed strings, separated by 
      newlines, any of which is to be matched. (-F is specified by 
      POSIX.) 

-f FILE, --file=FILE
      Obtain patterns from FILE, one per line. The empty file 
      contains zero patterns, and therefore matches nothing. (-f is 
      specified by POSIX.)

2014-04-16 22:16:22

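I wonder whether running the second grep first, the one that looks for 'to=<', might be faster: that way a line that does not contain 'to=<' never has to be compared against the 32,000 mailids. I may be wrong; it is just an idea and I have not done any benchmarking. –

@MarkSetchell I was thinking the same: grep 'to=<' maillog | grep -F -f mailids –

Also, it may be worth counting the lines in maillog, using split to cut it into 4 or 8 equal parts, running 4 or 8 copies of the suggested command in parallel in the background, and finishing with wait, so that the OP's other 3 or 7 cores do something useful! –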
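The split-and-parallelize idea from the last comment might look roughly like the sketch below. It assumes GNU coreutils, whose split can cut a file into N line-aligned pieces with -n l/N, so the lines do not need to be counted first. The chunk count N=4 and the file names chunk_* and result are illustrative choices, not part of the original answer, and the two heredocs only recreate an abridged copy of the question's sample data so the script runs on its own.

```shell
#!/bin/sh
# Sample data abridged from the question, for demonstration only;
# with the real 2 GB maillog, skip these two heredocs.
cat > mailids <<'EOF'
6DBDD8039F:
3B15BC803B:
BA1D7805A1:
2BD19803B4:
EOF
cat > maillog <<'EOF'
Feb 22 23:53:39 info postfix[101]: BA1D7805A1: client=APVLDPDF01[...
Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<[email protected]>,status=sent
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<[email protected]>, status=sent
Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed
EOF

N=4   # number of pieces (assumption: set it to your core count)

# Split maillog into N pieces at line boundaries (GNU split -n l/N).
# Pieces are named chunk_aa, chunk_ab, chunk_ac, chunk_ad.
split -n "l/$N" maillog chunk_

# Filter every piece in its own background job: the cheap 'to=<'
# test first, then the fixed-string match against all IDs.
for f in chunk_??; do
    grep 'to=<' "$f" | grep -F -f mailids > "$f.out" &
done

# Block until all background jobs have finished.
wait

# The glob sorts by suffix (aa, ab, ...), so concatenating the
# per-piece results keeps the original maillog line order.
cat chunk_??.out > result
rm -f chunk_?? chunk_??.out
cat result
```

Whether this actually helps depends on the disk: if reading the 2 GB file is the bottleneck rather than CPU, extra parallel jobs may gain little.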

It is better to add the -w option:

-w, --word-regexp 
      Select only those lines containing matches that form whole 
      words. The test is that the matching substring must either be 
      at the beginning of the line, or preceded by a non-word 
      constituent character. Similarly, it must be either at the end 
      of the line or followed by a non-word constituent character. 
      Word-constituent characters are letters, digits, and the 
      underscore.

Here is the command I usually use:

grep -Fwf mailids maillog |grep 'to=<'

If the ID is always in column 6, try this awk one-liner:

awk 'NR==FNR{a[$1];next} /to=</&&$6 in a ' mailids maillog
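The same awk program can be spread over several lines with comments that explain each part. The sketch below is meant to behave like the one-liner; the array name ids (instead of a) and the output file name result are illustrative, and the heredocs only recreate an abridged copy of the question's sample data. Because each maillog line costs one associative-array lookup on its 6th field, rather than a search for 32,000 patterns, this approach tends to cope well with a large ID list, but that is an expectation, not a measured result.

```shell
#!/bin/sh
# Sample data abridged from the question, for demonstration only.
cat > mailids <<'EOF'
6DBDD8039F:
3B15BC803B:
BA1D7805A1:
2BD19803B4:
EOF
cat > maillog <<'EOF'
Feb 22 23:53:39 info postfix[101]: BA1D7805A1: client=APVLDPDF01[...
Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<[email protected]>,status=sent
Feb 22 23:53:39 info postfix[348]: BA1D7805A1: from=<[email protected]>,...
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<[email protected]>, status=sent
Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed
EOF

awk '
    # NR == FNR only while the first file (mailids) is being read:
    # store each ID, trailing colon included, as a key of "ids".
    NR == FNR { ids[$1]; next }

    # For maillog: print the line when it contains "to=<" and its
    # 6th whitespace-separated field (the queue ID, for example
    # "BA1D7805A1:") is one of the stored keys.
    /to=</ && ($6 in ids)
' mailids maillog > result

cat result
```

Note that the IDs in mailids keep their trailing colon, which is exactly what field 6 of a maillog line looks like, so no extra string handling is needed.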

2014-04-17 04:29:34 BMW
