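Find matching IDs in two large files

File 1 has 1.6 million lines in the following format: id:email

File 2 has 450,000 lines in this format: id:hash

The problem is to find all IDs that appear in both files and save the matches to a third file, in the format: email:hash

I tried something like: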

awk -F':' 'NR==FNR{a[$1]=$2;next} {print a[$1]":"$2}' test1.in test2.in > res.in

But it doesn't work :(

Example file 1:

9305718:[email protected] 
59287478:[email protected]

File 2:

21367509:e90100b1b668142ad33e58c17a614696ec04474c 
9305718:d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e

Expected result:

[email protected]:d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e

Source

2016-08-02 efewfewf wefewf

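160M records might not fit in memory. Are the files sorted by ID? If so, `join` is a better tool for this task. – karakfa

Yes, they are sorted. But not all IDs are in the second file; isn't that a problem? –

The example *file2* data is not sorted. Should it be? – agc

With GNU join and GNU bash: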

join -t : -j 1 <(sort -t : -k1,1 file1) <(sort -t : -k1,1 file2) -o 1.2,2.2

Update:

join -t: <(sort file1) <(sort file2) -o 1.2,2.2
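The usage error reported in the comments on this answer suggests that some non-GNU `join` implementations do not accept options after the file operands. A minimal sketch that puts the options first and sorts into temporary files instead of using process substitution is shown here. The sample addresses and the `*.sorted` and `res.out` file names are placeholders for illustration, not values from the question, and `LC_ALL=C` is an assumption made so that `sort` and `join` agree on ordering.

```shell
# Hypothetical sample data (the addresses are placeholders, not from the question).
printf '%s\n' '9305718:alice@example.com' '59287478:bob@example.com' > file1
printf '%s\n' '21367509:e90100b1b668142ad33e58c17a614696ec04474c' \
              '9305718:d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e' > file2

# join expects both inputs sorted on the join field, using the same collation.
LC_ALL=C sort -t : -k1,1 file1 > file1.sorted
LC_ALL=C sort -t : -k1,1 file2 > file2.sorted

# Options come before the file operands; only IDs present in both files are printed.
LC_ALL=C join -t : -o 1.2,2.2 file1.sorted file2.sorted > res.out
cat res.out
```

IDs that exist in only one of the files are simply skipped, so a second file that lacks some IDs is not a problem for this approach.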

Source

2016-08-02 17:31:06 Cyrus

@karakfa; Thanks. I've updated my answer. – Cyrus

nji $ join -t: <(sort test1.in) <(sort test2.in) -o 1.2,2.2 usage: join [-a fileno | -v fileno] [-e string] [-1 field] [-2 field] [-o list] [-t char] file1 file2 –

Trying: join -t: -o 1.2,2.2 <(sort test1.in) <(sort test2.in) :d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e :361e7976e4b783517aca819caf1322c2e0b8cd32 –

In awk (without considering how much memory is available):

$ awk -F':' 'NR==FNR{a[$1]=$2;next} a[$1] {print a[$1]":"$2}' test1.in test2.in 
[email protected] :d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e
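A possible variation on the command above, offered as a sketch rather than a tested solution for the real data: load the smaller file (the hashes) into the array and stream the larger one, test membership with `in` so that lookups do not create empty array entries, and strip trailing whitespace, which appears to be what puts the space before the colon in the output above. The sample addresses are placeholders, not values from the question.

```shell
# Hypothetical sample data; note the trailing space on the first line of each file.
printf '%s\n' '9305718:alice@example.com ' '59287478:bob@example.com' > test1.in
printf '%s\n' '21367509:e90100b1b668142ad33e58c17a614696ec04474c ' \
              '9305718:d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e' > test2.in

awk -F':' '
    { sub(/[ \t\r]+$/, "") }                 # drop trailing blanks and CR; fields are re-split
    NR == FNR { h[$1] = $2; next }           # first file (test2.in): remember id -> hash
    ($1 in h) { print $2 ":" h[$1] }         # second file (test1.in): print email:hash for known ids
' test2.in test1.in > res.in
cat res.in
```

With this argument order only the id-to-hash map is held in memory, and the output follows the line order of `test1.in`.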

Source

2016-08-02 20:35:56
