2010-07-30 53 views
0

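How to find the difference between a CSV file and a file containing only one column of that CSV

I have a CSV file containing some user data that looks like this: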

"10333","","an.10","Kenyata","","Aaron","","","","","","","","","","" 
"12222","","an.4","Wendy","","Aaron","","","","","","","","","","" 
"14343","","aaron.5","Nanci","","Aaron","","","","","","","","","","" 

I also have a file with one item per line, like this:

an.10 
aaron.5 

What I want is to find only the lines of the CSV file whose identifier appears in the list file.

So the desired output would be:

"10333","","an.10","Kenyata","","Aaron","","","","","","","","","","" 
"14343","","aaron.5","Nanci","","Aaron","","","","","","","","","","" 

(Note that an.4 is not included in this new list.)

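I have any environment available to me, and I'm willing to try anything short of doing it by hand, since this CSV contains millions of records and the list has about 100,000 entries.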

+0

Take a look at my free and open source tool CSVfix at http://code.google.com/p/csvfix/ - in particular the join command. – 2010-07-30 19:01:04

+0

Which operating system? Do you have Excel? Do you want a programmatic solution? Do you have tools like grep? – Frank 2010-07-30 19:02:08

+0

I'm running Fedora 12 and have Linux boxes, plus a Windows VM. grep, sed, and diff are all available. I'd prefer a CLI solution, but I'm open to Perl or anything else. – Chris 2010-07-30 19:05:28

Answers

1

How unique are the identifiers an.10 etc.?

Maybe a very small *nix shell script is enough:

for i in $(sort -u list.txt); do grep -F "\"$i\"" data.csv; done 

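For each unique item in the list, that returns all matching lines of the CSV file. However, it does not match specifically against the identifier column (the third one in the sample), so a quoted identifier found in any field counts as a match.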
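
To restrict the match to the identifier column, awk is one option. The following is only a minimal sketch, assuming every field is double-quoted, fields are separated by exactly `","`, and no field contains that sequence itself. The function name `filter_by_id` and the temporary sample files (with shortened rows) are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print rows of the CSV ($2) whose third field is listed in ($1).
filter_by_id() {
    awk -F'","' '
        # First file: remember each list entry, trimming trailing whitespace.
        NR == FNR { sub(/[ \t\r]+$/, ""); if ($0 != "") wanted[$0] = 1; next }
        # Second file: print the row when its third field was listed.
        $3 in wanted
    ' "$1" "$2"
}

# Small demonstration on shortened sample rows (assumed data, not the real files).
tmp=$(mktemp -d)
printf '%s\n' 'an.10' 'aaron.5' > "$tmp/list.txt"
printf '%s\n' \
    '"10333","","an.10","Kenyata","","Aaron"' \
    '"12222","","an.4","Wendy","","Aaron"' \
    '"14343","","aaron.5","Nanci","","Aaron"' > "$tmp/data.csv"
filter_by_id "$tmp/list.txt" "$tmp/data.csv"
rm -rf "$tmp"
```

Because the list is loaded into an in-memory array once and the CSV is read in a single pass, this avoids running grep once per list entry, which matters with millions of rows and about 100,000 entries.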

+0

They are unique. :-) – Chris 2010-07-30 19:03:51

+1

What a terrible coincidence in the choice of file names! But your code won't work: $i only ever takes the single value "list.txt". – 2010-07-30 19:10:13

+0

Indeed. I stand corrected. :) – relet 2010-07-30 19:13:31

1

If the CSV file is data.csv and the list file is list.txt, I would do this (it could also be done with awk, for example):

for i in $(cat list.txt); do grep -F "$i" data.csv; done 
+0

I'm ending up with duplicates, though? – Chris 2010-07-30 19:19:02

+0

Well, are there duplicates in your list? If you want a quick fix for that, pipe your list or the results through '| uniq'. – relet 2010-07-30 19:24:50
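
One way to sidestep the duplicates discussed here is to hand grep the whole list at once, so each CSV line is printed at most once however many list entries match it. This is a rough sketch, assuming a grep that supports `-F` and `-f`; the helper name `match_listed_rows` and the sample files are hypothetical. Like the loops above, it matches the quoted identifier in any column.

```shell
#!/bin/sh
# Hypothetical helper: print rows of the CSV ($2) containing any quoted entry of the list ($1).
match_listed_rows() {
    patterns=$(mktemp)
    # Trim trailing whitespace, drop empty lines, wrap each entry in double quotes.
    sed -e 's/[[:space:]]*$//' -e '/^$/d' -e 's/.*/"&"/' "$1" > "$patterns"
    # -F: fixed strings (the dot is literal); -f: read one pattern per line.
    grep -F -f "$patterns" "$2"
    rm -f "$patterns"
}

# Small demonstration with a duplicated list entry (assumed data, shortened rows).
tmp=$(mktemp -d)
printf '%s\n' 'an.10' 'aaron.5' 'an.10' > "$tmp/list.txt"
printf '%s\n' \
    '"10333","","an.10","Kenyata","","Aaron"' \
    '"12222","","an.4","Wendy","","Aaron"' \
    '"14343","","aaron.5","Nanci","","Aaron"' > "$tmp/data.csv"
match_listed_rows "$tmp/list.txt" "$tmp/data.csv"
rm -rf "$tmp"
```

Dropping empty list lines matters here: an empty entry would become the pattern `""`, which matches every row that has an empty quoted field.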
