2011-08-24 33 views
-4

Compare two lists and find the matches

I have a list like this:

C 
E 

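I want to find the rows in the table below (Table 1) whose first column matches one of these, and write them to a second table (Table 2).

Does anyone have a Python or Perl script to do this?

Table 1: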

A MU_ADO_2 1099 MU_ADO_2.1099 o o o o o o o o o o 7.82436 s_3_merged Suseptible A AG 2 4 0 2 0                    
A MU_ADO_2 1105 MU_ADO_2.1105 327.008 s_2_merged Resistance G GT 81 0 2 132 79 31.5281 s_6_merged Resistance G GT 8 0 1 8 7 34.9813 s_3_merged Suseptible G GT 7 0 0 3 7 7.82436 s_7_merged Suseptible G GT 2 0 0 4 2 
A MU_ADO_2 1110 MU_ADO_2.1110 515.963 s_2_merged Resistance A AT 113 96 1 2 110 31.5281 s_6_merged Resistance A AT 7 8 0 0 7 16.3388 s_3_merged Suseptible A AT 4 7 0 0 4 13.808 s_7_merged Suseptible A AT 3 3 0 0 3 
A MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
B MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
B MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
B MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
F MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
F MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
F MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
F MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
F MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 

Table 2:

C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
+1

What have you tried so far? What does "C E" mean? What are you trying to find? – daveydave400

+0

Now that your table has been edited (thanks F.J), my only question is: what have you tried so far? – daveydave400

Answers

1

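An alternative, in Python:

keys = ['C', 'E']
# 'a' appends to out.txt; use 'w' instead to overwrite it on each run
with open('out.txt', 'a') as out:
    with open('test.txt') as f:
        for line in f:
            for key in keys:
                # keep the line if it starts with one of the keys
                if line.startswith(key):
                    out.write(line)
                    break  # already written, no need to test other keys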

test.txt is a file containing your Table 1, copied and pasted.
out.txt is the file in which you get your Table 2.
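
As a variation on the loop above, `str.startswith` also accepts a tuple of prefixes, so the inner loop and the `break` can be dropped. This is only a minimal sketch; the helper name `filter_by_prefix` and the sample rows are made up for illustration:

```python
def filter_by_prefix(lines, keys):
    """Return the lines that start with any of the given prefixes."""
    prefixes = tuple(keys)  # str.startswith accepts a tuple of prefixes
    return [line for line in lines if line.startswith(prefixes)]


if __name__ == "__main__":
    # Hypothetical sample rows standing in for Table 1.
    sample = ["A MU_ADO_2 1099\n", "C MU_ADO_2 1120\n", "E MU_ADO_2 1120\n"]
    for row in filter_by_prefix(sample, ["C", "E"]):
        print(row, end="")
```

To work on files, pass the open input file as `lines` and hand the result to `out.writelines(...)`.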

+0

You need a `break` after the `write` in your loop to make it more efficient, or the equivalent two-liner in Python 2.7: `with open('out.txt', 'a') as out, open('test.txt') as f:` then `out.writelines(line for line in f if any(line.startswith(key) for key in keys))` – agf

+0

@agf, I have included a break. As for the rest, I prefer to keep the code as simple as possible for people who appear to be new to SO. – joaquin

+0

Yes, I wasn't really recommending the golfed version, just that it's good to break out of the loop. +1 – agf

1

If your question is: "How can I filter this file to see only the entries whose first field is equal to C or E?"

then the following should work:

awk '$1 ~ /[CE]/ { print $0 }' yourfile > outfile 

If you want to save a few keystrokes at the cost of clarity, the following also works:

awk '$1 ~ /[CE]/' yourfile > outfile 
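
Note that `$1 ~ /[CE]/` matches any first field that contains a C or an E anywhere, which is not quite the same as the field being equal to C or E. For the sample table both readings select the same rows, because the first field is always a single letter. A rough Python sketch of the difference; the function names and the sample strings are invented for illustration:

```python
def first_field_contains(line, letters="CE"):
    """Mimic awk's `$1 ~ /[CE]/`: the first field contains any of the letters."""
    fields = line.split()
    return bool(fields) and any(ch in fields[0] for ch in letters)


def first_field_equals(line, keys=("C", "E")):
    """Mimic awk's `$1 == "C" || $1 == "E"`: the first field is exactly a key."""
    fields = line.split()
    return bool(fields) and fields[0] in keys


if __name__ == "__main__":
    # Hypothetical rows: the second one has a longer first field.
    for row in ["C MU_ADO_2 1120", "ACE MU_ADO_2 1120"]:
        print(row, first_field_contains(row), first_field_equals(row))
```

If first fields longer than one letter can occur in the real data, the equality test is probably the safer choice.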
+0

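It takes all of three characters more in Perl. So what? You also get infinitely better regexes, and a Real™ programming language. Also note that your code doesn't do what you say it does. Oops! – tchrist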

+0

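@tchrist Relax, Perl is better than awk. I'm not trying to start a holy war, and I'll delete the comment that upset you. However, as far as I can tell this works; let me know what error you found. –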

+0

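The comment was just needlessly provocative. However, your code tests whether the first field contains a C or an E, which is quite different from saying `$1 == "C" || $1 == "E"`, which is what your "first field is equal to C or E" says. I'm not passing judgment on correctness, just pointing out that the description of the code doesn't match what the code does. A Perl solution would be `perl -ne '/^[CE]/ && print'`, although I find `print if /^[CE]/` more readable. – tchrist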

3

Judging by the tags on the question, I'm assuming you are open to other *nix utilities, so here is a sed solution:

sed '/^[^CE]/d' table1.txt > table2.txt 

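This deletes every line from table1.txt that does not start with C or E and writes the rest to table2.txt. (Empty lines are kept, since the pattern needs at least one character to match.)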

0

Assuming the "C E" list comes from a file:

awk ' 
    FILENAME == ARGV[1] {list[$1]; next} 
    $1 in list {print} 
' list.txt table1 > table2 
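
The same idea as a Python sketch, in case awk is not at hand: read the keys from the first file into a set, then keep the table rows whose first field is in that set. The function name `filter_table` is hypothetical, and the file names in the usage note simply mirror the ones in the command above. Unlike the awk version, this sketch skips blank lines in both files.

```python
def filter_table(list_path, table_path, out_path):
    """Copy rows of table_path whose first field appears in list_path.

    Returns the number of rows written.
    """
    with open(list_path) as f:
        # The first whitespace-separated field of each non-blank line is a key.
        keys = {line.split()[0] for line in f if line.strip()}
    kept = 0
    with open(table_path) as table, open(out_path, "w") as out:
        for line in table:
            fields = line.split()
            # Blank lines have no fields and are skipped.
            if fields and fields[0] in keys:
                out.write(line)
                kept += 1
    return kept
```

It could be called as `filter_table("list.txt", "table1", "table2")`. A set is used so that each membership test stays cheap even when the key list is long.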
3

How about grep:

grep -e '^[CE]' source.file 

You can redirect the output to a new file as well:

grep -e '^[CE]' source.file > dest.file 
+0

Clean and simple! – flies

+0

Nice! The progression from `awk` to `sed` to `grep` keeps leading to simpler answers. –