>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=3 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY
>sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
RIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVF
Expected output:
>sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
RIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVF
Code written so far:
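#!/usr/bin/python
import re

fh = open("test_seq")
for line in fh:
    # Only header lines start with '>'
    if line.startswith('>'):
        # Match headers annotated with PE=1; sequence lines are never printed
        if re.search('PE=1', line):
            print(line)
fh.close()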
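What do you want for input data in general? A single example doesn't say. – zegkljan 2014-11-05 07:01:30
I want to parse the data, so the output file should contain both the header and the sequence. With my code I can only get the header line. Thanks – user3690643 2014-11-05 07:38:01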
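One possible way to get both the header and its sequence is to remember whether the most recent header matched, and print every following line until the next header. The sketch below is only an illustration under a few assumptions: the function name `filter_fasta` and its `marker` parameter are made up here, and a plain substring test for `PE=1` is assumed to be enough (it would also match a hypothetical `PE=10`).

```python
def filter_fasta(lines, marker="PE=1"):
    """Return header and sequence lines of records whose header contains marker.

    `filter_fasta` and `marker` are hypothetical names used for illustration.
    """
    keep = False  # True while we are inside a record whose header matched
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record begins: decide once whether to keep all of it
            keep = marker in line
        if keep and line:
            kept.append(line)
    return kept


# In-memory sample taken from the question; a file handle works the same way
sample = [
    ">sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=3 SV=3\n",
    "MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS\n",
    ">sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1\n",
    "MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW\n",
]
for kept_line in filter_fasta(sample):
    print(kept_line)
```

To run it on the file from the question, something like `filter_fasta(open("test_seq"))` should work, since iterating over a file handle yields its lines.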