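Python: reading a text file in blocks when the size of each block is unknown
I have to work on a huge text file that contains data in blocks separated by a space. It looks as follows: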
>3D_helix;140
protein_name:AChR pore alpha subunit (Torpedo marmorata)
file_name:ACh_pore_alpha.txt
entry_date:3july03
refman_number:21022
endnote_number:
author:Miyazawa,A., Fujiyoshi,Y., Unwin,N.(2003) [Structure and gating mechanism of the acetylcholine receptor pore] {Nature, 423, 949-955}
remarks:Sequence is from PDB, chain A. There is additional 24 AA as signal sequence in Swiss-Prot. TMhelices=4.
pir_number:
Swiss_Prot_entry:ACHA_TORMA
Swiss_Prot_number:P02711
Swiss_Prot_gene:CHRNA1
Swiss_Prot_name:Acetylcholine receptor subunit alpha
PDB_title:Acetylcholine Receptor Protein, alpha Chain
PDB_Identifier:1OED
N_terminal:in
number_tmsegs:4
tm_segments:A.211,237;B.243,271;C.275,300;D.403,436
sequence:SEHETRLVANLLENYNKVIRPVEHHTHFVDITVGLQLIQLINVDEVNQIVETNVRLRQQWIDVRLRWNPADYGGIKKIRLPSDDVWLPDLVLYNNADGDFAIVHMTKLLLDYTGKIMWTPPAIFKSYCEIIVTHFPFDQQNCTMKLGIWTYDGTKVSISPESDRPDLSTFMESGEWVMKDYRGWKHWVYYTCCPDTPYLDITYHFIMQRIPLYFVVNVIIPCLLFSFLTVLVFYLPTDSGEKMTLSISVLLSLTVFLLVIVELIPSTSSAVPLIGKYMLFTMIFVISSIIVTVVVINTHHRSPSTHTMPQWVRKIFINTIPNVMFFSTMKRASKEKQENKIFADDIDISDISGKQVTGEVIFQTPLIKNPDVKSAIEGVKYIAEHMKSDEESSNAAEEWKYVAMVIDHILLCVFMLICIIGTVSVFAGRLIELSQEG*
>1D_helix;141
protein_name:AChR pore beta subunit (Torpedo marmorata)
file_name:ACh_pore_beta.txt
entry_date:3july03
refman_number:21022
endnote_number:
author:Miyazawa,A., Fujiyoshi,Y., Unwin,N.(2003) [Structure and gating mechanism of the acetylcholine receptor pore] {Nature, 423, 949-955}
remarks:Sequence is from PDB, chain B. There is additional 24 AA as signal sequence in Swiss-Prot. TMhelices=4.
pir_number:
Swiss_Prot_entry:Q6S3I0_TORMA
Swiss_Prot_number:Q6S3I0
Swiss_Prot_gene:none
Swiss_Prot_name:Acetylcholine receptor beta subunit
PDB_title:Acetylcholine Receptor Protein, beta Chain
PDB_Identifier:1OED
N_terminal:in
number_tmsegs:4
tm_segments:A.224,241;B.249,274;C.290,306;D.438,462
sequence:SVMEDTLLSVLFENYNPKVRPSQTVGDKVTVRVGLTLTSLLILNEKNEEMTTSVFLNLAWTDYRLQWDPAAYEGIKDLSIPSDDVWQPDIVLMNNNDGSFEITLHVNVLVQHTGAVSWHPSAIYRSSCTIKVMYFPFDWQNCTMVFKSYTYDTSEVILQHALDAKGEREVKEIMINQDAFTENGQWSIEHKPSRKNWRSDDPSYEDVTFYLIIQRKPLFYIVYTIVPCILISILAILVFYLPPDAGEKMSLSISALLALTVFLLLLADKVPETSLSVPIIISYLMFIMILVAFSVILSVVVLNLHHRSPNTHTMPNWIRQIFIETLPPFLWIQRPVTTPSPDSKPTIISRANDEYFIRKPAGDFVCPVDNARVAVQPERLFSEMKWHLNGLTQPVTLPQDLKEAVEAIKYIAEQLESASEFDDLKKDWQYVAMVADRLFLYIFITMCSIGTFSIFLDASHNVPPDNPFA*
>3D_other;143
protein_name:AChR pore delta subunit (Torpedo marmorata)
file_name:ACh_pore_delta.txt
entry_date:4dec03
refman_number:21022
endnote_number:
author:Miyazawa,A., Fujiyoshi,Y., Unwin,N.(2003) [Structure and gating mechanism of the acetylcholine receptor pore] {Nature, 423, 949-955}
remarks:Sequence is from PDB, chain C. Sequence in PDB has first 21 AA removed relative to Swiss-Prot. TMhelices=4.
pir_number:
Swiss_Prot_entry:Q6S3H8_TORMA
Swiss_Prot_number:Q6S3H8
Swiss_Prot_gene:none
Swiss_Prot_name:Acetylcholine receptor delta subunit
PDB_title:Acetylcholine Receptor Protein, delta Chain
PDB_Identifier:1OED
N_terminal:in
number_tmsegs:4
tm_segments:A.226,253;B.257,285;C.289,316;D.452,483
sequence:VNEEERLINDLLIVNKYNKHVRPVKHNNEVVNIALSLTLSNLISLKETDETLTTNVWMDHAWYDHRLTWNASEYSDISILRLRPELIWIPDIVLQNNNDGQYNVAYFCNVLVRPNGYVTWLPPAIFRSSCPINVLYFPFDWQNCSLKFTALNYNANEISMDLMTDTIDGKDYPIEWIIIDPEAFTENGEWEIIHKPAKKNIYGDKFPNGTNYQDVTFYLIIRRKPLFYVINFITPCVLISFLAALAFYLPAESGEKMSTAICVLLAQAVFLLLTSQRLPETALAVPLIGKYLMFIMSLVTGVVVNCGIVLNFHFRTPSTHVLSTRVKQIFLEKLPRILHMSRVDEIEQPDWQNDLKLRRSSSVGYISKAQEYFNIKSRSELMFEKQSERHGLVPRVTPRIGFGNNNENIAASDQLHDEIKSGIDSTNYIVKQIKEKNAYDEEVGNWNLVGQTIDRLSMFIITPVMVLGTIFIFVMGNFNRPPAKPFEGDPFDYSSDHPRCA
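Each block starts with one of 3 given options. The number of lines in each block varies. I want to split the file into 3 parts (or 3 separate files) such that: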
part 1 contains all blocks starting with >3D_helix
part 2 contains all blocks starting with >1D_helix
part 3 contains all blocks starting with >3D_other
I tried the following approach:
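import sys

prot_file = open(sys.argv[1], "r")
flag = False
for line in prot_file:
    if line.startswith(">3D_other"):
        # Must be an assignment (=); a comparison (==) would leave flag unchanged.
        flag = True
    if flag:
        # flag is never reset, so this prints every line from the first >3D_other header onward.
        print(line)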
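But it only prints the first line, i.e. the >3D_helix line. Most of the tips I found online split a list into chunks based on the size of each chunk (i.e. the size is known and fixed at some specific number, e.g. 13). In my case I do not know the size, so I cannot use them. I would like an efficient, Pythonic way to split the file as explained.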
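For illustration, here is a minimal sketch of one way to do the split without knowing the block sizes: read the file line by line, and whenever a line starts with ">", switch the destination to the output that matches its prefix. It assumes Python 3, and that only header lines begin with ">" (which holds for the sample above). The function name `split_blocks` and the `outputs` mapping are hypothetical names chosen for this example.

```python
import io


def split_blocks(lines, outputs):
    """Copy each block to the output whose key is the prefix of its header line.

    `lines` is any iterable of lines (an open file works); `outputs` maps a
    header prefix to an object with a write() method.
    """
    current = None  # destination of the block currently being read
    for line in lines:
        if line.startswith(">"):
            # Header line: switch destination (None if the prefix is unknown).
            current = next(
                (out for prefix, out in outputs.items() if line.startswith(prefix)),
                None,
            )
        if current is not None:
            current.write(line)


# Small in-memory stand-in for the real file; blocks have different lengths.
sample = io.StringIO(
    ">3D_helix;140\n"
    "protein_name:A\n"
    ">1D_helix;141\n"
    "protein_name:B\n"
    "sequence:XYZ*\n"
    ">3D_other;143\n"
    "protein_name:C\n"
    ">3D_helix;144\n"
    "protein_name:D\n"
)
outputs = {p: io.StringIO() for p in (">3D_helix", ">1D_helix", ">3D_other")}
split_blocks(sample, outputs)
```

Because only one line is held at a time, this should also work for a huge file: pass the opened input file as `lines` and three files opened for writing as the values of `outputs`. Blocks whose header matches none of the given prefixes are skipped in this sketch.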