I have an Excel file like this, with the data spread over multiple cells:
Sr. No.   GENE ID   Gene Id (NCBI)   Protein Id    Protein Sequences
1         Lmo0001   984365           NP_463534.1
2         Lmo0002   984379           NP_463535.1
3         Lmo0003   984420           NP_463536.1
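The list extends to 3000 genes. I have the sequences saved in a text file (Notepad) like this, for all 3000 genes, with a space between each individual sequence.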
gi|16802049|ref|NP_463534.1| chromosomal replication initiation protein [Listeria monocytogenes EGD-e] MQSIEDIWQETLQIVKKNMSKPSYDTWMKSTTAHSLEGNTFIISAPNNFVRDWLEKSYTQFIANILQEIT GRLFDVRFIDGEQEENFEYTVIKPNPALDEDGIEIGKHMLNPRYVFDTFVIGSGNRFAHAASLAVAEAPA KAYNPLFIYGGVGLGKTHLMHAVGHYVQQHKDNAKVMYLSSEKFTNEFISSIRDNKTEEFRTKYRNVDVL LIDDIQFLAGKEGTQEEFFHTFNTLYDEQKQIIISSDRPPKEIPTLEDRLRSRFEWGLITDITPPDLETR IAILRKKAKADGLDIPNEVMLYIANQIDSNIRELEGALIRVVAYSSLVNKDITAGLAAEALKDIIPSSKS QVITISGIQEAVGEYFHVRLEDFKAKKRTKSIAFPRQIAMYLSRELTDASLPKIGDEFGGRDHTTVIHAH EKISQLLKTDQVLKNDLAEIEKNLRKAQNMF
gi|16802050|ref|NP_463535.1| DNA polymerase III subunit beta [Listeria monocytogenes EGD-e] MKFVIERDRLVQAVNEVTRAISARTTIPILTGIKIVVNDEGVTLTGSDSDISIEAFIPLIENDEVIVEVE SFGGIVLQSKYFGDIVRRLPEENVEIEVTSNYQTNISSGQASFTLNGLDPMEYPKLPEVTDGKTIKIPIN VLKNIVRQTVFAVSAIEVRPVLTGVNWIIKENKLSAVATDSHRLALREIPLETDIDEEYNIVIPGKSLSE LNKLLDDASESIEMTLANNQILFKLKDLLFYSRLLEGSYPDTSRLIPTDTKSELVINSKAFLQAIDRASL LARENRNNVIKLMTLENGQVEVSSNSPEVGNVSENVFSQSFTGEEIKISFNGKYMMDALRAFEGDDIQIS FSGTMRPFVLRPKDAANPNEILQLITPVRTY
gi|16802051|ref|NP_463536.1| hypothetical protein lmo0003 [Listeria monocytogenes EGD-e] MMKDMTTGNPTKLIFLFAMPMLIGNLFQQFYTMIDAVIVGKFVSVDALAAVGATNSVNFFMISLIIGLMS GISVVVAQYFGFKDYDRLKDVIATATYAVVFSAIILTVAGVLLAKPLLILLRTPANILDDSTIFLTTLFI GILPMSLYNGMAAILRALGNSITPLIFLILSSLMNIALDFLFVVYMDMGVRGAAIATVLSQTAAAIAVIY YAYRHVPFMRIERAKFKLSTPLLKEMVRIGLPSGLQGSFISIGNMALQSLINGFGSSVVAAYTAASRIDS LTYQPGIAFGAASSMFAGQNIGAGKIDRVREGFWSGIKVVTAISIGITILVQLFARQFLLLFVDSSETEV INIGVSYLLIVSLFYVVVGILFVVRETLRGTGDAMVPLAMGIFELVSRLVIGFVLSLYIGYVGLWWATPV AWITATILGVWRYKSGAWQKKAVIRRK
gi|16802052|ref|NP_463537.1| hypothetical protein lmo0004 [Listeria monocytogenes EGD-e] MAETVKINSEFVTLGQLLQMIDVVSTGGMAKAYLSENTIYINGEQDNRRGKKLRNGDVILVPGVGKVKIE QGK
gi|16802053|ref|NP_463538.1| recombination protein F [Listeria monocytogenes EGD-e] MHLESIVLRNFRNYENLELEFSPSVNVFLGENAQGKTNLLEAVLMLALAKSHRTTNDKDFIMWEKEEAKM EGRIAKHGQSVPLELAITQKGKRAKVNHLEQKKLSQYVGNLNVVIFAPEDLSLVKGAPGIRRRFLNMEIG QMQPIYLHNLSEYQRILQQRNQYLKMLQMKRKVDPILLDILTEQFADVAINLTKRRADFIQKLEAYAAPI HHQISRGLETLKIEYKASITLNGDDPEVWKADLLQKMESIKQREIDRGVTLIGPHRDDSLFYINGQNVQD FGSQGQQRTTALSIKLAEIDLIHEETGEYPVLLLDDVLSELDDYRQSHLLGAIEGKVQTFVTTTSTSGID HETLKQATTFYVEKGTVKKS
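Is it possible to place each protein sequence in the row of its corresponding gene, without copying and pasting each one manually? Any method is fine.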
P.S. I apologize for the crude table, but without enough reputation points I cannot post images, and this is the best I could manage.
@swapnil But I want the sequences from the Notepad file copied directly under the Protein Sequences column of the first Excel sheet.
Just open the text file with Excel; it will ask you about the delimiter, specify | there, and then you will get the file in Excel. – Swapnil
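If the delimiter import does not line the sequences up with the right rows, a small script might do the matching instead. Below is a minimal Python sketch, not a tested solution for the real files. It assumes the Excel sheet has been saved as a CSV with the Protein Id in the fourth column, that every header in the text file looks like `gi|...|ref|NP_...|` followed by a description ending in a bracketed organism name, and that no description contains an earlier `]`. The function names and any file names passed to them are hypothetical.

```python
import csv
import re

# Header such as "gi|16802049|ref|NP_463534.1| description [organism]".
# Tolerates an optional leading ">", spaces around "|" and upper-case GI/REF.
# Assumption: the description ends at the first "]" on the header line.
HEADER = re.compile(
    r">?\s*gi\s*\|\s*\d+\s*\|\s*ref\s*\|\s*([A-Z]{2}_\d+\.\d+)\s*\|[^\]\n]*\]",
    re.IGNORECASE,
)


def parse_records(text):
    """Return {protein_id: sequence} for every header found in text."""
    matches = list(HEADER.finditer(text))
    sequences = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Residues run from the end of this header to the start of the next;
        # spaces and line breaks inside the sequence are removed.
        sequences[m.group(1)] = "".join(text[m.end():end].split())
    return sequences


def fill_sequences(rows, sequences, id_column=3, seq_column=4):
    """Put the matching sequence (or "") into seq_column of each row."""
    filled = []
    for row in rows:
        row = list(row) + [""] * (seq_column + 1 - len(row))  # pad short rows
        row[seq_column] = sequences.get(row[id_column].strip(), "")
        filled.append(row)
    return filled


def write_table(table_csv, fasta_txt, out_csv):
    """Read the exported sheet and the text file, write the merged CSV."""
    with open(fasta_txt, encoding="utf-8") as f:
        sequences = parse_records(f.read())
    with open(table_csv, newline="", encoding="utf-8") as f:
        header, *rows = list(csv.reader(f))  # first row is the column titles
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(fill_sequences(rows, sequences))
```

Calling something like `write_table("genes.csv", "sequences.txt", "genes_with_sequences.csv")` would then produce a CSV that Excel can open, with each sequence on one line in the Protein Sequences column. Matching on the Protein Id rather than on row order means a missing or reordered sequence leaves an empty cell instead of shifting every row below it.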