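How to get the sequence description from a GI number through Biopython?
I have a list of GI (GenBank identifier) numbers. How can I get the sequence description (e.g. 'mus musculus hypothetical protein X') for each GI number, so that I can store it in a variable and write it to a file? Thanks for your help!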
So, in case anyone else has this question, here is the solution:
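    from Bio import Entrez

    Entrez.email = "you@example.com"  # placeholder; tell NCBI who you are
    # db is the Entrez database ("nucleotide", "protein", ...);
    # id is a GI or NCBI reference number
    handle = Entrez.esummary(db="nucleotide", id="gi or NCBI_ref number")
    record = Entrez.read(handle)
    handle.close()
    description = record[0]["Title"]
    print(description)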
This prints the sequence description for the given identifier.
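To cover the full question (a list of GI numbers, with the descriptions written to a file), the call above can be wrapped in a loop. This is only a minimal sketch: it assumes each summary returned by Entrez.read has "Id" and "Title" keys like the record used above, and the function names, the batch size of 200, the email address and the tab-separated output format are all hypothetical choices rather than anything required by Biopython.

```python
def batches(items, size=200):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_descriptions(gi_numbers, db="nucleotide", email="you@example.com"):
    """Return (gi, description) pairs via Entrez.esummary (needs network access)."""
    # Imported here so the other helpers can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # placeholder address; NCBI asks for a real one
    pairs = []
    for batch in batches(list(gi_numbers)):
        # Several identifiers can be sent in one request as a comma-separated string.
        handle = Entrez.esummary(db=db, id=",".join(str(gi) for gi in batch))
        summaries = Entrez.read(handle)
        handle.close()
        # Assumption: every summary exposes "Id" and "Title".
        pairs.extend((str(s["Id"]), str(s["Title"])) for s in summaries)
    return pairs


def write_descriptions(pairs, path):
    """Write one 'gi<TAB>description' line per pair to `path`."""
    with open(path, "w") as out:
        for gi, description in pairs:
            out.write(f"{gi}\t{description}\n")
```

With a list of identifiers in `gi_list`, something like `write_descriptions(fetch_descriptions(gi_list), "descriptions.tsv")` would keep the descriptions in a variable and also save them to a file.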
Here is a script I wrote that pulls the entire GenBank record for each GenBank identifier in a file. It should be easy to adapt to your application.
    # This program opens a file containing NCBI sequence identifiers, fetches the
    # associated records and writes the data to *.gb
    import os
    import sys
    from Bio import Entrez

    Entrez.email = "your_email@example.com"  # Always tell NCBI who you are (placeholder address)

    name = input("\nEnter file name with sequence identifications only: ")
    try:  # check that the input file is in the folder
        handle = open(name, 'r')
    except IOError:
        print("File does not exist in folder! Check file name and extension.")
        sys.exit(1)

    outfile = os.path.splitext(name)[0] + "_GB_Full.gb"
    # Both files are closed automatically when the block ends
    with handle, open(outfile, 'w') as totalhand:
        for line in handle:
            line = line.strip()  # strips \n from the line
            if not line:  # skip blank lines
                continue
            print(line)
            fetch_handle = Entrez.efetch(db="nucleotide", rettype="gb", retmode="text", id=line)
            data = fetch_handle.read()
            fetch_handle.close()
            totalhand.write(data)
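Since the question asks for the description rather than the whole record, one way to adapt the script is to scan the fetched GenBank text for its DEFINITION lines. The following is a rough sketch, assuming the usual GenBank flat-file layout in which the description follows the DEFINITION keyword and wraps onto indented continuation lines; the function name is hypothetical. Bio.SeqIO can also parse such files and exposes the same text as `record.description`.

```python
def genbank_definitions(lines):
    """Yield the DEFINITION text of each record found in GenBank flat-file lines."""
    parts = None  # pieces of the DEFINITION being collected, or None when outside one
    for line in lines:
        if line.startswith("DEFINITION"):
            parts = [line[len("DEFINITION"):].strip()]
        elif parts is not None and line.startswith(" "):
            parts.append(line.strip())  # indented continuation of the definition
        elif parts is not None:
            yield " ".join(parts)  # the next keyword ends the definition
            parts = None
    if parts is not None:  # input ended while still inside a definition
        yield " ".join(parts)
```

The function accepts any iterable of lines, so an open file works directly, for example `for description in genbank_definitions(open(outfile)): print(description)`.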
Check this out to get you started: http://biopython.org/DIST/docs/api/Bio.Entrez-pysrc.html – heathobrien