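Parsing a GTF gene file
I have a gene GTF file that I'm trying to parse so that 'gene_id', 'gene_type', 'gene_status', 'gene_name' and level each end up in a separate column.
For example, given this input: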
chr1 | ENSEMBL gene| 17369| 17436| . - . |gene_id "ENSG00000278267.1"; gene_type "miRNA"; gene_status "KNOWN"; gene_name "MIR6859-1"; level 3;
chr1 | ENSEMBL gene| 30366| 30503| . + . |gene_id "ENSG00000274890.1"; gene_type "miRNA"; gene_status "KNOWN"; gene_name "MIR1302-2"; level 3;
chr1 | ENSEMBL gene| 157784| 157887| . - . |gene_id "ENSG00000222623.1"; gene_type "snRNA"; gene_status "KNOWN"; gene_name "RNU6-1100P"; level 3;
chr1 | ENSEMBL gene| 187891| 187958| . - . |gene_id "ENSG00000273874.1"; gene_type "miRNA"; gene_status "KNOWN"; gene_name "MIR6859-2"; level 3;
I would like the output to look like this, with 'gene_id', 'gene_type', 'gene_status', 'gene_name' and level each in a separate column:
chr1 |ENSEMBL |gene| 17369| 17436 |. - . |gene_id "ENSG00000278267.1" |gene_type "miRNA" |gene_status "KNOWN" |gene_name "MIR6859-1" |level 3
chr1 |ENSEMBL |gene| 30366| 30503 |. + . |gene_id "ENSG00000274890.1" |gene_type "miRNA" |gene_status "KNOWN" |gene_name "MIR1302-2" |level 3
chr1 |ENSEMBL |gene| 157784| 157887 |. - . |gene_id "ENSG00000222623.1" |gene_type "snRNA" |gene_status "KNOWN" |gene_name "RNU6-1100P" |level 3
chr1 |ENSEMBL |gene| 187891| 187958 |. - . |gene_id "ENSG00000273874.1" |gene_type "miRNA" |gene_status "KNOWN" |gene_name "MIR6859-2" |level 3
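A minimal sketch of the requested layout, assuming the real file is standard tab-separated GTF (the `|` characters above are only for display) and that no quoted attribute value contains a semicolon. The function name `split_gtf_line` is hypothetical. It keeps the eight fixed columns and turns each `key value` pair in column 9 into its own cell:

```python
def split_gtf_line(line):
    """Return the 8 fixed GTF fields followed by one cell per attribute."""
    # Standard GTF has exactly 9 tab-separated columns.
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    # Naive split on ';': assumes no quoted value itself contains a semicolon.
    attributes = [a.strip() for a in fields[8].split(";") if a.strip()]
    return fields[:8] + attributes


line = ('chr1\tENSEMBL\tgene\t17369\t17436\t.\t-\t.\t'
        'gene_id "ENSG00000278267.1"; gene_type "miRNA"; '
        'gene_status "KNOWN"; gene_name "MIR6859-1"; level 3;')
print("\t".join(split_gtf_line(line)))
```

Joining the result with a tab gives one row of the desired output; looping over the file's lines converts the whole file.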
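I tried parsing it with gffutils, using the basic code they provide:
import gffutils

# Build (or rebuild, with force=True) an SQLite database from the GTF file
db = gffutils.create_db("sRNA.gene.gtf", dbfn='sRNA.gene.gtf.db', force=True)
print(list(db.featuretypes()))

# Here's how to write genes out to file.
# Write to a new file: opening the input GTF in 'w' mode would truncate it.
with open('sRNA.genes_out.gtf', 'w') as fout:
    for gene in db.features_of_type('gene'):
        fout.write(str(gene) + '\n')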
However, I get "ImportError: cannot import name 'feature'":
ImportError Traceback (most recent call last)
<ipython-input-26-4dd7cd5c7e24> in <module>()
2
3
----> 4 db = gffutils.create_db("sRNA.gene.gtf", dbfn='sRNA.gene.gtf.db')
5
6 #db = gffutils.FeatureDB('sRNA.gene.gtf.db')
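I'm not sure what is going wrong here, and I'm now considering just parsing it on the command line instead. Can anyone suggest the best way to parse a GTF file?
Thanks in advance.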
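One option that avoids gffutils entirely is a short standard-library script. The sketch below assumes tab-separated GTF input; the names `gtf_to_tsv` and `ATTRS` are hypothetical, and the attribute list is taken from the sample above. Unlike the layout requested above, it uses each attribute key as a column header and writes only the unquoted value in each cell, which is usually easier to load into a spreadsheet or pandas:

```python
import csv
import re
import sys

# Matches one `key "value";` or `key value;` pair in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+)\s+"?([^";]*)"?\s*;')

FIXED = ["seqname", "source", "feature", "start", "end",
         "score", "strand", "frame"]
# Hypothetical choice of attribute columns, taken from the question's sample.
ATTRS = ["gene_id", "gene_type", "gene_status", "gene_name", "level"]


def gtf_to_tsv(in_path, out_path, attrs=ATTRS):
    """Write a TSV with one column per fixed GTF field and per chosen attribute."""
    with open(in_path) as fin, open(out_path, "w", newline="") as fout:
        writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
        writer.writerow(FIXED + list(attrs))
        for line in fin:
            if line.startswith("#") or not line.strip():
                continue  # skip header comments and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9:
                raise ValueError(f"expected 9 tab-separated fields: {line!r}")
            parsed = dict(ATTR_RE.findall(fields[8]))
            # Attributes missing from a line become empty cells.
            writer.writerow(fields[:8] + [parsed.get(a, "") for a in attrs])


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python gtf_to_tsv.py sRNA.gene.gtf sRNA.gene.tsv
    gtf_to_tsv(sys.argv[1], sys.argv[2])
```

If an attribute key appears more than once on a line (GENCODE's `tag`, for example), this dict-based sketch keeps only the last value.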
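Please edit your question to include your desired output for your sample input. Good luck. – shellter
Added the changes, thanks! – espop23
It's hard to see the difference between your input and output. Could you switch to using the '|' character between columns? Are you loading this into Excel or something similar? Good luck. – shellter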