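How to customize Cufflinks output columns

I am analyzing mouse RNA-seq data against mm10. I started with TopHat2/Bowtie, then ran Cufflinks to generate gene- and isoform-level FPKMs. I am using a GTF file that carries the gene biotype (i.e., whether the gene is a pseudogene, protein-coding, snRNA, lincRNA, etc.) both in the second column and in the attribute field next to the gene name. Example lines from my GTF: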
1 unprocessed_pseudogene exon 3054233 3054733 . + . exon_id "ENSMUSE00000848981"; exon_number "1"; gene_biotype "pseudogene"; gene_id "ENSMUSG00000090025"; gene_name "Gm16088"; gene_source "havana"; tag "mRNA_start_NF"; transcript_id "ENSMUST00000160944"; transcript_name "Gm16088-001"; transcript_source "havana"; tss_id "TSS82763";
1 unprocessed_pseudogene transcript 3054233 3054733 . + . gene_biotype "pseudogene"; gene_id "ENSMUSG00000090025"; gene_name "Gm16088"; gene_source "havana"; tag "mRNA_start_NF"; transcript_id "ENSMUST00000160944"; transcript_name "Gm16088-001"; transcript_source "havana"; tss_id "TSS82763";
1 snRNA exon 3102016 3102125 . + . exon_id "ENSMUSE00000522066"; exon_number "1"; gene_biotype "snRNA"; gene_id "ENSMUSG00000064842"; gene_name "Gm26206"; gene_source "ensembl"; transcript_id "ENSMUST00000082908"; transcript_name "Gm26206-201"; transcript_source "ensembl"; tss_id "TSS81070";
1 snRNA transcript 3102016 3102125 . + . gene_biotype "snRNA"; gene_id "ENSMUSG00000064842"; gene_name "Gm26206"; gene_source "ensembl"; transcript_id "ENSMUST00000082908"; transcript_name "Gm26206-201"; transcript_source "ensembl"; tss_id "TSS81070";
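Note that the type sits in two places that do not always agree: for Gm16088, column 2 says `unprocessed_pseudogene` while the `gene_biotype` attribute says `pseudogene`. A minimal Python sketch, assuming a tab-delimited GTF with Ensembl-style `key "value";` attributes, that pulls out both values per `gene_id` (the function names `parse_attributes` and `gene_types` are hypothetical):

```python
import re
from typing import Dict, Iterable, Tuple

# Ensembl-style GTF attributes look like: gene_id "ENSMUSG..."; gene_name "Gm16088";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(attr_field: str) -> Dict[str, str]:
    """Turn the 9th GTF column into a {key: value} dict."""
    return dict(ATTR_RE.findall(attr_field))


def gene_types(gtf_lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map gene_id -> (column-2 value, gene_biotype attribute); first record wins."""
    types: Dict[str, Tuple[str, str]] = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(fields[8])
        gene_id = attrs.get("gene_id")
        if gene_id is not None:
            types.setdefault(gene_id, (fields[1], attrs.get("gene_biotype", "NA")))
    return types
```

Passing an open GTF file handle to `gene_types` gives one entry per gene; for the Gm16088 lines above it yields `("unprocessed_pseudogene", "pseudogene")`, so you can decide which of the two labels you actually want before merging.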
My Cufflinks gene and isoform tracking output files look like this:
tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status
ENSMUSG00000090025 - - ENSMUSG00000090025 Gm16088 TSS82763 1:3054232-3054733 - - 0 0 0 OK
ENSMUSG00000064842 - - ENSMUSG00000064842 Gm26206 TSS81070 1:3102015-3102125 - - 0 0 0 OK
ENSMUSG00000025900 - - ENSMUSG00000025900 Rp1 TSS11475 1:4343506-4360314 - - 0 0 0 OK
ENSMUSG00000088333 - - ENSMUSG00000088333 Gm22848 TSS18078 1:3783875-3783933 - - 0 0 0 OK
ENSMUSG00000025902 - - ENSMUSG00000025902 Sox17 TSS56047,TSS74369 1:4490927-4496413 - - 0.611985 0.394887 0.829082 OK
ENSMUSG00000051951 - - ENSMUSG00000051951 Xkr4 TSS1201,TSS70682,TSS88403 1:3205900-3671498 - - 0 0 0 OK
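As you can see, there is no column for the gene product type given in the second column of the GTF. Is there any way to get Cufflinks to include it in its output files automatically? There doesn't seem to be a simple option for this, unless I missed it. Please advise.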
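One workaround that does not depend on any Cufflinks option is to post-process the tracking files and join the type onto each row by `gene_id`. A minimal sketch, assuming tab-delimited files, Ensembl-style GTF attributes, and that the `gene_id` values in the tracking output match the GTF (as they do in the rows shown); the script name `annotate.py` and the `--column2` switch are hypothetical:

```python
import csv
import re
import sys
from typing import Dict, Iterable, Iterator, List

ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def load_gene_types(gtf_path: str, use_column2: bool = False) -> Dict[str, str]:
    """Map gene_id -> type, from GTF column 2 or from the gene_biotype attribute."""
    types: Dict[str, str] = {}
    with open(gtf_path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#") or len(fields) < 9:
                continue  # skip comments and incomplete records
            attrs = dict(ATTR_RE.findall(fields[8]))
            if "gene_id" in attrs:
                value = fields[1] if use_column2 else attrs.get("gene_biotype", "NA")
                types.setdefault(attrs["gene_id"], value)
    return types


def annotate_tracking(rows: Iterable[List[str]], types: Dict[str, str]) -> Iterator[List[str]]:
    """Yield fpkm_tracking rows with an extra gene_biotype column appended."""
    it = iter(rows)
    header = next(it, None)
    if header is None:
        return  # empty input
    gene_col = header.index("gene_id")
    yield header + ["gene_biotype"]
    for row in it:
        # Genes missing from the GTF (e.g. novel loci) get "NA".
        yield row + [types.get(row[gene_col], "NA")]


def main(gtf_path: str, tracking_path: str, use_column2: bool = False) -> None:
    types = load_gene_types(gtf_path, use_column2)
    with open(tracking_path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
        writer.writerows(annotate_tracking(reader, types))


if __name__ == "__main__":
    if len(sys.argv) in (3, 4):
        main(sys.argv[1], sys.argv[2], use_column2=sys.argv[3:] == ["--column2"])
    else:
        print("usage: annotate.py GENES.gtf X.fpkm_tracking [--column2]", file=sys.stderr)
```

Usage would be along the lines of `python annotate.py genes.gtf genes.fpkm_tracking > genes.biotype.fpkm_tracking`; since the isoform tracking file also has a `gene_id` column, the same join should work there. Joining on `gene_id` rather than `tracking_id` keeps one code path for both files.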