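How to read a vcf file in R

I have this file in .vcf format, and I want to read it in R. However, the file contains some redundant lines that I want to skip. I want the result to start at the line that matches #CHROM.

This is what I have tried: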

## find the start of the vcf file (the file name must be a quoted string)
chromo1 <- try(scan("myfile.vcf", what=character(), n=5000, sep="\n", skip=0, fill=TRUE, na.strings="", quote="\""))
skip.lines <- grep("^#CHROM", chromo1)

## read only the header line to get the column labels
column.labels <- read.delim("myfile.vcf", header=FALSE, nrows=1, skip=(skip.lines-1), sep="\t", fill=TRUE, stringsAsFactors=FALSE, na.strings="", quote="\"")
num.vars <- dim(column.labels)[2]

myfile.vcf

#not wanted line
#unnecessary line
#junk line
#CHROM POS  ID  REF  ALT
11  33443 3  A  T
12  33445 5  A  G

Result

#CHROM POS  ID  REF  ALT
11  33443 3  A  T
12  33445 5  A  G


2015-09-11 MAPK

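How about using a sequencing package? There are several if you Google "read vcf R". –

Bioconductor has several VCF readers. – hrbrmstr

@RichardScriven vcfreader does not suit my case. I just want to skip those lines and get a tab-delimited table. – MAPK

This may work well for you: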

# read the vcf file twice: first for the column names, then for the data
tmp.vcf <- readLines("test.vcf")
# read.table skips lines starting with "#" by default (comment.char = "#")
tmp.vcf.data <- read.table("test.vcf", stringsAsFactors = FALSE)

# drop everything after the #CHROM line, so the header is the last line left
tmp.vcf <- tmp.vcf[-(grep("#CHROM", tmp.vcf) + 1):-(length(tmp.vcf))]
vcf.names <- unlist(strsplit(tmp.vcf[length(tmp.vcf)], "\t"))
names(tmp.vcf.data) <- vcf.names
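
If reading the file from disk twice is a concern, the same idea can be sketched so that the file is read once and the table is parsed from the lines already in memory. This is only a sketch, not part of the original answer: the helper name read_vcf_table is hypothetical, and it assumes the file is truly tab-delimited with a single header line starting with #CHROM.

```r
# Hypothetical helper (not from the original answer).
# Assumes a tab-delimited file whose header line starts with "#CHROM".
read_vcf_table <- function(path) {
  lines <- readLines(path)

  # Index of the first line that starts with "#CHROM"
  header.idx <- grep("^#CHROM", lines)[1]
  if (is.na(header.idx)) stop("no #CHROM header line found in ", path)

  # Parse the header line and everything below it from memory.
  # comment.char = "" keeps the "#CHROM" line from being treated as a comment,
  # and check.names = FALSE keeps the column name "#CHROM" unchanged.
  read.table(text = lines[header.idx:length(lines)],
             header = TRUE, sep = "\t", quote = "",
             comment.char = "", check.names = FALSE,
             stringsAsFactors = FALSE)
}
```

Calling read_vcf_table("test.vcf") would then return a data frame whose first column is named #CHROM. Column types are guessed by read.table, so a column containing only T or F values would be read as logical unless colClasses is set.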

P.S.: If you have several vcf files, you should use the lapply function.

Best, Robert
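
As a rough illustration of the lapply suggestion, the two-pass approach can be wrapped in a function and applied to every .vcf file in a directory. The function names read_one_vcf and read_vcf_dir are hypothetical, and the sketch assumes every file is tab-delimited and has a #CHROM header line.

```r
# Hypothetical helper: the two-pass approach for a single file.
# Assumes tab-delimited data and a header line starting with "#CHROM".
read_one_vcf <- function(path) {
  lines <- readLines(path)
  header <- lines[grep("^#CHROM", lines)[1]]

  # Lines starting with "#" are skipped by default (comment.char = "#")
  data <- read.table(path, sep = "\t", stringsAsFactors = FALSE)
  names(data) <- strsplit(header, "\t")[[1]]
  data
}

# Hypothetical helper: read every .vcf file in a directory into a named list
read_vcf_dir <- function(dir) {
  vcf.files <- list.files(dir, pattern = "\\.vcf$", full.names = TRUE)
  vcf.list <- lapply(vcf.files, read_one_vcf)
  names(vcf.list) <- basename(vcf.files)
  vcf.list
}
```

With this in place, read_vcf_dir(".") would return one data frame per file, named after the file it came from.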


2015-09-11 09:14:33
