2016-12-06 70 views
Merge rows into one when the data are contiguous

I have a BED file of genomic coordinates loaded into R as a data frame. It looks like this:

chrom start end 
chrX 400 600 
chrX 800 1000 
chrX 1000 1200 
chrX 1200 1400 
chrX 1600 1800 
chrX 2000 2200 
chrX 2200 2400 

There is no need to keep all of these rows; it would be better to collapse them into something like this:

chrom start end 
chrX 400 600 
chrX 800 1400 
chrX 1600 1800 
chrX 2000 2400 

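How can I do this?

I tried using dplyr, without success. group_by does not work for me because I don't know how to collapse each block of consecutive rows into a single row that takes the start coordinate of its first row and the end coordinate of its last row, and there are many such blocks.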

Answer

Use the GenomicRanges package from Bioconductor, which is built specifically for BED files and similar data:

library(GenomicRanges) 

# Example data 
gr <- GRanges(
    seqnames = Rle("chr1", 6), 
    ranges = IRanges(start = c(400 ,800, 1200, 1400, 1800, 2000), 
        end = c(600, 1000, 1400, 1600, 2000, 2200))) 
gr 
# GRanges object with 6 ranges and 0 metadata columns: 
#  seqnames  ranges strand 
#   <Rle> <IRanges> <Rle> 
# [1]  chr1 [ 400, 600]  * 
# [2]  chr1 [ 800, 1000]  * 
# [3]  chr1 [1200, 1400]  * 
# [4]  chr1 [1400, 1600]  * 
# [5]  chr1 [1800, 2000]  * 
# [6]  chr1 [2000, 2200]  * 
# ------- 
# seqinfo: 1 sequence from an unspecified genome; no seqlengths 

# Merge contiguous (overlapping or adjacent) ranges into one using reduce():
reduce(gr) 
# GRanges object with 4 ranges and 0 metadata columns: 
#  seqnames  ranges strand 
#   <Rle> <IRanges> <Rle> 
# [1]  chr1 [ 400, 600]  * 
# [2]  chr1 [ 800, 1000]  * 
# [3]  chr1 [1200, 1600]  * 
# [4]  chr1 [1800, 2200]  * 
# ------- 
# seqinfo: 1 sequence from an unspecified genome; no seqlengths

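# EDIT: if the BED file is a data.frame (here called df), convert it to a GRanges object.
# Note: BED starts are 0-based while GRanges is 1-based; use df$start + 1 for a strict conversion.
gr <- GRanges(seqnames = Rle(df$chrom),
              ranges = IRanges(start = df$start,
                               end = df$end))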
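
If installing Bioconductor is not an option, the same idea can be sketched in base R. This is only an illustration, not part of the original answer: it assumes the data frame is called df with columns chrom, start and end as in the question, and it merges only rows where one interval's start equals the previous interval's end on the same chromosome (it does not handle overlapping or nested intervals, which reduce() does). The names new_block, block and merged are made up for this example.

```r
# Example data copied from the question (df is an assumed name)
df <- data.frame(
  chrom = "chrX",
  start = c(400, 800, 1000, 1200, 1600, 2000, 2200),
  end   = c(600, 1000, 1200, 1400, 1800, 2200, 2400)
)

# Sort so that contiguous intervals sit on neighbouring rows
df <- df[order(df$chrom, df$start), ]
n <- nrow(df)

# A row opens a new block if it is the first row, if the chromosome changes,
# or if its start does not equal the previous row's end
new_block <- c(TRUE,
               df$chrom[-1] != df$chrom[-n] | df$start[-1] != df$end[-n])

# Running count of block openings gives each row a block id
block <- cumsum(new_block)

# One output row per block: start of its first row, largest end within the block
merged <- data.frame(
  chrom = df$chrom[new_block],
  start = df$start[new_block],
  end   = as.vector(tapply(df$end, block, max))
)
merged
# merged has 4 rows: 400-600, 800-1400, 1600-1800, 2000-2400
```

The cumsum() over a "new block starts here" flag is what group_by was missing in the question: block can be used as the grouping variable, after which each group is summarised by its first start and last end.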