GenomicRanges添加覆蓋

我工作的RNA序列數據，並試圖通過基因型繪製平均覆蓋率曲線，類似於什麼都是在這裏完成每基因型GenomicRanges添加覆蓋

RNA測序覆蓋率（來源：pickrell等，Nature，2010 ）

要做到這一點的情節，我已經從100個個人要人文件，包含從RNA-seq的數據覆蓋信息（在特定區域），並且我在R讀取，如GenomicRanges對象。

這給了我GRANGES對象，如在下面的實施例玩具得到的那些：

GR1 = GRANGES（seqname的= 1，範圍= IRanges（開始= C（1,5,10,15,30 ，55），端= C（4,9,14,29,39,60）））

GR1 $ COV = C（3,1,8,6,2,10）

GR2 = GRANGES（seqname的= 1，範圍= IRanges（開始= C（3,20,24），結束= C（7,23,26）））

GR2 $ COV = C（3,5,3）

開始=唯一的（排序（C（範圍（GR1）@開始，範圍（GR2）@啓動）））

GR1

GRanges object with 6 ranges and 1 metadata column: 
seqnames ranges strand |  cov 
    <Rle> <IRanges> <Rle> | <numeric> 
     1 [ 1, 4]  * |   3 
     1 [ 5, 9]  * |   1 
     1 [10, 14]  * |   8 
     1 [15, 29]  * |   6 
     1 [30, 39]  * |   2 
     1 [55, 60]  * |  10 
     ------- 
seqinfo: 1 sequence from an unspecified genome; no seqlengths

GR2

GRanges object with 3 ranges and 1 metadata column: 
seqnames ranges strand |  cov 
    <Rle> <IRanges> <Rle> | <numeric> 
     1 [ 3, 7]  * |   3 
     1 [20, 23]  * |   5 
     1 [24, 26]  * |   3 
     ------- 
seqinfo: 1 sequence from an unspecified genome; no seqlengths

問題是，我有這些每個人（gr1和gr2將2個不同的個人），我想結合他們創造基因組範圍的對象，它給我的全覆蓋，在不同個體，1和2 會看，因爲每個位置如下：

GR3

GRanges object with 6 ranges and 1 metadata column: 
seqnames ranges strand |  cov 
    <Rle> <IRanges> <Rle> | <numeric> 
     1 [ 1, 2]  * |   3 
     1 [ 3, 4]  * |   6 (=3+3) 
     1 [ 5, 7]  * |   4 (=1+3) 
     1 [ 8, 9]  * |   1 
     1 [10, 14]  * |   8 
     1 [15, 19]  * |   6 
     1 [20, 23]  * |   11 (=6+5) 
     1 [24, 26]  * |   9 (=6+3) 
     1 [27, 29]  * |   6 
     1 [30, 39]  * |   2 
     1 [55, 60]  * |  10

有誰知道一個簡單的方法這個？或者我註定了？

感謝您的回答。

PS：我的數據並沒有擱淺，但如果你有它的滯留數據，它會更好。

PPS：理想情況下，我還希望能夠進行乘法運算，或者應用帶有兩個參數x和y的任何函數，而不是簡單地添加覆蓋範圍。

來源

2017-02-22 DaffyDoc

已經差不多一年了，但這是我未來參考的答案。

每當我找不到一個函數來直接執行此任務時，我只需將GRanges對象展開爲單bp分辨率。這使我可以在元數據列上執行任何所需的操作，將其視爲簡單的data.frame列，因爲IRanges現在在兩個對象之間相匹配。

在這個問題上的特殊情況下，以下的作品。

### Sort seqlevels 
# (not necessary here, but in real world examples, 
# with multiple sequences, you will want to do this) 
gr1 <- sort(GenomeInfoDb::sortSeqlevels(gr1)) 
gr2 <- sort(GenomeInfoDb::sortSeqlevels(gr2)) 

### Add seqlengths 
# (this corresponds to the actual sequence lengths; 
# here we use the highest position between the two objects: 60) 
seqlengths(gr1) <- 60 

### Make 1-bp tiles covering the genome 
# (using either one of gr1 and gr2 as a reference) 
bins <- GenomicRanges::tileGenome(GenomeInfoDb::seqlengths(gr1), 
            tilewidth=1, 
            cut.last.tile.in.chrom=TRUE) 

### Get coverage signal as Rle object 
gr1_cov <- coverage(gr1, weight="cov") 
gr2_cov <- coverage(gr2, weight="cov") 

### Get average coverage in each bin 
# (since the bins are 1-bp wide, this just keeps the original coverage value) 
gr1_bins <- GenomicRanges::binnedAverage(bins, gr1_cov, "binned_cov") 
gr2_bins <- GenomicRanges::binnedAverage(bins, gr2_cov, "binned_cov") 

### Make final object: 
# We can now sum the values in the metadata columns 
# Addressing the PPS, you could do any other operation or apply a function 
gr3 <- gr1_bins 
gr3$binned_cov <- gr1_bins$binned_cov + gr2_bins$binned_cov

這產生最終GRanges對象在單鹼基分辨率。

> gr3 

GRanges object with 60 ranges and 1 metadata column: 
    seqnames ranges strand | binned_cov 
     <Rle> <IRanges> <Rle> | <numeric> 
[1]  1 [1, 1]  * |   3 
[2]  1 [2, 2]  * |   3 
[3]  1 [3, 3]  * |   6 
[4]  1 [4, 4]  * |   6 
[5]  1 [5, 5]  * |   4 
...  ...  ... ... .  ... 
[56]  1 [56, 56]  * |   10 
[57]  1 [57, 57]  * |   10 
[58]  1 [58, 58]  * |   10 
[59]  1 [59, 59]  * |   10 
[60]  1 [60, 60]  * |   10 
------- 
seqinfo: 1 sequence from an unspecified genome

壓縮它，並在問題得到確切gr3，我們可以做到以下幾點。

### Compress back to variable-width IRanges (by cov) 
gr3_Rle <- coverage(gr3, weight='binned_cov') 
gr3 <- as(gr3_Rle, "GRanges") 

### Drop 0-score rows 
gr3 <- gr3[gr3$score > 0] 

### Rename metadata column 
names(mcols(gr3)) <- 'cov' 

> gr3 

GRanges object with 11 ranges and 1 metadata column: 
     seqnames ranges strand |  cov 
      <Rle> <IRanges> <Rle> | <numeric> 
    [1]  1 [ 1, 2]  * |   3 
    [2]  1 [ 3, 4]  * |   6 
    [3]  1 [ 5, 7]  * |   4 
    [4]  1 [ 8, 9]  * |   1 
    [5]  1 [10, 14]  * |   8 
    [6]  1 [15, 19]  * |   6 
    [7]  1 [20, 23]  * |  11 
    [8]  1 [24, 26]  * |   9 
    [9]  1 [27, 29]  * |   6 
    [10]  1 [30, 39]  * |   2 
    [11]  1 [55, 60]  * |  10 
    ------- 
    seqinfo: 1 sequence from an unspecified genome

來源

2018-01-09 21:49:08 Luis

GenomicRanges添加覆蓋

回答

相關問題