2016-01-13
1

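Select rows in one table that fall within a range defined by two columns in another table

Using R, I need to keep rows from "map" only when the row falls within one of the intervals in the "ref" table.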

Here is an example of the "map" table:

map<-"chr start tag depth BCV State 
chr1 1 chr1-1 1 2 1 
chr1 2 chr1-2 1 3 2 
chr1 3 chr1-3 1 2 3 
chr1 4 chr1-4 2 2 4 
chr2 5 chr2-5 2 2 5 
chr2 1 chr2-1 2 2 6 
chr2 2 chr2-2 3 2 4 
chr2 3 chr2-3 3 2 3 
chr2 4 chr2-4 3 2 2 
chr2 5 chr2-5 3 2 1 
chr2 6 chr2-6 3 2 7 
chr2 7 chr2-7 3 2 9 
chr2 8 chr2-8 2 2 2 
chr2 9 chr2-9 2 2 1" 
map<-read.table(text=map,header=T) 

And I have a reference table like this example:

ref<-"chr start end 
chr1 1 2 
chr1 2 3 
chr1 5 6 
chr2 7 9" 
ref<-read.table(text=ref,header=T) 

I need a final table like this:

final<-"chr start tag depth BCV State 
chr1 1 chr1-1 1 2 1 
chr1 2 chr1-2 1 3 2 
chr1 3 chr1-3 1 2 3 
chr2 7 chr2-7 3 2 9 
chr2 8 chr2-8 2 2 2 
chr2 9 chr2-9 2 2 1" 
final<-read.table(text=final,header=T) 

Answers

4

Since this is tagged with data.table, here is a simple data.table::foverlaps solution:

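library(data.table)

setDT(map)[, end := start]   # give each position a closed interval [start, end]
setkey(setDT(ref))           # key ref on all of its columns: chr, start, end
# row indices of map that overlap at least one interval in ref
indx <- unique(foverlaps(map, ref, which = TRUE, nomatch = 0L)$xid)
map[indx]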
#  chr start tag depth BCV State end 
# 1: chr1  1 chr1-1  1 2  1 1 
# 2: chr1  2 chr1-2  1 3  2 2 
# 3: chr1  3 chr1-3  1 2  3 3 
# 4: chr2  7 chr2-7  3 2  9 7 
# 5: chr2  8 chr2-8  2 2  2 8 
# 6: chr2  9 chr2-9  2 2  1 9 

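This adds an end column to map so that each position becomes a closed interval, then sets a key on ref so that foverlaps knows which columns define the intervals to match on (chr is part of the key too). It then runs foverlaps, dropping non-matching rows and keeping only the unique indices in case the intervals in ref overlap one another. Finally, it subsets map by those indices.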
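To see why the unique() call matters, here is a minimal sketch on toy data. The tables map_dt and ref_dt are hypothetical and much smaller than the ones in the question; the sketch assumes the data.table package is installed. Position 2 lies in both [1, 2] and [2, 3], so foverlaps reports it twice.

```r
library(data.table)

# Hypothetical toy tables, not the ones from the question
map_dt <- data.table(chr = c("chr1", "chr1", "chr1"), start = c(2L, 3L, 4L))
map_dt[, end := start]                      # zero-width closed interval per position
ref_dt <- data.table(chr   = c("chr1", "chr1"),
                     start = c(1L, 2L),
                     end   = c(2L, 3L))
setkey(ref_dt, chr, start, end)             # last two key columns define the interval

# One row per (map row, ref interval) pair; non-matching map rows are dropped
hits <- foverlaps(map_dt, ref_dt, which = TRUE, nomatch = 0L)
print(hits)

# Row 1 of map_dt matches two intervals, so deduplicate before subsetting
keep <- unique(hits$xid)
print(map_dt[keep])
```

Without unique(), the row for position 2 would appear twice in the subset.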

2

First, you need to expand the intervals:

L <- lapply(split(ref,ref$chr), function(d) unique(unlist(mapply(seq,d$start,d$end,SIMPLIFY = F)))) 

which gives you:

#$chr1 
#[1] 1 2 3 5 6 

#$chr2 
#[1] 7 8 9 

Then you can merge:

ref2 <- setNames(stack(L),c('start','chr')) 
merge(map,ref2) 

Final output:

# chr start tag depth BCV State 
#1 chr1  1 chr1-1  1 2  1 
#2 chr1  2 chr1-2  1 3  2 
#3 chr1  3 chr1-3  1 2  3 
#4 chr2  7 chr2-7  3 2  9 
#5 chr2  8 chr2-8  2 2  2 
#6 chr2  9 chr2-9  2 2  1
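
The expand-and-merge steps can be traced end to end on a smaller example. This is only a sketch in base R: map_toy, ref_toy, pos and lookup are hypothetical names, and the data are made up for illustration. It also shows why setNames is needed, since stack() names its columns values and ind.

```r
# Hypothetical toy tables, not the ones from the question
map_toy <- data.frame(chr   = c("chr1", "chr1", "chr2"),
                      start = c(2L, 4L, 7L),
                      tag   = c("chr1-2", "chr1-4", "chr2-7"),
                      stringsAsFactors = FALSE)
ref_toy <- data.frame(chr   = c("chr1", "chr1", "chr2"),
                      start = c(1L, 2L, 7L),
                      end   = c(2L, 3L, 9L),
                      stringsAsFactors = FALSE)

# Expand every interval into its integer positions, per chromosome;
# unique() drops positions covered by more than one interval
pos <- lapply(split(ref_toy, ref_toy$chr), function(d)
  unique(unlist(mapply(seq, d$start, d$end, SIMPLIFY = FALSE))))

# stack() returns columns 'values' and 'ind'; rename them to match map_toy
lookup <- setNames(stack(pos), c("start", "chr"))

# merge() joins on the shared columns (chr and start), keeping matching rows only
res <- merge(map_toy, lookup)
print(res)
```

Here the row tagged chr1-4 is dropped because position 4 is in no chr1 interval. Note that this approach materializes every position, so it may use a lot of memory when the intervals are wide.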