Overlapping positions from different files in R

I have two large files, each with Start and End positions and two sample columns (numeric).

File 1:

Start End Sample1 Sample2 
1   60  1  4 
100  200  2  1 
201  250  1  4 
300  450  1  1

File 2:

Start End Sample1 Sample2 
40   60  1  1 
70  180  1  1 
240  330  2  1 
340  450  1  4 
500  900  1  4 
980  1200  2  1

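First, I want to take the Start and End positions from the first file and make a segment plot. The plot must also take into account Start-20 and End+20 for each position in the first file.

Then I want to take the overlapping Start and End positions from the second file and draw them on the plot above. This way there will be many plots, each based on a Start and End position from the first file, and positions with no overlap will also be plotted separately.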
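The overlap rule described above can be sketched in base R roughly as follows. This is only a sketch under assumptions: the helper name `find_overlaps_padded` and its `pad` argument are hypothetical, and two intervals are treated as overlapping when they share at least one position after the file 1 interval has been widened by 20 on each side.

```r
# Hypothetical helper: for each row of `a`, return the row indices of `b`
# that overlap the widened interval [Start - pad, End + pad].
find_overlaps_padded <- function(a, b, pad = 20) {
  lapply(seq_len(nrow(a)), function(i) {
    lo <- a$Start[i] - pad
    hi <- a$End[i] + pad
    # two closed intervals overlap when each starts before the other ends
    which(b$Start <= hi & b$End >= lo)
  })
}

file1 <- data.frame(Start = c(1, 100, 201, 300),
                    End   = c(60, 200, 250, 450))
file2 <- data.frame(Start = c(40, 70, 240, 340, 500, 980),
                    End   = c(60, 180, 330, 450, 900, 1200))

# one element per row of file1, holding the matching row numbers of file2
hits <- find_overlaps_padded(file1, file2)
```

A file 1 row whose element in `hits` is empty has no overlap and would get a plot of its own.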

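The color of each segment will be based on the two sample numbers (e.g., in both files, if they are 1 and 4 the segment color will be red, if they are 1 and 1 the segment will be another color, and so on).

I would appreciate it if someone could help me understand how to write a function for this in R.

Thanks in advance.

PS: I have attached a drawing of the output. I have only shown two of the results.

Below is the code I wrote, but it gives an error in match.names:

Error in match.names(clabs, names(xi)) : names do not match previous names

I also need to assign red to the segments from dataset 1 and green to the segments from dataset 2. How would I implement that in the code below?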

overlap_func <- function(dataset1,dataset2) { 

for(i in 1:nrow(dataset1)) 
{ 

loop_start <- dataset1[i,"Start"] 
loop_end <- dataset1[i,"End"] 
p <- dataset2[,c(1,2)] 
dataset1_pos <- data.frame(loop_start,loop_end) 
dataset2_filter <- p[p$Start >= (loop_start-(loop_start/2)) & p$End <= (loop_end+ (loop_end/2)), ] 
data_in_loop <- rbind(dataset1_pos,dataset2_filter) 
plot_function(data_in_loop,loop_start,loop_end) 

} 
} 


plot_function <- function(loop_data,start,end){ 
pos <- 1:nrow(loop_data) 
dat1 <- cbind(pos,loop_data) 
colnames(dat1) <- c("pos","start","end") 
pdf(file=paste0("path where plots are generated","_",start,"-",end,"_","overlap.pdf")) 
plot(dat1$pos, type = 'n', xlim = range(c(start-(start/2), end+(end/2)))) 
segments(dat1$start, dat1$pos, dat1$end, dat1$pos) 
dev.off() 
} 


df1 <- read.table(header=T, text="Start End Sample1 Sample2 
1   60  1  4 
100  200  2  1 
201  250  1  4 
300  450  1  1") 

df2 <- read.table(header=T, text="Start End Sample1 Sample2 
40   60  1  1 
70  180  1  1 
240  330  2  1 
340  450  1  4 
500  900  1  4 
980  1200  2  1") 

overlap_func(df1,df2)
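The `match.names` error appears to come from the `rbind()` call: `dataset1_pos` has columns named `loop_start` and `loop_end`, while `dataset2_filter` has `Start` and `End`, and `rbind()` on data frames requires matching column names. Below is a minimal sketch of one possible fix, which also tags each row with a color so that dataset 1 segments are red and dataset 2 segments are green. The names `build_loop_data`, `plot_loop_data`, `pad` and the `col` column are assumptions for illustration, and the filter keeps every dataset 2 interval that overlaps the Start-20 to End+20 window instead of the half-width window used in the original code.

```r
# Hypothetical helper: combine row i of dataset1 with the overlapping
# rows of dataset2, using identical column names so rbind() succeeds.
build_loop_data <- function(dataset1, dataset2, i, pad = 20) {
  lo <- dataset1$Start[i] - pad
  hi <- dataset1$End[i] + pad
  d1 <- data.frame(Start = dataset1$Start[i], End = dataset1$End[i],
                   col = "red", stringsAsFactors = FALSE)
  keep <- dataset2$Start <= hi & dataset2$End >= lo
  d2 <- data.frame(Start = dataset2$Start[keep], End = dataset2$End[keep],
                   col = rep("green", sum(keep)), stringsAsFactors = FALSE)
  rbind(d1, d2)
}

# Hypothetical helper: draw one horizontal segment per row, colored by `col`.
plot_loop_data <- function(loop_data, pad = 20) {
  pos <- seq_len(nrow(loop_data))
  plot(NA, xlim = range(loop_data$Start, loop_data$End) + c(-pad, pad),
       ylim = c(0, nrow(loop_data) + 1), xlab = "position", ylab = "segment")
  segments(loop_data$Start, pos, loop_data$End, pos, col = loop_data$col)
}

df1 <- data.frame(Start = c(1, 100, 201, 300), End = c(60, 200, 250, 450))
df2 <- data.frame(Start = c(40, 70, 240, 340, 500, 980),
                  End   = c(60, 180, 330, 450, 900, 1200))

# first dataset1 row together with its dataset2 overlaps
res <- build_loop_data(df1, df2, 1)
```

To get one file per dataset 1 row, `plot_loop_data(build_loop_data(df1, df2, i))` could be called inside a loop over `seq_len(nrow(df1))`, wrapped in `pdf()` and `dev.off()` as in the original `plot_function`.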

Source

2013-02-01 glow

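Could you roughly show what the output should look like (drawn in some software, or a scan of a sketch on paper)? This seems interesting to me and I'd be happy to try, but exactly what you are looking for is vague (or complicated). – Arun

Why was this downvoted! +1 – agstudy

(+). @agstudy, no idea. – Arun

Something like this?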

df1 <- read.table(header=T, text="Start End Sample1 Sample2 
1   60  1  4 
100  200  2  1 
201  250  1  4 
300  450  1  1") 

df2 <- read.table(header=T, text="Start End Sample1 Sample2 
40   60  1  1 
70  180  1  1 
240  330  2  1 
340  450  1  4 
500  900  1  4 
980  1200  2  1") 

library(IRanges)
library(ggplot2)
library(plyr)

df1$id <- factor(1:nrow(df1))
ir2 <- IRanges(df2$Start, df2$End)
# for each row of df1, stack it on top of the df2 rows it overlaps
out <- ddply(df1, .(id), function(x) {
    ir1 <- IRanges(x$Start, x$End)
    o.idx <- as.data.frame(findOverlaps(ir1, ir2))$subjectHits  # overlapping df2 rows
    df.out <- rbind(x[, 1:4], df2[o.idx, ])
    df.out$id1 <- x$id                   # the df1 row this group belongs to
    df.out$id2 <- seq_len(nrow(df.out))  # 1 = the df1 row, 2.. = its df2 overlaps
    df.out
})
out$id1 <- factor(out$id1)
out$id2 <- factor(out$id2)
out$id3 <- factor(1:nrow(out))   # one y position per segment

p <- ggplot(out, aes(x = Start, y = id3 , colour = id2))
# horizontal segment from Start to End at height id3
p <- p + geom_segment(aes(xend = End, yend = id3))
p <- p + scale_colour_brewer(palette = "Set1")
p

[image: ggplot2_no_facet_geom_segment]

Edit: After looking at your updated drawing, maybe this is what you want?

p + facet_wrap(~ id1, scales="free")

[image: ggplot2_facet_geom_segment]

Edit: To save the plot for each facet in a separate file, you can split on id1 and generate one plot per group:

d_ply(out, .(id1), function(ww) {
    p <- ggplot(ww, aes(x = Start, y = id3 , colour = id2))
    # horizontal segment from Start to End at height id3
    p <- p + geom_segment(aes(xend = End, yend = id3))
    p <- p + scale_colour_brewer(palette = "Set1")
    # one PDF per id1 group
    fn <- paste0("~/Downloads/id", as.numeric(as.character(ww$id1[1])), ".pdf")
    ggsave(fn, p)
})

Set the path in fn accordingly for this to work.

Source

2013-02-01 13:49:59 Arun

+1 for using IRanges! – agstudy

I would use something like `df1$id <- interaction(Sample1, Sample2)` for the colors.. – agstudy

Aha, thanks for the tip! I'll try it now – Arun
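The `interaction()` suggestion from the comments can be sketched as follows. Only the 1 and 4 to red pairing comes from the question; the other colors, the `pair` and `col` column names, and the `pair_colors` lookup vector are assumptions made for illustration.

```r
df <- data.frame(Start   = c(1, 100, 201, 300),
                 End     = c(60, 200, 250, 450),
                 Sample1 = c(1, 2, 1, 1),
                 Sample2 = c(4, 1, 4, 1))

# one factor level per (Sample1, Sample2) combination, labelled like "1.4"
df$pair <- interaction(df$Sample1, df$Sample2, drop = TRUE)

# hypothetical lookup from combination to color
pair_colors <- c("1.4" = "red", "1.1" = "green", "2.1" = "blue")
df$col <- unname(pair_colors[as.character(df$pair)])
```

With ggplot2, `pair` could be mapped to `colour` in place of `id2`; with base graphics, `col` could be passed directly to `segments()`.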

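I tried to solve this with the lattice package. In particular, I use the shingle function to get an idea of the intervals to compare. I hoped I could merge the two shingles, but I could not. So once I have my first plot, I use the IRanges package (as in the solution above) to compute the overlaps. The idea is to end up with a dotplot.

## Read the input data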
dat <- read.table(text = 'Start End Sample1 Sample2 
1   60  1  4 
100  200  2  1 
201  250  1  4 
300  450  1  1', header = T) 

dat1 <- read.table(text = 'Start End Sample1 Sample2 
40   60  1  1 
70  180  1  1 
240  330  2  1 
340  450  1  4 
500  900  1  4 
980  1200  2  1', header = T) 


## Create the 2 shingles (shingle() comes from lattice)
library(lattice)
library(gridExtra)
dat.sh <- shingle(x = dat[,3], intervals = as.matrix(dat[,c(1,2)]))
dat1.sh <- shingle(x = dat1[,3], intervals = as.matrix(dat1[,c(1,2)]))
## compute max value for plot comparison
max.value <- max(c(dat$End,dat1$End))
## plot the 2 series with different colors
p1 <- plot(dat.sh, xlim = c(0,max.value), col = 'red')
p2 <- plot(dat1.sh, xlim = c(0,max.value), col = 'green')
grid.arrange(p1,p2)

This is a quick way to compare my intervals.


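This looks good, but I can't go any further with shingles, because I can't merge them into the same plot. So I will use the IRanges package to compute the overlaps.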

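library(IRanges)
library(lattice)   # for dotplot()
rang1 <- IRanges(start=dat[,1], end = dat[,2])
rang2 <- IRanges(start=dat1[,1], end = dat1[,2])
dat.plot  <- dat1     # start from dat1
dat.plot$group <- 'origin'
dat.plot$id <- rownames(dat1)   ## add an id for each row
rang.o <- findOverlaps(rang2,rang1)  # get overlaps (query = rang2, subject = rang1)
## ASSUMPTION: the slot names were garbled in the source text;
## queryHits()/subjectHits() are used here as the likely intent
dat.o <- dat1[queryHits(rang.o),]  ## construct overlaps data.frame
dat.o$id <- subjectHits(rang.o)
dat.o$group <- 'overlap'
dat.plot <- rbind(dat.plot,dat.o)  ## union of all
## ASSUMPTION: groups = group (the source had groups = col, but no col column exists)
dotplot(id ~End-Start|group , data=dat.plot,
           groups = group,type = c("p", "h"))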

Source

2013-02-01 16:07:36 agstudy

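(+1) Why the dotplot at the end? Can't you group the overlaps together? – Arun

I think the dotplot makes it easy to see multiple overlaps. For example, see segment 4. I don't know the lattice equivalent of geom_segment :) – agstudy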
