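Improving the performance of visualising overlapping segments
I have a set of pairs of x points from which I draw segments along the x axis, to create custom read maps in R.
Half of the task of plotting these segments is deciding their y positions, so that no two overlapping segments sit on the same y level. For each segment, I iterate over the y levels from the first position until I reach a level that does not contain a segment overlapping the current one. I then record the end position of the current segment on that level and move on to the next segment.
The actual code is a function, as follows: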
# Dummy data
# A list of start and end positions for each segment along the X axis. Sorted by start.
# Passing the function few.reads draws a map in half a second. Passing it many.reads takes about half an hour to complete.
few.reads <- data.frame(start=c(rep(10,150), rep(16,100), rep(43,50)), end=c(rep(30,150), rep(34,100), rep(57,50)));
many.reads <- data.frame(start=c(rep(10,15000), rep(16,10000), rep(43,5000)), end=c(rep(30,15000), rep(34,10000), rep(57,5000)));
#---
# A function to draw a series of overlapping segments (or "reads" in my case) along
# the x-axis. Where reads overlap, they are "stacked" down the y axis
#---
drawReads <- function(reads){
    # sort the reads by their start positions
    reads <- reads[order(reads$start),];
    # minimum and maximum for x axis
    minstart <- min(reads$start);
    maxend <- max(reads$end);
    # initialise yread: a list to keep track of used y levels
    yread <- c(minstart - 1);
    ypos <- c(); #holds the y position of the ith segment
    #---
    # This iteration step is the bottleneck. Worst case, when all reads are stacked on top
    # of each other, it has to iterate over many y levels to find the correct position for
    # the later reads
    #---
    # iterate over segments
    for (r in 1:nrow(reads)){
        read <- reads[r,];
        start <- read$start;
        placed <- FALSE;
        # iterate through yread to find the next available
        # y pos at this x pos (start)
        y <- 1;
        while(!placed){
            if(yread[y] < start){
                ypos[r] <- y;
                yread[y] <- read$end;
                placed <- TRUE;
            }
            # current y pos is used by another segment, increment
            y <- y + 1;
            # initialize another y pos if we're at the end of the list
            if(y > length(yread)){
                yread[y] <- minstart-1;
            }
        }
    }
    #---
    # This is the plotting step
    # Once we are here the rest of the process is very quick
    #---
    # find the maximum y pos that is used to size up the plot
    maxy <- length(yread);
    miny = 1;
    reads$ypos <- ypos + miny;
    print("New Plot...")
    # Now we have all the information, start the plot
    plot.new();
    plot.window(xlim=c(minstart, maxend+((maxend-minstart)/10)), ylim=c(1,maxy));
    axis(3,xaxp=c(minstart,maxend,(maxend-minstart)/10));
    axis(2, yaxp=c(miny,maxy,3),tick=FALSE,labels=FALSE);
    print("Draw the reads...");
    maxy <- max(reads$ypos);
    segments(reads$start, maxy-reads$ypos, reads$end, maxy-reads$ypos, col="blue");
}
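My actual data set is very large and, as far as I can tell, contains regions with up to 600,000 reads. Reads naturally stack on top of each other, so it is very easy to hit the worst case, in which all reads overlap one another. The time it takes to plot a large number of reads is unacceptable to me, so I am looking for a way to make the process more efficient. Can I replace my loops with something faster? Is there an algorithm that can arrange the reads more quickly? I really cannot think of a better way of doing this at the moment.
Thanks for your help.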
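One direction that might help is offered here as a sketch, not a benchmarked solution. Because the reads are processed in order of start position, a level that has become free stays free until it is reused, so the free levels can be kept in a min-heap and the lowest one popped, instead of scanning every level for every read. The function name `assign_levels` and its argument layout are hypothetical and not part of the original code. The sketch assumes `start[i] <= end[i]` for every read. Under that assumption it should give the same "lowest free level" placement as the loop in `drawReads`, in roughly O(n log n) time.

```r
# Hypothetical helper (not from the original post): assign each read the
# lowest free y level. Assumes start[i] <= end[i] for every read.
assign_levels <- function(start, end) {
  n <- length(start)
  by_start <- order(start)   # order in which reads are placed
  by_end <- order(end)       # order in which reads stop blocking a level
  level <- integer(n)        # result: y level of each read, in input order
  heap <- integer(n)         # binary min-heap of currently free levels
  hsize <- 0L                # number of levels currently in the heap
  nlev <- 0L                 # number of levels opened so far
  k <- 1L                    # next position in by_end to release
  for (i in by_start) {
    # Free the level of every placed read that ends before this one starts.
    while (k <= n && end[by_end[k]] < start[i]) {
      hsize <- hsize + 1L
      p <- hsize
      heap[p] <- level[by_end[k]]
      while (p > 1L) {                       # sift the new entry up
        q <- p %/% 2L
        if (heap[q] <= heap[p]) break
        tmp <- heap[q]; heap[q] <- heap[p]; heap[p] <- tmp
        p <- q
      }
      k <- k + 1L
    }
    if (hsize == 0L) {
      # No free level: open a new one.
      nlev <- nlev + 1L
      level[i] <- nlev
    } else {
      # Reuse the lowest free level (the heap root), then restore the heap.
      level[i] <- heap[1L]
      heap[1L] <- heap[hsize]
      hsize <- hsize - 1L
      p <- 1L
      repeat {                               # sift the root down
        l <- 2L * p; r <- l + 1L; s <- p
        if (l <= hsize && heap[l] < heap[s]) s <- l
        if (r <= hsize && heap[r] < heap[s]) s <- r
        if (s == p) break
        tmp <- heap[s]; heap[s] <- heap[p]; heap[p] <- tmp
        p <- s
      }
    }
  }
  level
}
```

With the dummy data above, `assign_levels(few.reads$start, few.reads$end)` could stand in for the `ypos` vector that the loop builds, provided the data frame is already sorted by start. The function works on plain vectors, which also avoids the per-row data frame indexing (`reads[r,]`) and the repeated growth of `ypos` and `yread`.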
Never mind drawing it, how could you possibly _interpret_ a chart with 600,000 rows? – 2012-03-26 12:38:16
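I am writing these maps to manually select regions of my data whose reads have specific features in their layout. If a lot of reads are stacked up, they end up squashed into a wavy rectangle. At that point the map still shows something, although turning it into a histogram would probably be better. Still, you raise a good point; I may be heading down a rather unsuitable path. – MattLBeck 2012-03-26 12:59:29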
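The histogram idea mentioned in the comment above could be sketched as a per-position read depth, computed without placing any read on a y level. This is a minimal sketch under assumptions: positions are integers, both `start` and `end` are inclusive, and the name `coverage_depth` is hypothetical.

```r
# Hypothetical helper: number of reads covering each integer x position.
# Assumes integer positions and inclusive start/end coordinates.
coverage_depth <- function(start, end) {
  lo <- min(start)
  hi <- max(end)
  nbins <- hi - lo + 2L
  # Count reads opening at each position and closing just after each end.
  opened <- tabulate(start - lo + 1L, nbins = nbins)
  closed <- tabulate(end - lo + 2L, nbins = nbins)
  # Running total of (opened - closed) is the depth at each position.
  depth <- cumsum(opened - closed)[seq_len(hi - lo + 1L)]
  data.frame(pos = lo:hi, depth = depth)
}
```

The result can be drawn as a step plot, for example with `plot(cov$pos, cov$depth, type = "s")`, where `cov` is the returned data frame. The cost grows with the number of reads plus the width of the region, not with the stacking depth.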