Increase performance of visualizing overlapping segments

I have a series of segments along the x axis that I draw in R to create a custom read map:

[Image: example read map]

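Half the work of drawing these segments is deciding their y positions, so that no two overlapping segments sit on the same y level. For each segment, I iterate over the y levels starting from the first until I reach one that contains no segment overlapping the current one. I then record the end position of the current segment on that level and move on to the next segment.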

The actual code is the following function:

# Dummy data 
# A list of start and end positions for each segment along the X axis. Sorted by start. 
# Passing the function few.reads draws a map in half a second. Passing it many.reads takes about half an hour to complete. 
few.reads <- data.frame(start=c(rep(10,150), rep(16,100), rep(43,50)), end=c(rep(30,150), rep(34,100), rep(57,50))); 
many.reads <- data.frame(start=c(rep(10,15000), rep(16,10000), rep(43,5000)), end=c(rep(30,15000), rep(34,10000), rep(57,5000))); 

#--- 
# A function to draw a series of overlapping segments (or "reads" in my case) along 
# the x-axis. Where reads overlap, they are "stacked" down the y axis 
#--- 
drawReads <- function(reads){ 

    # sort the reads by their start positions 
    reads <- reads[order(reads$start),]; 

    # minimum and maximum for x axis 
    minstart <- min(reads$start); 
    maxend <- max(reads$end); 

    # initialise yread: for each y level, the end position of the last read placed on it 
    yread <- c(minstart - 1); 
    ypos <- c(); #holds the y position of the ith segment 

    #--- 
    # This iteration step is the bottleneck. Worst case, when all reads are stacked on top 
    # of each other, it has to iterate over many y levels to find the correct position for 
    # the later reads 
    #--- 
    # iterate over segments 
    for (r in 1:nrow(reads)){ 
     read <- reads[r,]; 
     start <- read$start; 
     placed <- FALSE; 

     # iterate through yread to find the next available 
     # y pos at this x pos (start) 
     y <- 1; 
     while(!placed){ 

      if(yread[y] < start){ 
       # level y is free at this start position: place the read here 
       ypos[r] <- y; 
       yread[y] <- read$end; 
       placed <- TRUE; 
      } else { 
       # current y pos is used by another segment, increment 
       y <- y + 1; 
       # initialize another y pos if we're at the end of the list 
       if(y > length(yread)){ 
        yread[y] <- minstart-1; 
       } 
      } 
     } 
    } 

    #--- 
    # This is the plotting step 
    # Once we are here the rest of the process is very quick 
    #--- 
    # the highest y level actually used sizes up the plot 
    maxy <- max(ypos); 
    miny <- 1; 

    reads$ypos <- ypos; 

    print("New Plot...") 
    # Now we have all the information, start the plot 
    plot.new(); 
    plot.window(xlim=c(minstart, maxend+((maxend-minstart)/10)), ylim=c(miny,maxy)); 

    axis(3,xaxp=c(minstart,maxend,(maxend-minstart)/10)); 
    axis(2, yaxp=c(miny,maxy,3),tick=FALSE,labels=FALSE); 

    print("Draw the reads..."); 
    # flip the levels so level 1 is drawn at the top, keeping every read inside ylim 
    segments(reads$start, maxy-reads$ypos+miny, reads$end, maxy-reads$ypos+miny, col="blue"); 
}

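My actual data set is very large and, as far as I can tell, contains regions with up to 600,000 reads. Reads naturally pile up on top of each other, so the worst case, where every read overlaps every other, is easy to hit. The time it takes to draw that many reads is unacceptable for me, so I am looking for a way to make the process more efficient. Can I replace my loop with something faster? Is there an algorithm that can arrange the reads more quickly? I really can't think of a better way to do this at the moment.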

Thanks for your help.

2012-03-26 MattLBeck

Never mind drawing it: how could you possibly _interpret_ a plot with 600,000 lines? – 2012-03-26 12:38:16

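I draw these maps so I can manually pick out regions of my data whose read layout has particular features. If a lot of reads are stacked up, they end up squashed into a wavy rectangle. At that point the map still shows something, although turning it into a histogram might be better. Still, you raise a good point; I may be going down a rather unsuitable path. – MattLBeck 2012-03-26 12:59:29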

Fill each y level greedily. Once a level is full, move down one level and never look back.

Pseudocode:

y <- 1 
while segment-list.not-empty 
    i <- 1 
    while (i > 0) 
        current <- segment-list[i] 
        current.plot(y) 
        segment-list.remove(i) 
        # first remaining segment whose start is greater than current.end, or 0 if none 
        i <- segment-list.find_first_greater(current.end) 
    y <- y + 1

This does not necessarily produce an "optimal" plot in any sense, but at least it is O(n log n).
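The level-by-level pseudocode above can be sketched in R roughly as follows. This is a minimal sketch, not code from the thread: `stackReadsGreedy` is a hypothetical name, the binary search uses base R's `findInterval()`, and instead of the tree suggested in a later comment, removal is handled with a "next unplaced index" array with path compression (a disjoint-set trick swapped in here). It assumes `end >= start` for every read and only returns one y level per read, leaving plotting to the caller.

```r
# Hypothetical helper: assign a y level to every read by filling one level at a
# time (greedy level filling). Assumes end >= start for every read.
stackReadsGreedy <- function(start, end) {
  n <- length(start)
  ord <- order(start)             # process reads sorted by start
  s <- start[ord]
  e <- end[ord]
  level <- integer(n)             # level of the i-th read in sorted order
  # nxt[k] leads (via a chain) to the first unplaced index >= k;
  # index n + 1 is a sentinel meaning "no read left"
  nxt <- seq_len(n + 1)
  placed <- 0L
  y <- 0L
  while (placed < n) {
    y <- y + 1L
    k <- 1L                       # each level starts from the front of the list
    repeat {
      # follow the chain to the first unplaced index >= k ...
      root <- k
      while (nxt[root] != root) root <- nxt[root]
      # ... and compress the path so later lookups stay short
      while (nxt[k] != root) {
        up <- nxt[k]
        nxt[k] <- root
        k <- up
      }
      i <- root
      if (i > n) break            # nothing else fits on this level
      level[i] <- y
      placed <- placed + 1L
      nxt[i] <- i + 1L            # "remove" read i from the list
      # binary search: first read whose start is strictly greater than e[i]
      k <- findInterval(e[i], s) + 1L
    }
  }
  out <- integer(n)
  out[ord] <- level               # back to the caller's original order
  out
}

# Example with the question's dummy data
few.reads <- data.frame(start = c(rep(10, 150), rep(16, 100), rep(43, 50)),
                        end   = c(rep(30, 150), rep(34, 100), rep(57, 50)))
lv <- stackReadsGreedy(few.reads$start, few.reads$end)
max(lv)  # 250 levels
```

Each placed read costs one `findInterval()` call, and the "next unplaced" lookups are cheap after path compression, so the work grows roughly as n log n rather than with n times the number of levels. As far as I can tell, filling levels one at a time like this yields the same assignment as the first-fit loop in `drawReads`, so the returned vector could stand in for `ypos` there.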

2012-03-26 13:20:23

Doesn't this depend on how fast 'segment-list.find_first_greater(current.end)' is? Aren't we still basically iterating over the segments for every y level? – MattLBeck 2012-03-26 16:17:17

The list is sorted by segment start, so binary search is possible. Removal could be a problem, but it can be solved by using a tree. – 2012-03-26 16:49:37
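In R, the binary search this comment refers to is available in base R as `findInterval()`. For a vector of starts sorted in increasing order, `findInterval(x, starts)` counts the starts that are `<= x`, so adding 1 gives the index of the first start strictly greater than `x`. The helper name `firstStartAfter` and the sample starts below are illustrative assumptions.

```r
# Sorted segment starts (illustrative data)
starts <- c(10, 10, 16, 43, 43)

# Hypothetical helper: index of the first segment whose start is strictly
# greater than `end`; returns length(starts) + 1 when no such segment exists.
firstStartAfter <- function(end, starts) findInterval(end, starts) + 1L

firstStartAfter(30, starts)                # 4
firstStartAfter(c(9, 10, 30, 57), starts)  # 1 3 4 6
```

Ties are handled as the strict "greater than" requires: a segment ending at 10 skips both segments that start at 10.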

Finding a fast way to do these low-level operations in R might be interesting, but that's a different question. Thanks for showing me this approach! – MattLBeck 2012-03-27 10:19:45

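Can't you just sort on the start values? Then you walk the list from front to back. For each item, draw it, then do a binary search over the rest of the list for the first item whose start is greater than the end coordinate of the item you just drew. If none is found, increment Y and start again from the front of the remaining list. Remove each item as it is drawn.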

Sorting is O(N lg N) and each binary search is O(lg N), with one search per item, so the total is O(N lg N).
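To put these bounds in perspective for the roughly 600,000 reads mentioned in the question, here is a back-of-the-envelope count with constant factors ignored. In the worst case of the original loop, where every read overlaps every other, read k scans k - 1 occupied levels before opening a new one:

```latex
\begin{aligned}
\text{original loop (worst case):}\quad & \sum_{k=1}^{N} (k-1) = \frac{N(N-1)}{2} \approx \frac{(6 \times 10^{5})^{2}}{2} = 1.8 \times 10^{11} \\
\text{sort + binary searches:}\quad & N \log_2 N \approx 6 \times 10^{5} \times 19.2 \approx 1.2 \times 10^{7}
\end{aligned}
```

The second figure assumes each removal is also cheap, which is the point raised in the comments about using a tree.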

2012-03-26 17:00:14 AShelly

OK, that sounds like the way to go, thanks! – MattLBeck 2012-03-27 10:20:19
