迷惑分區域列表中的覆蓋區域

我有一個很長的值列表x和y，按x值排序。我想輸出x和y值最長連續跨度的列表。這是有點難以付諸話但將有希望成爲用下面的例子清楚：作爲5768和6000之間的區域沒有被任何條目的覆蓋，上述應迷惑分區域列表中的覆蓋區域

輸出：

0, 5768 
6000, 7262

在我看來，這應該是一個簡單的問題，但我一直沒有解決方案的工作一段時間。我已經在下面發佈了我的代碼。我目前的努力存在的問題是，雖然對x值進行排序，但可能是第k行的x值超過第k-1行的y值，但不標記新連續字符串的開始。

lines = [line.strip('\n') for line in open('test')] 
myarray=[] 
for line in lines: 
    myarray.append(line.split(', ')) 

def findCoveredRegions(regionArray): 
    resultsContigs = [] 
    j = regionArray[0][1] 
    i = regionArray[0][0] 
    for line in regionArray: 
     last_i = i 
     i = line[0] 
     if i <= j: 
      if line[1] > j: 
       j = line[1] 
     else: 
      resultsContigs.append([last_i,j]) 
    resultsContigs.append([i,regionArray[len(regionArray)-1][1]]) 

    return resultsContigs 

print findCoveredRegions(myarray)

來源

2017-03-10 Sigurgeir

對不起，我不明白這個問題，即使是這個例子。如果你發現很難用文字表達，那麼你幾乎可以肯定地發現很難將其寫入代碼。也許首先就是這樣做的。 – Denziloe

您能否詳細說明您的意思：1）「x和y值的最長連續跨度」2）「5768和6000之間的區域未被覆蓋」 –

想象一下，您將連續序列中的0到7262之間的所有數字。我們可以將我的例子中的每一行看作0到148,0到145等所有數字的字符串。我想要生成的是0到7262之間的區域列表，其數目至少會出現一次，知道有些數字根本不會出現。 5768和6000之間的數字不是任何子序列的一部分，但0和5768以及6000和7262之間的所有數字至少被其中一個區域「覆蓋」。這有道理嗎？ – Sigurgeir

這裏是一個numpy的解決方案

myarray = np.asanyarray(myarray) 
order = np.argsort(myarray.ravel()) 
coverage = np.add.accumulate(1 - 2*(order%2)) 
gaps = np.where(coverage==0)[0] 
left = order[np.r_[0, gaps[:-1] + 1]] 
right = order[gaps] 
result = myarray.ravel()[np.c_[left, right]]

它池和排序的所有邊界。然後從左到右計算它遇到的左（+1）和右（-1）邊界的數量。這個數字永遠不會是負數，只有在存在差距時纔會降到零。從間隙位置重建覆蓋間隔。

來源

2017-03-10 21:43:28

這似乎並沒有給我提供的測試用例提供正確的解決方案。我得到以下輸出： [[ '0' '2374'] [ '2376' '2415'] [ '3000' '4315'] [ '5768' '600'] [ '6000'「7262 ']] – Sigurgeir

事實上，它會檢查_adjacent_間隔，但它必須檢查上次覆蓋的間隔，其中可能包含完全跟隨它的一些間隔。 – 9000

@Sigurgeir我確實得到了正確的結果，只要我將字符串轉換爲整數。它不適合你，因爲'600'>'5768'是字符串等，當然，它完全跳出算法。 –

這不會特別快，但我認爲這是相當Pythonic和可讀性。它不需要或使用排序的間隔列表。

intervals = [(0, 148), 
(0, 145), 
(0, 186), 
(0, 5768), 
(600, 2374), 
(2376, 2415), 
(3000, 4315), 
(6000, 6616), 
(6000, 6799), 
(6000, 7262)] 


def intersect(interval_a, interval_b): 
    """Return whether two intervals intersect""" 
    (a_bottom, a_top), (b_bottom, b_top) = interval_a, interval_b 
    return a_bottom <= b_top and b_bottom <= a_top 


def union_one_one(interval_a, interval_b): 
    """Return the union of two intervals""" 
    (a_bottom, a_top), (b_bottom, b_top) = interval_a, interval_b 
    return min(a_bottom, b_bottom), max(a_top, b_top) 


def union_many_one(old_intervals, new_interval): 
    """Return the union of a new interval with several old intervals.""" 
    result = [] 
    for old_interval in old_intervals: 
     # If an old interval intersects with the new interval, merge the old interval into the new one. 
     if intersect(old_interval, new_interval): 
      new_interval = union_one_one(old_interval, new_interval)  
     # Otherwise, leave the old interval alone. 
     else: 
      result.append(old_interval) 
    result.append(new_interval) 
    return result 


def union_all(intervals): 
    """Return the union of a collection of intervals""" 
    result = [] 
    for interval in intervals: 
     result = union_many_one(result, interval) 
    return result    


print(union_all(intervals))

來源

2017-03-10 23:49:32 Denziloe

不錯。當然，最壞的情況是O（n^2），但在某些情況下（大量重疊），它甚至可能比排序解決方案更快。 –

迷惑分區域列表中的覆蓋區域

回答

相關問題