我的問題如下：Python的間隔interesction

具有間隔的列表文件：

和一系列

0 200

我想這樣做這樣的交集將在給定範圍內報告我的間隔之間的位置[開始結束]。

例如：

8 9 
12 20 
30 200

除了任何想法如何咬這，也將是不錯的閱讀優化的一些想法，因爲一如既往輸入文件都將是巨大的。

來源

2013-08-23 Irek

現間隔排序？ – J0HN

我並不完全明白你在問什麼，但是考慮到只有1個範圍是你匹配的，那麼如果你需要掃描整個文件，O（n）是最壞的情況。如果你可以保存在內存中，並需要做多個查詢，然後按不同的二叉樹開始和結束排序它們可以幫助... –

你還沒有0 1嗎？ – njzk2

該解決方案的工作原理是間隔按起始點排序，不需要創建與總範圍一樣大的列表。

代碼

with open("0.txt") as f: 
    t=[x.rstrip("\n").split("\t") for x in f.readlines()] 
    intervals=[(int(x[0]),int(x[1])) for x in t] 

def find_ints(intervals, mn, mx): 
    next_start = mn 
    for x in intervals: 
     if next_start < x[0]: 
      yield next_start,x[0] 
      next_start = x[1] 
     elif next_start < x[1]: 
      next_start = x[1] 
    if next_start < mx: 
     yield next_start, mx 

print list(find_ints(intervals, 0, 200))

輸出：

（在你給的例子的情況下）

[(0, 1), (8, 9), (12, 20), (30, 200)]

來源

2013-08-23 10:57:39 Teudimundo

有一個問題：輸出中的最大間隔結束於：9980032 範圍而不是200現在設置爲249250622.你知道什麼是發生，還是你需要更多的信息？我傳遞的數據按照要求格式化（增加起始位置的順序） – Irek

應該沒有問題。你應該得到最後的時間間隔爲9980032，249250622.至少是它在做什麼，如果你添加一個間隔在你給的例子中（例如（5，9980032）），你使用249250622作爲mx – Teudimundo

我不知道，如果你想檢查： https://www.dropbox.com/s/ftavov170n8987j/0 使用的最大範圍：249250622 – Irek

粗糙算法：

創建布爾值的陣列，全部設置爲假seen = [False]*200
遍歷輸入文件，對於每一個線start end設置seen[start] .. seen[end]是True
一旦完成，那麼你可以輕鬆地走數組來找到未使用的時間間隔。

在最佳化的方面，如果輸入的範圍列表上開始數進行排序，那麼你就可以跟蹤最高看到數並用它來過濾範圍，因爲它們處理 - 例如像

for (start,end) in input: 
    if end<=lowest_unseen: 
    next 
    if start<lowest_unseen: 
    start=lowest_unseen 
    ...

它（忽略原始排序的成本）應該使整個事情爲O（n） - 你去通過陣列一次標籤看到/看不見的，一旦輸出unseens。

似乎我感覺很好。這裏是（未優化）的代碼，假設你的輸入文件被稱爲input

seen = [False]*200 
file = open('input','r') 
rows = file.readlines() 
for row in rows: 
    (start,end) = row.split(' ') 
    print "%s %s" % (start,end) 
    for x in range(int(start)-1, int(end)-1): 
    seen[x] = True 

print seen[0:10] 

in_unseen_block=False 
start=1 
for x in range(1,200): 
    val=seen[x-1] 
    if val and not in_unseen_block: 
    continue 
    if not val and in_unseen_block: 
    continue 
    # Must be at a change point. 
    if val: 
    # we have reached the end of the block 
    print "%s %s" % (start,x) 
    in_unseen_block = False 
    else: 
    # start of new block 
    start = x 
    in_unseen_block = True 
# Handle end block 
if in_unseen_block: 
    print "%s %s" % (start, 200)

我要離開的優化作爲一個練習留給讀者。

來源

2013-08-23 09:56:13

對不起，因爲我仍然是一個pyton新手，我在實施你的提示時遇到了一些問題 – Irek

Cool ，謝謝。用249250621作爲我的範圍，內存錯誤Howebver。猜猜在內存中加載所有內容不是這樣一個好主意 – Irek

如果每次輸入間隔打開或關閉時都做一個註釋，您可以將opens和closes的鍵合併成一個有序集合，然後就可以完成所需的操作基本上可以這樣認爲，「好吧，讓我們假設每一對相鄰的數字形成一個間隔，然後我可以把所有的邏輯作爲離散塊集中在這些間隔上。」

myRange = range(201) 
intervals = [(1,5), (2,8), (9,12), (20,30)] 
opens = {} 
closes = {} 

def open(index): 
    if index not in opens: 
     opens[index] = 0 
    opens[index] += 1 

def close(index): 
    if index not in closes: 
     closes[index] = 0 
    closes[index] += 1 

for start, end in intervals: 
    if end > start: # Making sure to exclude empty intervals, which can be problematic later 
     open(start) 
     close(end) 

# Sort all the interval-endpoints that we really need to look at 
oset = {0:None, 200:None} 
for k in opens.keys(): 
    oset[k] = None 
for k in closes.keys(): 
    oset[k] = None 
relevant_indices = sorted(oset.keys()) 

# Find the clear ranges 
state = 0 
results = [] 
for i in range(len(relevant_indices) - 1): 
    start = relevant_indices[i] 
    end = relevant_indices[i+1] 

    start_state = state 
    if start in opens: 
     start_state += opens[start] 
    if start in closes: 
     start_state -= closes[start] 

    end_state = start_state 
    if end in opens: 
     end_state += opens[end] 
    if end in closes: 
     end_state -= closes[end] 
    state = end_state 

    if start_state == 0: 
     result_start = start 
     result_end = end 
     results.append((result_start, result_end)) 

for start, end in results: 
    print(str(start) + " " + str(end))

此輸出：

的間隔不需要進行排序。

來源

2013-08-23 10:42:54 alcedine

很好地工作，但是當249250621作爲範圍傳遞，內存錯誤。就像我說的那樣，這些文件是巨大的 – Irek

我會將文件和範圍分成幾個部分，這應該可以做到。 – Irek

Irek，我做了一些修改，我認爲這會解決你的內存不足的問題。看看它是否適合你。 – alcedine

Python的間隔interesction

回答

代碼

輸出：

相關問題