查找兩個排序列表之間共現的算法優化

目前，我的算法需要（估計）超過十個小時才能完成。它現在仍在運行，這樣我就可以更好地估計它是多麼簡單的很糟糕它。查找兩個排序列表之間共現的算法優化

說我有一組人P每一個排序的不同長度的存在列表，其中我是一個索引變量。我想創建圖形ģ使得ģ_{P _我，P _Ĵ} = N，其中Ñ是P _{我之間的邊緣重} and P _j其中r表示它們在特定靜態範圍內共同出現的次數r。

我現在的算法是盲目和在Python實現（是可讀可不含糊）如下：（修改爲簡潔起見從its repository on GitHub）

print '>Generating combinations...', 
pairs = combinations(people, 2) 
print 'Done' 

print 'Finding co-occurences' 
radius = 5 
for A, B in pairs: 
    for oA in A.occurances: 
     for oB in B.occurances: 
      if oB in range(oA - radius, oA + radius): 
       try: 
        network.edge[A.common_name][B.common_name]['weight'] += 1 
       except: 
        network.add_edge(A.common_name, B.common_name, weight=1)

我已經考慮改變這個算法，這樣，當oB超過目前的oA的範圍，循環繼續到下一個oA。

有沒有更好的實現這個的方法，考慮到列表是排序？

來源

2013-04-23 Sean Allred

只要您通過上限，您就想轉移到下一個oA是一個好主意。另外，如果A.occurances和B.occurances的範圍相比，「半徑」大，那麼這將是更有效不從B.occurances開始每次啓動：

print '>Generating combinations...', 
pairs = combinations(people, 2) 
print 'Done' 

print 'Finding co-occurences' 
radius = 5 
for A, B in pairs: 
    i = 0 
    b = B.occurances 
    maxi = len(B.occurances) - 1 
    for oA in A.occurances: 
     lo = oA - radius 
     hi = oA + radius 
     while (b[i] > lo) and (i > 0):  # while we're above the low end of the range 
      i = i - 1      # go towards the low end of the range 
     while (b[i] < lo) and (i < maxi): # while we're below the low end of the range 
      i = i + 1      # go towards the low end of the range 
     if b[i] >= lo: 
      while (b[i] <= hi):   # while we're below the high end of the range 
       try:      # increase edge weight 
        network.edge[A.common_name][B.common_name]['weight'] += 1 
       except: 
        network.add_edge(A.common_name, B.common_name, weight=1) 

       if i < maxi:    # and go towards the high end of the range 
        i = i + 1 
       else: 
        break

請注意，我還沒有調試這個，所以它可能存在一些bug，但希望你能得到我想要做的一般想法。當然，你可以對這個想法進行進一步的優化，但是這應該比暴力方法更有效率。

來源

2013-04-23 13:47:14

一個選項是將B.occurances放在interval tree中，以便您可以快速查詢範圍內的所有oB（oA - 半徑，oA +半徑）。

另一種選擇是在桶中對B.occurances進行索引，例如，（0，1），[1,2]等。然後，通過選擇具有索引（oA - 半徑）到（oA +半徑）的桶，可以快速找到範圍內的所有oB（oA - 半徑，oA +半徑）。這些桶是近似的，因此您仍需要迭代地驗證第一個和最後一個選定桶中的所有oB。

來源

2013-04-23 13:40:00

查找兩個排序列表之間共現的算法優化

回答

相關問題