Python OOP不相交集性能

-1

我構建了一個與Kruskal的MST算法一起使用的不相交集數據結構。我需要加載並結合一個包含200k個互連節點的圖，我認爲我的數據結構實現是一個瓶頸。Python OOP不相交集性能

您有關於如何提高性能的建議嗎？我認爲我的查找方法可能有問題。

class partition(object): 
    def __init__(self, element=None): 
     self.size = 0 
     if element == None: 
      self.contents = set() 
      self.representative = None 
     else: 
      self.contents = {element} 
      self.representative = element 
      self.size = 1 

    def find(self, element): 
     return element in self.contents 

    def add(self, partition): 
     self.contents = self.contents.union(partition) 
     self.size = len(self.contents) 

    def show(self): 
     return self.contents 

    def __repr__(self): 
     return str(self.contents) 

class disjoint_set(object): 
    def __init__(self): 
     self.partitions_count = 0 
     self.forest = {} 

    def make_set(self, element): 
     if self.find(element) == False: 
      new_partition = partition(element) 
      self.forest[new_partition.representative] = new_partition 
      self.partitions_count += 1 

    def union(self, x, y): 
     if x != y: 
      if self.forest[x].size < self.forest[y].size: 
       self.forest[y].add(self.forest[x].show()) 
       self.delete(x) 
      else: 
       self.forest[x].add(self.forest[y].show()) 
       self.delete(y) 

    def find(self, element): 
     for partition in self.forest.keys(): 
      if self.forest[partition].find(element): 
       return self.forest[partition].representative 
     return False 

    def delete(self, partition): 
     del self.forest[partition] 
     self.partitions_count -= 1 

if __name__ == '__main__': 
    t = disjoint_set() 
    t.make_set(1) 
    t.make_set(2) 
    t.make_set(3) 
    print("Create 3 singleton partitions:") 
    print(t.partitions_count) 
    print(t.forest) 
    print("Union two into a single partition:") 
    t.union(1,2) 
    print(t.forest) 
    print(t.partitions_count)

編輯：

閱讀註釋，做更多的研究，我意識到我原來的算法如何設計不良的是後。我從頭開始並把它放在一起。我把所有分區放到一個單獨的哈希表中，並在find（）中使用路徑壓縮。這看起來如何，我是否應該解決一些明顯的問題？

class disjoint_set(object): 
def __init__(self): 
    self.partitions_count = 0 
    self.size = {} 
    self.parent = {} 

def make_set(self, element): 
    if self.find(element) == False: 
     self.parent[element] = element 
     self.size[element] = 1 
     self.partitions_count += 1 

def union(self, x, y): 
    xParent = self.find(x) 
    yParent = self.find(y) 
    if xParent != yParent: 
     if self.size[xParent] < self.size[yParent]: 
      self.parent[xParent] = yParent 
      self.size[yParent] += self.size[xParent] 
      self.partitions_count -= 1 
     else: 
      self.parent[yParent] = xParent 
      self.size[xParent] += self.size[yParent] 
      self.partitions_count -= 1 

def find(self, element): 
    if element in self.parent: 
     if element == self.parent[element]: 
      return element 
     root = self.parent[element] 
     while self.parent[root] != root: 
      root = self.find(self.parent[root]) 
     self.parent[element] = root 
     return root 
    return False 

if __name__ == '__main__': 
    t = disjoint_set() 
    t.make_set(1) 
    t.make_set(2) 
    t.make_set(3) 
    t.make_set(4) 
    t.make_set(5) 
    print("Create 5 singleton partitions") 
    print(t.partitions_count) 
    print("Union two singletons into a single partition") 
    t.union(1,2) 
    print("Union three singletones into a single partition") 
    t.union(3,4) 
    t.union(5,4) 
    print("Union a single partition") 
    t.union(2,4) 
    print("Parent List: %s" % t.parent) 
    print("Partition Count: %s" % t.partitions_count) 
    print("Parent of element 2: %s" % t.find(2)) 
    print("Parent List: %s" % t.parent)

來源

2017-04-26 Solomon Bothwell

你可以添加if __name__ =='__main__'部分來顯示使用情況嗎？ – Bey

對不起！現在添加。 –

[使用真正的不相交集森林數據結構。]（https://en.wikipedia.org/wiki/Disjoint-set_data_structure#Disjoint-set_forests）您剛挑出了幾個名字，這聽起來有點像不相交集然後編寫一個非常樸素的算法，與真正的不相交的集合森林沒有任何關係。 – user2357112

我想你的查找實現不是很有效，它應該是。

以下更改可能會有所幫助。

class disjoint_set(object): 
    def __init__(self): 
     self.partitions_count = 0 
     self.forest = {} 
     self.parent = {} 

    def make_set(self, element): 
     if not self.find(element): 
      new_partition = partition(element) 
      self.parent[element] = element 
      self.forest[new_partition.representative] = new_partition 
      self.partitions_count += 1 

def union(self, x, y): 
    if x != y: 
     if self.forest[x].size < self.forest[y].size: 
      self.forest[y].add(self.forest[x].show()) 
      #Update parent details 
      self.parent[self.forest[x].representative] = self.forest[y].representative 
      self.delete(x) 
     else: 
      self.forest[x].add(self.forest[y].show()) 
      #Update parent details 
      self.parent[self.forest[y].representative] = self.forest[x].representative 
      self.delete(y) 

def find(self, element): 
    if self.parent[element] == element: 
     return element 
    else: 
     return find(element)

該代碼可以與路徑壓縮來優化仍然有disjoint_set.find在O（1）運行。我猜O（log n）對大數仍然有用。

另一個瓶頸可能是你的工會職能。尤其是add函數的實現。

def add(self, partition): 
    self.contents = self.contents.union(partition)

嘗試使用set（它是一個inplace聯合）的更新方法。我認爲會造成大量的節點內存開銷。喜歡的東西

self.contents.update(partition)

有上並集和更新功能here一個很好的討論。

希望它有幫助！

來源

2017-04-27 00:06:21 arunk2

這不是路徑壓縮。這種算法基本上與分離集森林或路徑壓縮無關。 – user2357112

通過路徑壓縮，我的意思是保持O（1）中的查找執行。同意你所說的 - 這不是一個路徑壓縮實現。路徑壓縮的意圖是保持O（1）中的操作正確嗎？這是我想傳達的一些東西。刪除了提及路徑壓縮。感謝您指出它。 – arunk2

它也不是O（1），並且路徑壓縮不會使O（1）中的查找運行。這裏的'partition.find'方法是O（1），但這不是標準的不相交集合操作;實際的不相交集合查找操作由'disjoint_set.find'實現，我們必須實際找到該元素屬於哪個集合。那是O（len（self.forest）），這太可怕了。 – user2357112

Python OOP不相交集性能

回答

相關問題