2012-12-12 83 views
1

我得到了一個由0和1組成的數組。1s形成圖像中顯示的連續集羣。Python 2D集羣查找器

Clustering

簇的數量事先不知道。

有什麼方法可以創建一個列表,其中包含所有集羣的位置或每個集羣中包含其所有成員位置的列表。例如:

cluster_list = continuous_cluster_finder(data_array) 
cluster_list[0] = [(pixel1_x, pixel1_y), (pixel2_x, pixel2_y),...] 
+0

究竟分離的集羣?可以肯定地說,每個羣集都被零包圍了嗎?上面,下面,左邊,右邊的含義是零? – vkontori

+0

'data_array'的格式是什麼?列表清單? numpy數組? Python的數組之一? –

回答

2

從描述中不清楚什麼是問題的確切限制。 假設你可以通過零上區分集羣左,右,上,下那麼下面解決問題...

#!/usr/bin/env python 

data = [ #top-left 
     [0,0,1,1,0,0], 
     [0,0,1,1,0,0], 
     [1,1,0,0,1,1], 
     [1,1,0,0,1,1], 
     [0,0,1,1,0,0], 
     [0,0,1,1,0,0], 
     [1,1,0,0,1,1], 
     [1,1,0,0,1,1], 
     ]    # bottom-right 

d = {} # point --> clid 
dcl = {} # clid --> [point1,point2,...] 

def process_point(t): 
    global clid # cluster id 
    val = data[t[0]][t[1]] 
    above = (t[0]-1, t[1]) 
    abovevalid = 0 <= above[0] < maxX and 0 <= above[1] < maxY 
    #below = (t[0]+1, t[1]) # We do not need that because we scan from top-left to bottom-right 
    left = (t[0], t[1]-1) 
    leftvalid = 0 <= left[0] < maxX and 0 <= left[1] < maxY 
    #right = (t[0], t[1]+1) # We do not need that because we scan from top-left to bottom-right 

    if not val: # for zero return 
     return 
    if left in d and above in d and d[above] != d[left]: 
     # left and above on different clusters, merge them 
     prevclid = d[left] 
     dcl[d[above]].extend(dcl[prevclid]) # update dcl 
     for l in dcl[d[left]]: 
      d[l] = d[above] # update d 
     del dcl[prevclid] 
     dcl[d[above]].append(t) 
     d[t] = d[above] 
    elif above in d and abovevalid: 
     dcl[d[above]].append(t) 
     d[t] = d[above] 
    elif left in d and leftvalid: 
     dcl[d[left]].append(t) 
     d[t] = d[left] 
    else: # First saw this one 
     dcl[clid] = [t] 
     d[t] = clid 
     clid += 1 

def print_output(): 
    for k in dcl: # Print output 
     print k, dcl[k] 

def main(): 
    global clid 
    global maxX 
    global maxY 
    maxX = len(data) 
    maxY = len(data[0]) 
    clid = 0 
    for i in xrange(maxX): 
     for j in xrange(maxY): 
      process_point((i,j)) 
    print_output() 

if __name__ == "__main__": 
    main() 

它打印...

0 [(0, 2), (0, 3), (1, 2), (1, 3)] 
1 [(2, 0), (2, 1), (3, 0), (3, 1)] 
2 [(2, 4), (2, 5), (3, 4), (3, 5)] 
3 [(4, 2), (4, 3), (5, 2), (5, 3)] 
4 [(6, 0), (6, 1), (7, 0), (7, 1)] 
5 [(6, 4), (6, 5), (7, 4), (7, 5)] 
+0

連接組件算法的良好實現。我想,使用來自每個未訪問節點的DFS的圖着色方法肯定會更快。 – Arcturus

1

您可以看看一個衆所周知的'blob'查找算法,用於圖像處理以隔離相同顏色的區域。您也可以通過查找島嶼並將其標記出來(也就是所有人在開始時都未被訪問過)來製作自己的口味。全部連接(在3x3網格中,中心像素爲8連通性)並且訪問像素形成一個區域;你需要在地圖中找到所有這些區域。

Blob發現是你需要尋找的東西。