2013-04-04 80 views
6

我與掙扎,因爲我敢肯定,十幾for循環是不是這個問題的解決方案:尋找數字集羣列表

有數字像

排序列表
numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430] 

,我想創建一個號碼清單的字典,其中數的差(以下對方)不超過15.所以輸出會是這樣:

clusters = { 
    1 : [123, 124, 128], 
    2 : [160, 167], 
    3 : [213, 215, 230, 245, 255, 257], 
    4 : [400, 401, 402], 
    5 : [430] 
} 

我現在搜索解決方案n有點難看(我必須在最後刪除重複的內容...),我相信它可以用pythonic的方式完成。

這就是我現在做的事:

clusters = {} 
dIndex = 0 
for i in range(len(numbers)-1) : 
    if numbers[i+1] - numbers[i] <= 15 : 
     if not clusters.has_key(dIndex) : clusters[dIndex] = [] 
     clusters[dIndex].append(numbers[i]) 
     clusters[dIndex].append(numbers[i+1]) 
    else : dIndex += 1 
+2

[K-means clustering](http://en.wikipedia.org/wiki/K-means_clustering)在這種情況下可能會有用。 – Blender 2013-04-04 01:10:11

+0

[defaultdict](http://docs.python.org/2/library/collections.html#collections.defaultdict)會讓你的代碼變得更簡單 – tacaswell 2013-04-04 01:13:34

+0

謝謝,我會看看兩者! – tamasgal 2013-04-04 01:16:46

回答

8

如果列表很小,不是必須的,但我可能會採用「流處理」方式來處理這個問題:定義一個生成器,它將您的輸入迭代起來,並生成分組爲相差< = 15.然後你可以很容易地使用它來生成你的字典。

def grouper(iterable): 
    prev = None 
    group = [] 
    for item in iterable: 
     if not prev or item - prev <= 15: 
      group.append(item) 
     else: 
      yield group 
      group = [item] 
     prev = item 
    if group: 
     yield group 

numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430] 
dict(enumerate(grouper(numbers), 1)) 

打印:

{1: [123, 124, 128], 
2: [160, 167], 
3: [213, 215, 230, 245, 255, 257], 
4: [400, 401, 402], 
5: [430]} 

作爲獎勵,這個(只要他們排序,當然),讓你即使你組運行潛在的無限名單。您也可以將索引生成部分粘貼到生成器本身中(而不是使用enumerate)作爲小改進。

+0

哇,真的很酷的功能!我只是爲了那個。謝謝@tzaman – otmezger 2013-04-18 09:55:49

3
import itertools 
import numpy as np 

numbers = np.array([123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]) 
nd = [0] + list(np.where(np.diff(numbers) > 15)[0] + 1) + [len(numbers)] 

a, b = itertools.tee(nd) 
next(b, None) 
res = {} 
for j, (f, b) in enumerate(itertools.izip(a, b)): 
    res[j] = numbers[f:b] 

如果您可以使用itertools和numpy的。改編爲pairwise爲迭代技巧。需要+1來移動索引,將0len(numbers)添加到列表中,確保正確包含第一個和最後一個條目。

你明顯可以做到這一點與itertools,但我喜歡tee

0

使用發電機到邏輯分離:(一個函數做一件事情)

numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430] 

def cut_indices(numbers): 
    # this function iterate over the indices that need to be 'cut' 
    for i in xrange(len(numbers)-1): 
     if numbers[i+1] - numbers[i] > 15: 
      yield i+1 

def splitter(numbers): 
    # this function split the original list into sublists. 
    px = 0 
    for x in cut_indices(numbers): 
     yield numbers[px:x] 
     px = x 
    yield numbers[px:] 

def cluster(numbers): 
    # using the above result, to form a dict object. 
    cluster_ids = xrange(1,len(numbers)) 
    return dict(zip(cluster_ids, splitter(numbers))) 

print cluster(numbers) 

上面的代碼給我

{1: [123, 124, 128], 2: [160, 167], 3: [213, 215, 230, 245, 255, 257], 4: [400, 401, 402], 5: [430]} 
1

下面是一個相對簡單的解決方案,爲列表或發電機的工作原理。它懶洋洋地產生對(group_number, element),所以如果你需要的話,你必須單獨做實際的分組。 (或者您可能只需要組號。)

from itertools import tee 

def group(xs, gap=15): 
    # use `tee` to get two efficient iterators 
    xs1, xs2 = tee(xs) 

    # the first element is in group 0, also advance the second iterator 
    group = 0 
    yield (group, next(xs2)) 

    # after advancing xs2, this zip is pairs of consecutive elements 
    for x, y in zip(xs1, xs2): 
     # whenever the gap is too large, increment the group number 
     if y - x > gap: 
      group += 1 
     # and yield the second number in the pair 
     yield group, y