尋找數字集羣列表

我與掙扎，因爲我敢肯定，十幾for循環是不是這個問題的解決方案：尋找數字集羣列表

有數字像

排序列表

numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]

，我想創建一個號碼清單的字典，其中數的差（以下對方）不超過15.所以輸出會是這樣：

clusters = { 
    1 : [123, 124, 128], 
    2 : [160, 167], 
    3 : [213, 215, 230, 245, 255, 257], 
    4 : [400, 401, 402], 
    5 : [430] 
}

我現在搜索解決方案n有點難看（我必須在最後刪除重複的內容...），我相信它可以用pythonic的方式完成。

這就是我現在做的事：

clusters = {} 
dIndex = 0 
for i in range(len(numbers)-1) : 
    if numbers[i+1] - numbers[i] <= 15 : 
     if not clusters.has_key(dIndex) : clusters[dIndex] = [] 
     clusters[dIndex].append(numbers[i]) 
     clusters[dIndex].append(numbers[i+1]) 
    else : dIndex += 1

來源

2013-04-04 tamasgal

[K-means clustering]（http://en.wikipedia.org/wiki/K-means_clustering）在這種情況下可能會有用。 – Blender 2013-04-04 01:10:11

[defaultdict]（http://docs.python.org/2/library/collections.html#collections.defaultdict）會讓你的代碼變得更簡單 – tacaswell 2013-04-04 01:13:34

謝謝，我會看看兩者！ – tamasgal 2013-04-04 01:16:46

如果列表很小，不是必須的，但我可能會採用「流處理」方式來處理這個問題：定義一個生成器，它將您的輸入迭代起來，並生成分組爲相差< = 15.然後你可以很容易地使用它來生成你的字典。

def grouper(iterable): 
    prev = None 
    group = [] 
    for item in iterable: 
     if not prev or item - prev <= 15: 
      group.append(item) 
     else: 
      yield group 
      group = [item] 
     prev = item 
    if group: 
     yield group 

numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430] 
dict(enumerate(grouper(numbers), 1))

打印：

{1: [123, 124, 128], 
2: [160, 167], 
3: [213, 215, 230, 245, 255, 257], 
4: [400, 401, 402], 
5: [430]}

作爲獎勵，這個（只要他們排序，當然），讓你即使你組運行潛在的無限名單。您也可以將索引生成部分粘貼到生成器本身中（而不是使用enumerate）作爲小改進。

來源

2013-04-04 01:48:35 tzaman

哇，真的很酷的功能！我只是爲了那個。謝謝@tzaman – otmezger 2013-04-18 09:55:49

import itertools 
import numpy as np 

numbers = np.array([123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]) 
nd = [0] + list(np.where(np.diff(numbers) > 15)[0] + 1) + [len(numbers)] 

a, b = itertools.tee(nd) 
next(b, None) 
res = {} 
for j, (f, b) in enumerate(itertools.izip(a, b)): 
    res[j] = numbers[f:b]

如果您可以使用itertools和numpy的。改編爲pairwise爲迭代技巧。需要+1來移動索引，將0和len(numbers)添加到列表中，確保正確包含第一個和最後一個條目。

你明顯可以做到這一點與itertools，但我喜歡tee。

來源

2013-04-04 01:20:33 tacaswell

使用發電機到邏輯分離：（一個函數做一件事情）

numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430] 

def cut_indices(numbers): 
    # this function iterate over the indices that need to be 'cut' 
    for i in xrange(len(numbers)-1): 
     if numbers[i+1] - numbers[i] > 15: 
      yield i+1 

def splitter(numbers): 
    # this function split the original list into sublists. 
    px = 0 
    for x in cut_indices(numbers): 
     yield numbers[px:x] 
     px = x 
    yield numbers[px:] 

def cluster(numbers): 
    # using the above result, to form a dict object. 
    cluster_ids = xrange(1,len(numbers)) 
    return dict(zip(cluster_ids, splitter(numbers))) 

print cluster(numbers)

上面的代碼給我

{1: [123, 124, 128], 2: [160, 167], 3: [213, 215, 230, 245, 255, 257], 4: [400, 401, 402], 5: [430]}

來源

2013-04-04 01:58:09

下面是一個相對簡單的解決方案，爲列表或發電機的工作原理。它懶洋洋地產生對(group_number, element)，所以如果你需要的話，你必須單獨做實際的分組。（或者您可能只需要組號。）

from itertools import tee 

def group(xs, gap=15): 
    # use `tee` to get two efficient iterators 
    xs1, xs2 = tee(xs) 

    # the first element is in group 0, also advance the second iterator 
    group = 0 
    yield (group, next(xs2)) 

    # after advancing xs2, this zip is pairs of consecutive elements 
    for x, y in zip(xs1, xs2): 
     # whenever the gap is too large, increment the group number 
     if y - x > gap: 
      group += 1 
     # and yield the second number in the pair 
     yield group, y

來源

2016-04-01 23:20:07 Joel

尋找數字集羣列表

回答

相關問題