2014-03-12

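How to group runs of items in Python

Suppose I want to group together integers that are separated by less than some threshold. My specific use case is finding the largest blocks of uncovered code in test-coverage results, for example: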

groupruns('53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311, 317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787, 792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148, 1158', 3) 
#=> [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323, 325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870], [875], [884], [947], [993], [1134], [1139], [1148], [1158]] 

Answers


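itertools.groupby can help here. The only difficulty is that groupby groups by a "key" computed for each item, so that each group consists of consecutive items with the same key. This means our keyfunc object needs to keep state to do the job: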
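The grouping rule can be seen in a tiny example: `itertools.groupby` only merges neighbouring items whose keys are equal, so a key that reappears after a different key opens a fresh group. A minimal illustration, using plain characters as their own keys:

```python
from itertools import groupby

# groupby merges only *adjacent* items with equal keys;
# the final 'a' is not merged with the first two because 'b' sits between them.
letters = "aabbba"
runs = [(k, list(g)) for k, g in groupby(letters)]
print(runs)
# → [('a', ['a', 'a']), ('b', ['b', 'b', 'b']), ('a', ['a'])]
```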

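class runner(object):
    def __init__(self, threshold=1):
        self.threshold = threshold
        self.last = None   # previous item seen
        self.key = None    # first item of the current run
    def __call__(self, item):
        if self.last is None:
            # first item: it starts the first run
            self.last = item
            self.key = item
            return item
        if item - self.last <= self.threshold:
            # within the threshold of the previous item: same run, same key
            self.last = item
            return self.key
        else:
            # gap too large: this item starts a new run
            self.last = item
            self.key = item
            return item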

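The basic idea is that if the item is within the threshold of the last item, we return the current key, which is the first item of the current run.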
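To make the idea concrete, here is the sequence of keys the runner produces for the first few values. The class below is a condensed rewrite of `runner` that merges the two "start a new run" branches; it is intended to behave the same as the original for sorted input:

```python
class runner(object):
    """Stateful groupby key: returns the first item of the current run."""

    def __init__(self, threshold=1):
        self.threshold = threshold
        self.last = None
        self.key = None

    def __call__(self, item):
        # Start a new run on the first item or when the gap is too large.
        if self.last is None or item - self.last > self.threshold:
            self.key = item
        self.last = item
        return self.key


keyfunc = runner(3)
keys = [keyfunc(x) for x in [53, 55, 57, 59, 83, 200]]
print(keys)  # → [53, 53, 53, 53, 83, 200]
```

Every value in the run 53..59 maps to 53, so groupby sees four equal adjacent keys and keeps them together.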

Let's see it in action:

import itertools

[list(g) for k, g in itertools.groupby((int(s) for s in '53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311, 317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787, 792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148, 1158'.split(',')), runner(3))]
#=> [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323, 325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870], [875], [884], [947], [993], [1134], [1139], [1148], [1158]] 
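The same state can also live in a closure instead of a class. This is a sketch assuming Python 3 (for `nonlocal`); `make_run_key` is a hypothetical helper name, not something from the answer. Because the key function carries state, a fresh one should be created for every `groupby` call:

```python
import itertools


def make_run_key(threshold):
    """Hypothetical helper: build a stateful key function for groupby.

    The key is the first item of the current run; a new run starts when
    the gap to the previous item exceeds threshold. Input assumed sorted.
    """
    last = None
    key = None

    def run_key(item):
        nonlocal last, key
        if last is None or item - last > threshold:
            key = item  # this item opens a new run
        last = item
        return key

    return run_key


data = [53, 55, 57, 59, 83, 200, 205, 211, 217, 219]
runs = [list(g) for _, g in itertools.groupby(data, make_run_key(3))]
print(runs)
# → [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219]]
```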

You can use Raymond Hettinger's cluster function:

def cluster(data, maxgap, key=None):
    """Arrange data into groups where successive elements
    differ by no more than *maxgap*

    >>> cluster([1, 6, 9, 100, 102, 105, 109, 134, 139], maxgap=10)
    [[1, 6, 9], [100, 102, 105, 109], [134, 139]]

    >>> cluster([1, 6, 9, 99, 100, 102, 105, 134, 139, 141], maxgap=10)
    [[1, 6, 9], [99, 100, 102, 105], [134, 139, 141]]

    http://stackoverflow.com/a/14783998/190597 (Raymond Hettinger)
    """
    data = sorted(data)  # sort a copy so the caller's list is left untouched
    if not data:
        return []
    groups = [[data[0]]]
    for item in data[1:]:
        if key:
            # custom distance: key(item, current_group)
            val = key(item, groups[-1])
        else:
            val = abs(item - groups[-1][-1])
        if val <= maxgap:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups

data = [53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311, 317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787, 792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148, 1158] 
print(cluster(data, maxgap=3)) 

Output:

[[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323, 325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870], [875], [884], [947], [993], [1134], [1139], [1148], [1158]] 
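For long or streamed inputs, a lazy variant of the same rule may be convenient. The sketch below (hypothetical name `iter_clusters`, without the `key` hook) sorts a copy of its input, yields each group as soon as a large gap closes it, and yields nothing for an empty input:

```python
def iter_clusters(values, maxgap):
    """Hypothetical lazy variant: yield groups of sorted values whose
    neighbours differ by no more than maxgap."""
    current = []
    for item in sorted(values):  # sorted() returns a new list
        if current and item - current[-1] > maxgap:
            yield current        # gap too large: the current group is done
            current = []
        current.append(item)
    if current:
        yield current            # flush the last group


data = [57, 53, 59, 55, 83, 219, 217]
clusters = list(iter_clusters(data, maxgap=3))
print(clusters)                          # → [[53, 55, 57, 59], [83], [217, 219]]
print(list(iter_clusters([], maxgap=3)))  # → []
```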

How about this, without using any modules:

Note: this assumes the values are already sorted.
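If the input might be unsorted, sorting a copy first is enough. The differences between neighbours, which is what `diff` in this answer computes, then show where the runs break (at any gap larger than `step`):

```python
values = [83, 53, 59, 55, 57]
ordered = sorted(values)  # ascending order, as the grouping code expects
gaps = [b - a for a, b in zip(ordered, ordered[1:])]
print(ordered, gaps)  # → [53, 55, 57, 59, 83] [2, 2, 2, 24]
```

With `step = 3`, only the gap of 24 exceeds the step, so 83 starts a new run.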

#!/usr/bin/env python3

group = (53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311,
         317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787,
         792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148,
         1158)

def group_runs(group, step):
    mark = [0]  # indices where a run starts
    diff = map(lambda x: (x[1] - x[0]), zip(group, group[1:]))
    [mark.append(i+1) for i,j in enumerate(diff) if j > step]
    mark.append(len(group))  # close the final run so it is not dropped
    return [list(group[x[0]:x[1]]) for x in zip(mark, mark[1:])]

print(group_runs(group, 3))

Output:

[[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323, 325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870], [875], [884], [947], [993], [1134], [1139], [1148], [1158]]

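This is not easy to read, and readability counts. You may want to simplify and explain this code. Side effects in list comprehensions are also discouraged unless there is a specific reason for them. – Marcin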
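Along the lines of Marcin's suggestion, the same split-index idea can be written without side effects in a comprehension. This is a sketch, assuming sorted input as the answer does; it is not the answerer's code:

```python
def group_runs(values, step):
    """Split sorted values into runs whose neighbours differ by at most step."""
    if not values:
        return []
    # A run starts at index 0 and at every index whose gap to the
    # previous value is larger than step.
    starts = [0] + [i for i in range(1, len(values))
                    if values[i] - values[i - 1] > step]
    # Each run ends where the next one begins; the last ends at len(values).
    ends = starts[1:] + [len(values)]
    return [list(values[s:e]) for s, e in zip(starts, ends)]


group = (53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311,
         317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787,
         792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148,
         1158)
result = group_runs(group, 3)
print(result)
```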
