2013-08-04

Python: group tuple items without duplicates

I have:

(('A', '1', 'UTC\xb100:00'), ('B', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'), ('D', '1', 'UTC+01:00'), ('E', '1', 'UTC\xb100:00'), ('F', '1', 'UTC+03:00')) 

and would like to get:

(('A', 'E', '1', 'UTC\xb100:00'), ('B', 'D', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'), ('F', '1', 'UTC+03:00')) 

I've seen that you can do this with a list, but I haven't seen it done with a tuple. Is this possible?

Answers


You can use groupby, but you have to sort the input first, like this:

from itertools import groupby 
from operator import itemgetter 

l = (('A', '1', 'UTC\xb100:00'), ('B', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'), ('D', '1', 'UTC+01:00'), ('E', '1', 'UTC\xb100:00'), ('F', '1', 'UTC+03:00')) 

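result = [] 
key_items = itemgetter(1, 2)  # group on (digit, timezone) 
# groupby only merges adjacent items with equal keys, so sort on the same key first 
for key, group in groupby(sorted(l, key=key_items), key=key_items): 
    item = [k[0] for k in group]  # the letters in this group 
    item.extend(key)              # followed by the digit and the timezone 
    result.append(tuple(item)) 

print(tuple(result)) 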

此代碼打印:

(('B', 'D', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'), ('F', '1', 'UTC+03:00'), ('A', 'E', '1', 'UTC\xb100:00')) 

It's not that pretty, I admit.
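The loop can be condensed into a single generator expression. This is a sketch of the same sort-then-groupby technique, assuming Python 3 and the data from the question; because itemgetter(1, 2) returns a tuple, the key can be concatenated directly onto the tuple of letters:

```python
from itertools import groupby
from operator import itemgetter

l = (('A', '1', 'UTC\xb100:00'), ('B', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'),
     ('D', '1', 'UTC+01:00'), ('E', '1', 'UTC\xb100:00'), ('F', '1', 'UTC+03:00'))

# Group on (digit, timezone); groupby only merges *adjacent* equal keys,
# so the input is sorted by the same key first.
key_items = itemgetter(1, 2)
result = tuple(
    tuple(row[0] for row in group) + key  # letters first, then (digit, tz)
    for key, group in groupby(sorted(l, key=key_items), key=key_items)
)
print(result)
```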


Using pandas for this may be overkill, but you can do:

import pandas as pd 

# for some reason, pandas 0.12.0 prefers 
# a list of tuples rather than a tuple of tuples 
t = [('A', '1', 'UTC\xb100:00'), 
    ('B', '1', 'UTC+01:00'), 
    ('C', '1', 'UTC+02:00'), 
    ('D', '1', 'UTC+01:00'), 
    ('E', '1', 'UTC\xb100:00'), 
    ('F', '1', 'UTC+03:00')] 

df = pd.DataFrame(t, columns=('letter', 'digit', 'tz')) 
grouped = df.groupby('tz') 

print(grouped.groups) 

# {'UTC+01:00': [1, 3], 
# 'UTC+02:00': [2], 
# 'UTC+03:00': [5], 
# 'UTC\xb100:00': [0, 4]} 

merged = [] 
for key, vals in grouped.groups.items(): 
    update = [t[idx][0] for idx in vals]  # add the letters 
    update += t[vals[0]][1:]  # add the digit and the TZ (from the group's first row) 
    merged.append(update) 

print(merged) 
# [['F', '1', 'UTC+03:00'], ['C', '1', 'UTC+02:00'], \ 
# ['A', 'E', '1', 'UTC\xb100:00'], ['B', 'D', '1', 'UTC+01:00']] 

The advantage is the rather concise df.groupby('tz'); the drawback is a fairly heavy dependency (pandas plus its own dependencies).

You can condense the merge into a single, somewhat less readable, statement:

merged = [[t[idx][0] for idx in vs] + list(t[vs[0]][1:]) 
          for vs in grouped.groups.values()] 
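If the pandas dependency is a concern, the mapping that grouped.groups provides (timezone to row positions) can be approximated with collections.defaultdict. This is a sketch that is not part of the original answer, assuming Python 3.7+ so that dicts keep insertion order; pandas actually returns row labels, which coincide with positions only for a default index:

```python
from collections import defaultdict

t = [('A', '1', 'UTC\xb100:00'), ('B', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'),
     ('D', '1', 'UTC+01:00'), ('E', '1', 'UTC\xb100:00'), ('F', '1', 'UTC+03:00')]

# Stand-in for grouped.groups: timezone -> list of row positions
groups = defaultdict(list)
for idx, row in enumerate(t):
    groups[row[2]].append(idx)

# Same merge as above: letters from every row in the group, then the
# digit and timezone taken from the group's first row
merged = [[t[i][0] for i in idxs] + list(t[idxs[0]][1:]) for idxs in groups.values()]
print(merged)
```

Groups come out in the order each timezone first appears, so the result is deterministic, unlike iterating a plain dict in Python 2.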

You can use comprehensions, but it is still somewhat complicated:

tuples = (('A', '1', 'UTC\xb100:00'), ('B', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'), ('D', '1', 'UTC+01:00'), ('E', '1', 'UTC\xb100:00'), ('F', '1', 'UTC+03:00')) 

>>> values = set(map(lambda x: x[1:3], tuples)) 
set([('1', 'UTC+03:00'), ('1', 'UTC\xb100:00'), ('1', 'UTC+01:00'), ('1', 'UTC+02:00')]) 

>>> f = [[y[0] for y in tuples if y[1:3] == x] for x in values] 
[['F'], ['A', 'E'], ['B', 'D'], ['C']] 

>>> r = zip((tuple(t) for t in f), values) 
[(('F',), ('1', 'UTC+03:00')), (('A', 'E'), ('1', 'UTC\xb100:00')), (('B', 'D'), ('1', 'UTC+01:00')), (('C',), ('1', 'UTC+02:00'))] 

>>> result = tuple([sum(e, ()) for e in r]) 
(('F', '1', 'UTC+03:00'), ('A', 'E', '1', 'UTC\xb100:00'), ('B', 'D', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00')) 

To put it all together:

values = set(map(lambda x:x[1:3], tuples)) 
f = [[y[0] for y in tuples if y[1:3]==x] for x in values] 
r = zip((tuple(t) for t in f), values) 
result = tuple([sum(e,()) for e in r]) 
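Because a set has no defined order, the groups above can come out in any order. A sketch of the same comprehension approach that keeps first-appearance order swaps the set for dict.fromkeys (assuming Python 3.7+, where dicts preserve insertion order) and replaces the sum(e, ()) step with direct tuple concatenation; with the question's data it reproduces the requested output exactly:

```python
tuples = (('A', '1', 'UTC\xb100:00'), ('B', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'),
          ('D', '1', 'UTC+01:00'), ('E', '1', 'UTC\xb100:00'), ('F', '1', 'UTC+03:00'))

# Unique (digit, tz) keys in the order they first appear
values = list(dict.fromkeys(x[1:3] for x in tuples))
# For each key, collect the letters of every row with that key
f = [[y[0] for y in tuples if y[1:3] == x] for x in values]
# Letters first, then the shared (digit, tz) pair
result = tuple(tuple(letters) + x for letters, x in zip(f, values))
print(result)
```

Like the original, this rescans the whole input once per key, which is fine for small inputs but quadratic in general.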

Tuples don't let you modify their contents, but you can, for example, concatenate tuples to build new ones.

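def process(data): 
    res = [] 
    # Sort on the last five characters of the timezone (e.g. '01:00') so rows 
    # with the same offset become adjacent; note this ignores the sign and the digit 
    for L in sorted(data, key=lambda x: x[2][-5:]): 
        # The timezone is always the last field, even after letters were merged in 
        if res and res[-1][-1][-5:] == L[2][-5:]: 
            # Same group: insert this letter before the (digit, timezone) pair 
            res[-1] = res[-1][:-2] + (L[0],) + res[-1][-2:] 
        else: 
            # Different group: start a new entry 
            res.append(L) 
    return res 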

To my mind, the final result is more of a list (logically homogeneous content) than a tuple, but if you really need a tuple you can return tuple(res) instead.
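The merge step in process builds a new tuple from slices rather than modifying the old one. A small standalone illustration using one row from the question; 'G' is a made-up extra letter used only to show a second merge:

```python
row = ('A', '1', 'UTC\xb100:00')

# Tuples are immutable: item assignment raises TypeError
try:
    row[0] = 'Z'
except TypeError as exc:
    print('cannot modify a tuple:', exc)

# "Inserting" a letter means building a new tuple from slices:
# everything before the (digit, timezone) pair, the new letter, then the pair
merged = row[:-2] + ('E',) + row[-2:]

# Merging again keeps the timezone as the last field,
# while index 2 of the merged tuple now holds the digit
merged_again = merged[:-2] + ('G',) + merged[-2:]

print(row)  # the original tuple is unchanged
print(merged)
print(merged_again)
```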


If all you care about is getting the identical items into the same tuple, this code works:

nodup = {} 
my_group_of_items = (('A', '1', 'UTC\xb100:00'), ('B', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'), ('D', '1', 'UTC+01:00'), ('E', '1', 'UTC\xb100:00'), ('F', '1', 'UTC+03:00')) 
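for r in my_group_of_items: 
    # Key on the timezone (last field) and collect all other fields in a set, 
    # so the shared digit '1' appears only once per group 
    nodup.setdefault(r[-1], set()).update(r[:-1]) 

# Sets are unordered, so the letters and the digit may come out in any order 
result = [tuple(nodup[t]) + (t,) for t in nodup] 
print(result) 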
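If a predictable layout matters, one variation (an assumption, not the answer's own code) keys on the (digit, timezone) pair and keeps the letters in a list instead of a set, so every output tuple has the form (letters..., digit, timezone) in first-appearance order. This sketch assumes Python 3.7+ for ordered dicts:

```python
data = (('A', '1', 'UTC\xb100:00'), ('B', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'),
        ('D', '1', 'UTC+01:00'), ('E', '1', 'UTC\xb100:00'), ('F', '1', 'UTC+03:00'))

# (digit, tz) -> letters, in the order the rows appear
groups = {}
for letter, digit, tz in data:
    groups.setdefault((digit, tz), []).append(letter)

# Letters first, then the (digit, tz) key
result = [tuple(letters) + key for key, letters in groups.items()]
print(result)
```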