2011-02-15 17 views
0

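I have n lists of tuples (n < 10) in the format [(ListID, [(index, value), (index, value), ...])], and I want to merge them into a single list sorted by index, giving the following result.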

Example Input: 
[('A',[(0.12, 'how'),(0.26,'are'),(0.7, 'you'),(0.9,'mike'),(1.9, "I'm fine too")]), 
('B',[(1.23, 'fine'),(1.50, 'thanks'),(1.6,'and you')]), 
('C',[(2.12,'good'),(2.24,'morning'),(3.13,'guys')])] 

Desired Output: 
[('A', (0.12, 'how')), 
('A', (0.26, 'are')), 
('A', (0.7, 'you')), 
('A', (0.9, 'mike')), 
('B',(1.23, 'fine')), 
('B',(1.50, 'thanks')), 
('B',(1.6,'and you')), 
('A', (1.9, "I'm fine too")), 
('C',(2.12,'good')), 
('C',(2.24,'morning')), 
('C',(3.13,'guys'))] 

I know the code is ugly, especially the indexing such as item[0][-1][1], but can someone tell me what I am doing wrong?

content = []  
max = 0.0 
first = True 
Done = False 
finished = [] 
while not Done: 
    for item in flow:
        if len(finished) == 4:
            Done = True
            break
        if len(item[1]) == 0:
            if item[0] not in finished:
                finished.append(item[0])
            continue
        if first == True:
            max = item[1][-1][0]
            content.append((item[0], item[1].pop()))
            first = False
            continue
        if item[1][-1][0] > max:
            max = item[1][-1][0]
            content.append((item[0], item[1].pop()))
            content = sorted(content, key=itemgetter(1))

    first = True

UPDATE: Thanks, everyone.

+2

Timsort makes quick work of partially sorted data. You are doing far too much work. – 2011-02-15 04:49:11

+1

Don't use max as a variable name; you may want to use the built-in function `max()` some day. – 2011-02-15 04:50:13

+1

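It sounds like what you want to do is quite simple, but everything you have given us is somewhat confusing: the output isn't even a Python structure, and there is no example input. For example, in your output the list IDs are all grouped together. Please just give some valid Python data structures that show the input and the desired output. – 2011-02-15 04:57:58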

Answers

5
>>> from operator import itemgetter 
>>> import pprint 
>>> pprint.pprint(sorted(((i,k) for i,j in INPUT for k in j), key=itemgetter(1))) 
[('A', (0.12, 'how')), 
('A', (0.26000000000000001, 'are')), 
('A', (0.69999999999999996, 'you')), 
('A', (0.90000000000000002, 'mike')), 
('B', (1.23, 'fine')), 
('B', (1.5, 'thanks')), 
('B', (1.6000000000000001, 'and you')), 
('A', (1.8999999999999999, "I'm fine too")), 
('C', (2.1200000000000001, 'good')), 
('C', (2.2400000000000002, 'morning')), 
('C', (3.1299999999999999, 'guys'))] 

There are two main things going on here.

[(i,k) for i,j in INPUT for k in j] 

takes the input and transforms it into this structure:

[('A', (0.12, 'how')), 
('A', (0.26, 'are')), 
('A', (0.7, 'you')), 
('A', (0.9, 'mike')), 
('A', (1.9, "I'm fine too")), 
('B', (1.23, 'fine')), 
('B', (1.5, 'thanks')), 
('B', (1.6, 'and you')), 
('C', (2.12, 'good')), 
('C', (2.24, 'morning')), 
('C', (3.13, 'guys'))] 

sorted(L, key=itemgetter(1)) 

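sorts L by item [1] of each element. That item is actually a tuple, (0.12, 'how'), (0.26, 'are'), ..., but Python's normal way of comparing tuples is from left to right, so we don't need to do any extra work to pull the index out of each tuple.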
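As a minimal sketch of that left-to-right comparison (the `pairs` sample is made up for illustration, and the snippet is written for Python 3):

```python
from operator import itemgetter

# Tuples compare element by element, left to right.
assert (0.26, 'are') < (0.7, 'you')   # decided by the first element
assert (1.5, 'a') < (1.5, 'b')        # tie on the first, decided by the second

# Hypothetical sample in the (ListID, (index, value)) shape.
pairs = [
    ('B', (1.23, 'fine')),
    ('A', (1.9, "I'm fine too")),
    ('A', (0.12, 'how')),
]

# itemgetter(1) hands sorted() the inner (index, value) tuple as the key,
# so the ordering follows the index and the ListID is ignored.
by_index = sorted(pairs, key=itemgetter(1))
print(by_index)
```

Because the key is the whole inner tuple, two entries with the same index would fall back to comparing their values.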

+0

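The example solution provided suggests that there is more to the problem specification, though (i.e. an empty sub-list "terminates" that part of the data set, preventing any later entries with that list ID from being processed, and the loop terminates early once a specified number of distinct data sets have finished). – ncoghlan 2011-02-15 06:10:09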

2

(OK, the sample data makes the problem description clearer. Answer revised accordingly.)

Step 1: Clarify your problem description by reverse-engineering your current solution.

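  1. There are 4 different data sets, labelled A, B, C and D
  2. The data sets are contained in a series of 2-tuples of the form (ListID, elements)
  3. Each "elements" entry is itself a list of 2-tuples of the form (index, value)
  4. An empty elements entry indicates the end of a data set
  5. The goal is to merge these data sets into a single list of 2-tuples of the form (ListID, (index, value))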

Step 2: Transform the input data to create individual records of the desired form.

Generators are designed for this kind of thing, so it makes sense to define one.

def get_data(flow, num_data_sets=4): 
    finished = set() 
    for list_id, elements in flow: 
        if list_id in finished: 
            continue 
        if not elements: 
            finished.add(list_id) 
            if len(finished) == num_data_sets: 
                break 
            continue 
        for element in elements: 
            yield list_id, element 

Step 3: Use sorted to produce the desired ordered list.

content = sorted(get_data(flow)) 

Example usage:

# get_data defined via copy/paste of source code above 
# ref_data taken from the revised question 
>>> demo_data = [ 
... ('A', [(1, 2), (3, 4)]), 
... ('B', [(7, 8), (9, 10)]), 
... ('A', [(0, 0)]), 
... ('C', []), # Finish early 
... ('C', [('ignored', 'entry')]) 
... ] 
>>> content = sorted(get_data(demo_data)) 
>>> print('\n'.join(map(str, content)))
('A', (0, 0)) 
('A', (1, 2)) 
('A', (3, 4)) 
('B', (7, 8)) 
('B', (9, 10)) 
>>> from operator import itemgetter 
>>> content = sorted(get_data(ref_data), key=itemgetter(1)) 
>>> print('\n'.join(map(str, content)))
('A', (0.12, 'how')) 
('A', (0.26, 'are')) 
('A', (0.7, 'you')) 
('A', (0.9, 'mike')) 
('B', (1.23, 'fine')) 
('B', (1.5, 'thanks')) 
('B', (1.6, 'and you')) 
('A', (1.9, "I'm fine too")) 
('C', (2.12, 'good')) 
('C', (2.24, 'morning')) 
('C', (3.13, 'guys')) 

Your solution ended up messy and hard to read for two main reasons:

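  1. Not using a generator means you are not getting the full benefit of the built-in sorted function
  2. By using indexing instead of tuple unpacking, you make it very hard to keep track of what is what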
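To illustrate the second point, here is a small sketch that flattens the same data twice, once with indexing and once with tuple unpacking. The `flow` sample and the names `indexed` and `unpacked` are made up for this illustration; the snippet is written for Python 3.

```python
# Hypothetical sample in the question's input shape.
flow = [
    ('A', [(0.12, 'how'), (1.9, "I'm fine too")]),
    ('B', [(1.23, 'fine')]),
]

# Indexing: the reader has to remember what each position means.
indexed = []
for item in flow:
    for entry in item[1]:
        indexed.append((item[0], entry[0], entry[1]))

# Unpacking: every position gets a name at the point of use.
unpacked = []
for list_id, elements in flow:
    for index, value in elements:
        unpacked.append((list_id, index, value))

print(indexed == unpacked)
```

Both loops build the same list; only the second one says what each field is.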
2
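# Put each (index, value) pair first so that a plain sort orders by index.
data = [(x, list_id) for (list_id, xs) in data for x in xs] 
data.sort() 
for x, list_id in data: 
    print("%s %s" % (list_id, x))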


A (0.12, 'how') 
A (0.26000000000000001, 'are') 
A (0.69999999999999996, 'you') 
A (0.90000000000000002, 'mike') 
B (1.23, 'fine') 
B (1.5, 'thanks') 
B (1.6000000000000001, 'and you') 
A (1.8999999999999999, "I'm fine too") 
C (2.1200000000000001, 'good') 
C (2.2400000000000002, 'morning') 
C (3.1299999999999999, 'guys') 
2

Your input:

l = [('A', 
    [(0.12, 'how'), 
    (0.26000000000000001, 'are'), 
    (0.69999999999999996, 'you'), 
    (0.90000000000000002, 'mike'), 
    (1.8999999999999999, "I'm fine too")]), 
    ('B', [(1.23, 'fine'), (1.5, 'thanks'), (1.6000000000000001, 'and you')]), 
    ('C', 
    [(2.1200000000000001, 'good'), 
    (2.2400000000000002, 'morning'), 
    (3.1299999999999999, 'guys')])] 

Transform (and print):

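from operator import itemgetter 

# Flatten into (ListID, (index, value)) pairs.
newlist = [] 
for alpha, tuplelist in l: 
    for tup in tuplelist: 
        newlist.append((alpha, tup)) 

# sorted() returns a new list, so keep the result.
newlist = sorted(newlist, key=itemgetter(1)) 
print(newlist)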

Result:

[('A', (0.12, 'how')), 
('A', (0.26000000000000001, 'are')), 
('A', (0.69999999999999996, 'you')), 
('A', (0.90000000000000002, 'mike')), 
('B', (1.23, 'fine')), 
('B', (1.5, 'thanks')), 
('B', (1.6000000000000001, 'and you')), 
('A', (1.8999999999999999, "I'm fine too")), 
('C', (2.1200000000000001, 'good')), 
('C', (2.2400000000000002, 'morning')), 
('C', (3.1299999999999999, 'guys'))] 

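Of course you can do this in a list comprehension, but you would still be using two for loops and one call to the built-in sorted function. You might as well make it verbose and readable.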
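For comparison, a sketch of the list-comprehension form mentioned above, wrapped in a helper. The name `merge_by_index` and the shortened `sample` data are assumptions made for this illustration, and the snippet is written for Python 3.

```python
from operator import itemgetter


def merge_by_index(groups):
    """Flatten (ListID, [(index, value), ...]) groups and sort by (index, value)."""
    return sorted(
        [(list_id, entry) for list_id, entries in groups for entry in entries],
        key=itemgetter(1),
    )


# Hypothetical sample, shortened from the question's input.
sample = [
    ('A', [(0.12, 'how'), (0.26, 'are'), (1.9, "I'm fine too")]),
    ('B', [(1.23, 'fine'), (1.5, 'thanks')]),
    ('C', [(2.12, 'good')]),
]

merged = merge_by_index(sample)
for row in merged:
    print(row)
```

The comprehension still contains the same two for loops; it only folds them into a single expression.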