2013-08-05

Python list from MySQL table

I have this table in MySQL, which holds the number of occurrences (CNT column) of each item for each distinct ID:

ID  ITEM  CNT 
--------------------- 
01  093  4 
01  129F  2 
01  AB56  0 
01  BB44  0 
01  XH7  0 
01  TYE2  1 
02  093  0 
02  129F  3 
02  AB56  1 
02  BB44  0 
02  XH7  2 
02  TYE2  2 
03  093  9 
03  129F  2 
03  AB56  0 
03  BB44  1 
03  XH7  4 
03  TYE2  0 
...... 

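I want to find an efficient way to import this data from MySQL into Python so I can use it as item count vectors for a clustering program, in the form of a list of lists:

[[4,2,0,0,0,1],[0,3,1,0,2,2],[9,2,0,1,4,0]] 

where each list represents one ID. I'm dealing with a large amount of data (millions of rows), so performance is a concern. Any help would be greatly appreciated.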

Answer

Use itertools.groupby:

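import itertools
... 
# ORDER BY ID keeps each ID's rows adjacent for groupby;
# ITEM gives every vector the same column order.
cursor.execute('SELECT ID, CNT FROM table_name ORDER BY ID, ITEM') 
item_count_vector = [ 
    [cnt for id_, cnt in grp] 
    for key, grp in itertools.groupby(cursor.fetchall(), key=lambda row: row[0]) 
] 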
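Since the question mentions millions of rows, it may be worth avoiding fetchall(), which builds the full list of rows in memory at once. The sketch below is a self-contained stand-in, not the original setup: it uses Python's built-in sqlite3 module instead of MySQL, and the table name `counts` and the sample rows are hypothetical. It iterates the cursor directly and sorts by both ID and ITEM so every vector lists the items in the same order.

```python
import itertools
import sqlite3
from operator import itemgetter

# Hypothetical stand-in for the MySQL table (sqlite3, in memory).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (ID TEXT, ITEM TEXT, CNT INTEGER)")
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [
        ("01", "093", 4), ("01", "129F", 2), ("01", "AB56", 0),
        ("02", "093", 0), ("02", "129F", 3), ("02", "AB56", 1),
    ],
)

cursor = conn.cursor()
# ORDER BY ID groups the rows for groupby; ITEM fixes the column order in each vector.
cursor.execute("SELECT ID, CNT FROM counts ORDER BY ID, ITEM")

# Iterate the cursor lazily instead of materializing fetchall().
item_count_vector = [
    [cnt for _id, cnt in grp]
    for _key, grp in itertools.groupby(cursor, key=itemgetter(0))
]
print(item_count_vector)  # [[4, 2, 0], [0, 3, 1]]
```

With MySQLdb or PyMySQL, the default cursor still buffers the whole result set on the client, so a server-side cursor (e.g. `SSCursor`) would likely be needed to actually stream the rows.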

OR (if you use a DictCursor-like cursor):

item_count_vector = [ 
    [d['CNT'] for d in grp] 
    for key, grp in itertools.groupby(cursor.fetchall(), key=lambda row: row['ID']) 
] 
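A dict-style cursor returns each row as a dictionary, so positional access like `row[0]` fails with `KeyError: 0`. The sketch below uses a small hand-written list of dicts as an illustrative stand-in for `cursor.fetchall()` with a DictCursor, and uses `operator.itemgetter` in place of the lambdas.

```python
import itertools
from operator import itemgetter

# Illustrative rows shaped like DictCursor output: one dict per row.
rows = [
    {"ID": "01", "CNT": 4}, {"ID": "01", "CNT": 2},
    {"ID": "02", "CNT": 0}, {"ID": "02", "CNT": 3},
]

# Positional indexing on a dict row is what raises the KeyError.
try:
    rows[0][0]
except KeyError as exc:
    print("KeyError:", exc)  # prints: KeyError: 0

# itemgetter("ID") does the same job as lambda row: row["ID"].
get_id = itemgetter("ID")
get_cnt = itemgetter("CNT")
item_count_vector = [
    [get_cnt(d) for d in grp] for _key, grp in itertools.groupby(rows, key=get_id)
]
print(item_count_vector)  # [[4, 2], [0, 3]]
```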

... 

>>> import itertools 
>>> # Assume following rows are retrieved from DB using cursor.fetchall() 
>>> rows = (
...  ('01',4), 
...  ('01',2), 
...  ('01',0), 
...  ('01',0), 
...  ('01',0), 
...  ('01',1), 
...  ('02',0), 
...  ('02',3), 
...  ('02',1), 
...  ('02',0), 
...  ('02',2), 
...  ('02',2), 
...  ('03',9), 
...  ('03',2), 
...  ('03',0), 
...  ('03',1), 
...  ('03',4), 
...  ('03',0), 
...) 
>>> [[cnt for id_, cnt in grp] for key, grp in itertools.groupby(rows, key=lambda row: row[0])] 
[[4, 2, 0, 0, 0, 1], [0, 3, 1, 0, 2, 2], [9, 2, 0, 1, 4, 0]] 
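The groupby approach assumes every ID has exactly one row per item, so that position i means the same item in every vector. If some (ID, ITEM) pairs could be missing from the table (an assumption; the sample table has all of them), a sketch like the following fills the gaps with 0 against a fixed item list. The `items` list is copied from the sample table; in practice it might come from a `SELECT DISTINCT ITEM` query. The rows are illustrative.

```python
import itertools
from operator import itemgetter

# Fixed column order for the vectors (taken from the sample table).
items = ["093", "129F", "AB56", "BB44", "XH7", "TYE2"]
position = {item: i for i, item in enumerate(items)}

# Illustrative (ID, ITEM, CNT) rows sorted by ID; some items are absent.
rows = [
    ("01", "093", 4), ("01", "129F", 2), ("01", "TYE2", 1),
    ("02", "129F", 3), ("02", "XH7", 2),
]

vectors = []
for _id, grp in itertools.groupby(rows, key=itemgetter(0)):
    vec = [0] * len(items)  # absent items default to a count of 0
    for _, item, cnt in grp:
        vec[position[item]] = cnt
    vectors.append(vec)

print(vectors)  # [[4, 2, 0, 0, 0, 1], [0, 3, 0, 0, 2, 0]]
```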

I get a KeyError: 0 ... is it because fetchall returns the rows as dictionaries? {'CNT':4L,'ID':01L},{'CNT':2L,'ID':01L} ... – user2578185


@user2578185, I added another code snippet. See the **OR** part. – falsetru


It works now, thanks. – user2578185