Here's a start. I'm not worrying about speed at this point (IPython with Python 3.4).
In [473]: dd = {'Clinton': [{'ideology': -0.5, 'vote':80}, {'ideology': -0.75, 'vote':90},
{'ideology': -0.89, 'vote': 99},
{'ideology': -0.5, 'vote':80, 'review': "She is a presidential candidate"}],
'Alexander': [{'ideology': -0.1, 'vote':50}, {'ideology': -0.95, 'vote':20},
{'ideology': -0.19, 'vote': 19}, {'ideology': -0.2, 'vote':30, 'review': "Good"}]}
...
In [475]: dd
Out[475]:
{'Alexander': [{'ideology': -0.1, 'vote': 50},
{'ideology': -0.95, 'vote': 20},
{'ideology': -0.19, 'vote': 19},
{'ideology': -0.2, 'vote': 30, 'review': 'Good'}],
'Clinton': [{'ideology': -0.5, 'vote': 80},
{'ideology': -0.75, 'vote': 90},
{'ideology': -0.89, 'vote': 99},
{'ideology': -0.5, 'vote': 80, 'review': 'She is a presidential candidate'}]}
In [476]: dd.keys()
Out[476]: dict_keys(['Alexander', 'Clinton'])
In [478]: dd.values()
Out[478]: dict_values([[{'ideology': -0.1, 'vote': 50}, {'ideology': -0.95, 'vote': 20}, {'ideology':....}]])
...
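To make a record array I need a list of tuples, each holding one value per field. A first attempt uses the key-value pairs, which gives one record per key, but the value is a list.
(These value lists are evidently the result of using a defaultdict with list append. That's a fine way to build a dictionary, but unfortunately for an array we have to unpack it.)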
In [480]: [(k,v) for k,v in dd.items()]
Out[480]:
[('Alexander',
[{'ideology': -0.1, 'vote': 50},
{'ideology': -0.95, 'vote': 20},
....
'review': 'She is a presidential candidate'}])]
Better: a list of lists of 3-field tuples:
In [483]: [[(k,vv['ideology'],vv['vote']) for vv in v] for k,v in dd.items()]
Out[483]:
[[('Alexander', -0.1, 50),
('Alexander', -0.95, 20),
('Alexander', -0.19, 19),
('Alexander', -0.2, 30)],
[('Clinton', -0.5, 80),
('Clinton', -0.75, 90),
('Clinton', -0.89, 99),
('Clinton', -0.5, 80)]]
Add the `review` field, which may be missing from a record:
In [484]: [[(k,vv['ideology'],vv['vote'],vv.get('review','')) for vv in v] for k,v in dd.items()]
Out[484]:
[[('Alexander', -0.1, 50, ''),
('Alexander', -0.95, 20, ''),
('Alexander', -0.19, 19, ''),
('Alexander', -0.2, 30, 'Good')],
[('Clinton', -0.5, 80, ''),
('Clinton', -0.75, 90, ''),
('Clinton', -0.89, 99, ''),
('Clinton', -0.5, 80, 'She is a presidential candidate')]]
In [485]: ll=[[(k,vv['ideology'],vv['vote'],vv.get('review','')) for vv in v] for k,v in dd.items()]
To flatten the list of lists, use `chain` from `itertools`:
In [486]: from itertools import chain
...
In [488]: list(chain(*ll))
Out[488]:
[('Alexander', -0.1, 50, ''),
('Alexander', -0.95, 20, ''),
('Alexander', -0.19, 19, ''),
('Alexander', -0.2, 30, 'Good'),
('Clinton', -0.5, 80, ''),
('Clinton', -0.75, 90, ''),
('Clinton', -0.89, 99, ''),
('Clinton', -0.5, 80, 'She is a presidential candidate')]
In [489]: ll1=list(chain(*ll))
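As an aside, the intermediate nested list is not strictly needed. The following is a minimal sketch, assuming the same layout as `dd` above (shortened here to two records per name); `to_row` is a hypothetical helper introduced only for this illustration. It builds the flat list with a single two-level comprehension and checks that `chain.from_iterable`, which avoids unpacking the outer list into arguments, gives the same result.

```python
from itertools import chain

# Shortened stand-in for the dictionary used in the session above.
dd = {'Clinton': [{'ideology': -0.5, 'vote': 80},
                  {'ideology': -0.5, 'vote': 80,
                   'review': 'She is a presidential candidate'}],
      'Alexander': [{'ideology': -0.1, 'vote': 50},
                    {'ideology': -0.2, 'vote': 30, 'review': 'Good'}]}

def to_row(name, rec):
    """Hypothetical helper: one inner dict -> one 4-field tuple."""
    # 'review' is optional, so fall back to an empty string.
    return (name, rec['ideology'], rec['vote'], rec.get('review', ''))

# Nested version followed by flattening, as in the session.
nested = [[to_row(k, vv) for vv in v] for k, v in dd.items()]
flat_chain = list(chain.from_iterable(nested))

# Same result in one pass: the outer loop comes first in the comprehension.
flat_direct = [to_row(k, vv) for k, v in dd.items() for vv in v]

assert flat_chain == flat_direct
```

Either form still iterates over every inner dict once; the difference is only whether the nested list is materialised along the way.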
...
Define a dtype:
In [491]: dt=np.dtype([('name','U10'),('ideology',float),('vote',int),('review','U100')])
In [492]: data=np.array(ll1,dt)
In [493]: data
Out[493]:
array([('Alexander', -0.1, 50, ''), ('Alexander', -0.95, 20, ''),
('Alexander', -0.19, 19, ''), ('Alexander', -0.2, 30, 'Good'),
('Clinton', -0.5, 80, ''), ('Clinton', -0.75, 90, ''),
('Clinton', -0.89, 99, ''),
('Clinton', -0.5, 80, 'She is a presidential candidate')],
dtype=[('name', '<U10'), ('ideology', '<f8'), ('vote', '<i4'), ('review', '<U100')])
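Looks good. There is no iteration in the final array-creation step. There is iteration when converting the dictionary to a list of tuples, but with dictionaries that kind of iteration is unavoidable.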
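Pulled together as a script, the steps above might look like the sketch below. It assumes the same field names and dtype as the session, uses a shortened `dd`, and adds two illustrative lookups at the end (field access by name and a boolean mask) that were not part of the session.

```python
import numpy as np

# Shortened stand-in for the dictionary used in the session above.
dd = {'Clinton': [{'ideology': -0.5, 'vote': 80},
                  {'ideology': -0.5, 'vote': 80,
                   'review': 'She is a presidential candidate'}],
      'Alexander': [{'ideology': -0.1, 'vote': 50},
                    {'ideology': -0.2, 'vote': 30, 'review': 'Good'}]}

# Same dtype as above. Note that 'U10' and 'U100' are fixed widths:
# longer strings are silently truncated.
dt = np.dtype([('name', 'U10'), ('ideology', float),
               ('vote', int), ('review', 'U100')])

# One tuple per inner dict; a missing 'review' becomes ''.
rows = [(k, vv['ideology'], vv['vote'], vv.get('review', ''))
        for k, v in dd.items() for vv in v]

# A structured array must be built from a list of tuples.
data = np.array(rows, dtype=dt)

# Illustrative lookups: a whole field as a plain array, and row selection.
ideology = data['ideology']
clinton = data[data['name'] == 'Clinton']
```

Once the array exists, operations such as `data['ideology']` work on whole columns, so the per-record Python loop is confined to building `rows`.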
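For what it's worth: pandas can create a DataFrame from a dictionary quite easily. – Evert
But pandas aside: have you tried creating an empty structured array and filling it with a loop over the dictionary and the inner lists? – Evert
@Evert I tried that, but the problem is that the data has more than a million observations, so looping takes a while. I want to use the dictionary as feature vectors for ridge regression! – user3077008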