Reading key-value pairs into Pandas
Pandas makes it easy to read a CSV file:
pd.read_table('data.txt', sep=',')
Does Pandas have something similar for a file of key-value pairs? I came up with this:
pd.DataFrame([dict([p.split('=') for p in l.split(',')]) for l in open('data.txt')])
If nothing is built in, is there perhaps a more idiomatic way to do it?
The file of interest looks like this:
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525697183,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525714498,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525734967,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735567,price=1548.00,quantity=555
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735585,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525736116,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525740757,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525748502,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525748952,price=1548.00,quantity=557
Every row has exactly the same keys, in the same order. There are no null values. The table to be generated is:
  exchange    price quantity symbol         timestamp
0   GLOBEX  1548.00    551\n   ESM3  1365428525690751
1   GLOBEX  1548.00    551\n   ESM3  1365428525697183
2   GLOBEX  1548.00    551\n   ESM3  1365428525714498
3   GLOBEX  1548.00    551\n   ESM3  1365428525734967
4   GLOBEX  1548.00    555\n   ESM3  1365428525735567
5   GLOBEX  1548.00    556\n   ESM3  1365428525735585
6   GLOBEX  1548.00    556\n   ESM3  1365428525736116
7   GLOBEX  1548.00    556\n   ESM3  1365428525740757
8   GLOBEX  1548.00    556\n   ESM3  1365428525748502
9   GLOBEX  1548.00    557\n   ESM3  1365428525748952
(I can remove the \n from quantity with rstrip() later, after I've brought the data in.)
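For reference, here is a minimal sketch of the same dict-per-line approach with the trailing newline stripped before splitting, so quantity comes out clean. The helper name read_key_value and its pair_sep and kv_sep parameters are made up for illustration and are not part of Pandas; the sample input is the first two lines of the file above.

```python
import io

import pandas as pd


def read_key_value(lines, pair_sep=",", kv_sep="="):
    """Hypothetical helper: parse lines like 'a=1,b=2' into a DataFrame."""
    records = []
    for line in lines:
        line = line.strip()  # drop the trailing newline before splitting
        if not line:
            continue  # skip blank lines
        # Split on the first kv_sep only, so values may contain '='.
        records.append(dict(pair.split(kv_sep, 1) for pair in line.split(pair_sep)))
    return pd.DataFrame(records)


# Two sample rows taken from the file shown above.
sample = io.StringIO(
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551\n"
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525697183,price=1548.00,quantity=551\n"
)
df = read_key_value(sample)

# Every value is parsed as a string, so convert the numeric columns explicitly.
for col in ("timestamp", "price", "quantity"):
    df[col] = pd.to_numeric(df[col])
```

With a real file, the same call would be read_key_value(open('data.txt')), ideally inside a with block so the file gets closed. Column order may differ between Pandas versions, so it is safer to select columns by name than by position.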
Can you give an example of what the file looks like and what format you want the DataFrame to have? – DSM 2013-04-09 16:54:17
@DSM I've added an example. – chrisaycock 2013-04-09 17:04:18