Converting a three-column text file to a matrix

Hi, I want to convert a file that is tab-delimited and looks like this:

Species Date Data 
1  Dec 3 
2  Jan 4 
2  Dec 6 
2  Dec 3

into a matrix like this (with species as the header row):

     1  2
Dec  3  9
Jan     4

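I'm guessing part of the solution is to create a dictionary keyed on two values and use defaultdict to add new values to each key pair. I'd like to write the result out in tab-delimited form, but also get it into a format I can use with the clustering part of scipy.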
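A minimal sketch of the two-key defaultdict idea described above, using only the standard library. It assumes the input is a header line followed by `species date value` rows separated by tabs or spaces (`str.split()` handles both); the function name `to_matrix` is hypothetical. Values that share the same (date, species) pair are summed, missing cells are left blank as in the desired output, and dates come out in alphabetical rather than calendar order.

```python
import csv
import io
from collections import defaultdict


def to_matrix(lines, fill=''):
    """Sum values per (date, species) pair and return the matrix as rows.

    `lines` is any iterable of text lines: a header line, then
    "species date value" rows separated by tabs or spaces.
    """
    counts = defaultdict(int)          # (date, species) -> running sum
    rows_iter = iter(lines)
    next(rows_iter)                    # skip the header line
    for line in rows_iter:
        if not line.strip():
            continue                   # ignore blank lines
        species, date, value = line.split()
        counts[date, species] += int(value)

    dates = sorted({d for d, _ in counts})
    species = sorted({s for _, s in counts})
    table = [[''] + species]           # header row: species labels
    for d in dates:
        # .get() avoids inserting new keys and fills gaps with `fill`
        table.append([d] + [counts.get((d, s), fill) for s in species])
    return table


# stand-in for the tab-delimited file from the question
text = "Species\tDate\tData\n1\tDec\t3\n2\tJan\t4\n2\tDec\t6\n2\tDec\t3\n"
table = to_matrix(text.splitlines())

# write the matrix back out as tab-delimited text
out = io.StringIO()
csv.writer(out, delimiter='\t', lineterminator='\n').writerows(table)
print(out.getvalue())
```

To read a real file instead, pass an open file object (for example `open('mat.txt')`, a hypothetical path) in place of `text.splitlines()`.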


2010-07-17 PROhan

Possible duplicate of [Creating a matrix from a simple three-column text file](http://stackoverflow.com/questions/3263759/creating-a-matrix-from-simple-three-column-text-fie) – tom10 2010-07-18 13:55:44

The DataFrame object in the pandas library makes this very simple.

import csv
from collections import defaultdict
from pandas import DataFrame

datacols = defaultdict(list)

# this reads space-separated input; use delimiter='\t' for a tab-delimited file
with open('mat.txt', newline='') as fh:
    rdr = csv.reader(fh, delimiter=' ', skipinitialspace=True)
    next(rdr)  # skip header
    for spec, dat, num in rdr:
        datacols['species'].append(int(spec))
        datacols['dates'].append(dat)
        datacols['data'].append(int(num))

df = DataFrame(datacols)
# pivot expects unique (dates, species) pairs; (Dec, 2) repeats in this sample
df2 = df.pivot(index='dates', columns='species', values='data')

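First we read the data from the file in the format you provided. Then we build a dictionary of columns (datacols), since that is what a pandas DataFrame expects. Once the DataFrame (df) is built, we call its pivot method to get the desired layout. Here is what df and df2 look like in the console: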

In [205]: df
Out[205]:
   data dates  species
0     3   Dec        1
1     4   Jan        2
2     6   Dec        2
3     3   Dec        2


In [206]: df2
Out[206]:
       1    2
Dec    3    3
Jan  NaN    4

You can then save it to a file with the toCSV method (named to_csv in current pandas; see the pandas DataFrame documentation).
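In the df2 output above, the (Dec, 2) cell shows 3 rather than the 9 the question asks for, because pivot does not aggregate repeated (index, column) pairs; current pandas versions raise a ValueError for such data instead. A minimal sketch, assuming a recent pandas, that rebuilds the same df by hand, uses pivot_table with aggfunc='sum' to add the duplicates, and produces tab-delimited text with to_csv(sep='\t'):

```python
import pandas as pd

# same columns and values as the df built from the file above
df = pd.DataFrame({'species': [1, 2, 2, 2],
                   'dates': ['Dec', 'Jan', 'Dec', 'Dec'],
                   'data': [3, 4, 6, 3]})

# pivot_table aggregates repeated (dates, species) pairs; aggfunc='sum' adds them
df2 = df.pivot_table(index='dates', columns='species', values='data',
                     aggfunc='sum')
print(df2)

# with no path, to_csv returns the text; sep='\t' makes it tab-delimited
csv_text = df2.to_csv(sep='\t')
print(csv_text)
```

Passing a filename as the first argument of to_csv writes the same text to a file instead.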


2010-07-17 02:14:49 ars

I don't know numpy, so I can only help partially, but I found writing this little snippet fun, so here it is with defaultdict:

from collections import defaultdict

# we'll pretend *f* is a file below (header line dropped)
f = '''Species Date Data
1  Dec 3
2  Jan 4
2  Dec 6
2  Dec 3'''.split('\n')[1:]

d = defaultdict(int)
for ln in f:
    x, y, n = ln.split()
    d[x, y] += int(n)  # sum values that share the same (species, date) pair

# transpose the list of tuples (keys) to get the two dimensions, remove the duplicates
# (set order is arbitrary, so rows and columns may print in a different order)
x, y = map(set, zip(*d))

print(list(x))
for yy in y:
    print(yy, [d[xx, yy] for xx in x])

And the result of running it is:

['1', '2'] 
Jan [0, 4] 
Dec [3, 9]

Cute, isn't it?
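The question also asks for a format usable by scipy's clustering code, which works on NumPy arrays. A minimal sketch, assuming NumPy is installed, that turns the same (species, date) dictionary into a 2-D float array with dates as rows and species as columns. Sorting the labels gives a stable order, unlike the sets above; filling missing cells with 0 mirrors what the snippet prints, but is a choice worth revisiting before clustering.

```python
from collections import defaultdict

import numpy as np

# rebuild the (species, date) -> sum dictionary as in the snippet above
rows = ['1  Dec 3', '2  Jan 4', '2  Dec 6', '2  Dec 3']
d = defaultdict(int)
for ln in rows:
    x, y, n = ln.split()
    d[x, y] += int(n)

# sorted labels give a reproducible row/column order
species = sorted({s for s, _ in d})
dates = sorted({t for _, t in d})

# rows = dates, columns = species; .get() leaves d unchanged and fills gaps with 0
matrix = np.array([[d.get((s, t), 0) for s in species] for t in dates],
                  dtype=float)
print(species)
print(dates)
print(matrix)
```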


2010-07-17 04:38:05

This is straightforward with pandas. You could use read_table() to read your text file, but I have created the DataFrame manually below.

from pandas import DataFrame

# create the data frame
df = DataFrame({'Species': [1, 2, 2, 2],
                'Date': ['Dec', 'Jan', 'Dec', 'Dec'],
                'Data': [3, 4, 6, 3]})

# group by the Date and Species columns, and take the sum of the Data column
df2 = df.groupby(['Date', 'Species'])['Data'].sum()

# unstack the Species column to reshape your data
print(df2.unstack('Species'))
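The answer mentions read_table() and the question wants the result in a form scipy can cluster. A minimal sketch of both steps, assuming pandas and scipy are installed and the file is tab-delimited with a header line; the in-memory text stands in for the file, and `method='average'` is an arbitrary choice for illustration. It reuses the groupby/unstack reshape above, with fill_value=0 replacing the missing cell.

```python
import io

import pandas as pd
from scipy.cluster.hierarchy import linkage

# stand-in for the text file; pass a path such as 'mat.txt' to read a real file
text = "Species\tDate\tData\n1\tDec\t3\n2\tJan\t4\n2\tDec\t6\n2\tDec\t3\n"

# read_table reads tab-separated input and uses the first line as the header
df = pd.read_table(io.StringIO(text))

# same groupby/unstack reshape as above; fill_value=0 fills the (Jan, 1) gap
matrix = (df.groupby(['Date', 'Species'])['Data'].sum()
            .unstack('Species', fill_value=0))
print(matrix)

# hierarchical clustering of the rows (dates); use matrix.T to cluster species
Z = linkage(matrix.to_numpy(dtype=float), method='average')
print(Z)
```

Each row of `Z` records one merge: the two cluster indices, their distance, and the size of the new cluster.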


2012-12-06 03:26:48 zach
