2016-01-10

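Create a dictionary of frequency dictionaries from a DataFrame

I have a large dataset like the one below, and I'm trying to turn the DataFrame into a dictionary of dictionaries that gives, for each crime code, the frequency of every other column.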

train_data

   23  Wednesday  BAYVIEW  CENTRAL  INGLESIDE  NORTHERN  PARK  RICHMOND  crime
0   1          1        0        0          0         1     0         0      3
1   1          1        0        0          0         1     0         0      1
2   1          1        0        0          0         1     0         0      1
3   1          1        0        0          0         1     0         0      0
4   1          1        0        0          0         0     1         0      0
5   1          1        0        0          1         0     0         0      0
6   1          1        0        0          1         0     0         0      2
7   1          1        1        0          0         0     0         0      2
8   1          1        0        0          0         0     0         1      0
9   1          1        0        1          0         0     0         0      0

So I decided to first group the DataFrame by the `crime` column:

train_data=train_data.groupby(['crime']).sum() 


       23  Wednesday  BAYVIEW  CENTRAL  INGLESIDE  NORTHERN  PARK  RICHMOND
crime
0       5          5        0        1          1         1     1         1
1       2          2        0        0          0         2     0         0
2       2          2        1        0          1         0     0         0
3       1          1        0        0          0         1     0         0
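To reproduce the grouping above, here is a minimal sketch that rebuilds the sample rows by hand and sums them per crime code. It assumes the first column label is the string "23" (the post doesn't say whether it is a string or an integer), and it stores the result under a new name, `grouped`, instead of overwriting `train_data`.

```python
import pandas as pd

# Sample rows copied from the question; the label "23" is assumed to be a string.
columns = ["23", "Wednesday", "BAYVIEW", "CENTRAL", "INGLESIDE",
           "NORTHERN", "PARK", "RICHMOND", "crime"]
rows = [
    [1, 1, 0, 0, 0, 1, 0, 0, 3],
    [1, 1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 0, 2],
    [1, 1, 1, 0, 0, 0, 0, 0, 2],
    [1, 1, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 0, 0, 0],
]
train_data = pd.DataFrame(rows, columns=columns)

# Group by crime code and sum the remaining columns; "crime" becomes the index.
grouped = train_data.groupby("crime").sum()
print(grouped)
```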

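Then I tried to organize the result into a dictionary of dictionaries, but I couldn't get it to work. I tried a few ways of iterating over it, but kept running into problems with the DataFrame.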

The result should look like this:

{0: {23: 5, Wednesday: 5, BAYVIEW: 0, CENTRAL: 1, ...},
 1: {23: 2, Wednesday: 2, BAYVIEW: 0, ...},
 2: {...}, 3: {...}}

Answer


If you're on pandas 0.17.0 or later, you can use the approach MaxNoe posted:

train_data.groupby('crime').sum().to_dict(orient='index') 
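Applied to the grouped table from the question, `orient='index'` maps each index label (the crime code) to a dict of column label to value. This sketch builds that grouped table directly from the numbers shown above, again assuming "23" is a string label.

```python
import pandas as pd

# Grouped table from the question, built directly; index = crime code.
# The column label "23" is assumed to be a string.
grouped = pd.DataFrame(
    [[5, 5, 0, 1, 1, 1, 1, 1],
     [2, 2, 0, 0, 0, 2, 0, 0],
     [2, 2, 1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0, 0, 1, 0, 0]],
    columns=["23", "Wednesday", "BAYVIEW", "CENTRAL", "INGLESIDE",
             "NORTHERN", "PARK", "RICHMOND"],
    index=pd.Index([0, 1, 2, 3], name="crime"),
)

# orient="index" -> {index label: {column label: value}}
result = grouped.to_dict(orient="index")
print(result[0])
```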

Otherwise:

train_data.groupby('crime').sum().T.to_dict()
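The transpose works because the default `orient='dict'` produces `{column: {index: value}}`, and after `.T` the crime codes are the columns. Another option that doesn't depend on the pandas version is to iterate over the rows yourself. This sketch (same hand-built grouped table, with "23" assumed to be a string label) checks that both routes give the same dictionary.

```python
import pandas as pd

# Grouped table from the question; index = crime code, "23" assumed to be a string.
grouped = pd.DataFrame(
    [[5, 5, 0, 1, 1, 1, 1, 1],
     [2, 2, 0, 0, 0, 2, 0, 0],
     [2, 2, 1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0, 0, 1, 0, 0]],
    columns=["23", "Wednesday", "BAYVIEW", "CENTRAL", "INGLESIDE",
             "NORTHERN", "PARK", "RICHMOND"],
    index=pd.Index([0, 1, 2, 3], name="crime"),
)

# Transpose so crime codes become columns, then use the default orient="dict".
via_transpose = grouped.T.to_dict()

# Explicit loop: one inner dict per row, keyed by the row's crime code.
via_loop = {crime: row.to_dict() for crime, row in grouped.iterrows()}

print(via_transpose == via_loop)
```

Here every column is an integer, so both routes agree. With mixed column dtypes, `.T` and `iterrows()` may upcast values to a common type (for example float or object).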