Pandas histogram from column counts

I have a large DataFrame with about 6500 columns. One of them is a class label and the rest are boolean 0/1 values. The DataFrame is very sparse.

For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
      'label' : ['a', 'b', 'c', 'b', 'a', 'c', 'b', 'a'],
      'x1' : np.random.choice(2, 8),
      'x2' : np.random.choice(2, 8),
      'x3' : np.random.choice(2, 8)})

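What I want is a report (ideally in pandas, so that I can plot it easily) that shows, for each label, how many distinct columns contain at least one 1 within that label's rows.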

So, for example, given this DataFrame:

   x1  x2  x3 label
0   0   1   1     a
1   1   0   1     b
2   0   1   0     c
3   1   0   0     b
4   1   1   1     a
5   0   0   1     c
6   1   0   0     b
7   0   1   0     a

the result should look like this:

a: 3 (since it has x1, x2 and x3) 
b: 2 (since it has x1, x3) 
c: 2 (since it has x2, x3)

So it is a count of which columns are present for each label. Think of a histogram where the x-axis is the label and the y-axis is the number of columns.
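A minimal sketch of the computation described above, assuming the fixed 0/1 values from the example table (hardcoded here instead of drawn with `np.random.choice`, so the result is reproducible). `groupby('label').any()` marks whether each column has at least one 1 in a label's group, and summing those flags across columns gives the per-label count. This is one possible approach, not taken from the answers below.

```python
import pandas as pd

# Fixed copy of the example table so the output is reproducible
df = pd.DataFrame({
    'label': ['a', 'b', 'c', 'b', 'a', 'c', 'b', 'a'],
    'x1': [0, 1, 0, 1, 1, 0, 1, 0],
    'x2': [1, 0, 1, 0, 1, 0, 0, 1],
    'x3': [1, 1, 0, 0, 1, 1, 0, 0],
})

# True where a column contains at least one 1 within the label's group
present = df.groupby('label').any()

# Number of present columns per label (a: 3, b: 2, c: 2 for this data)
counts = present.sum(axis=1)
print(counts)
```

`counts.plot(kind='bar')` would then draw the label on the x-axis and the number of columns on the y-axis; that call assumes matplotlib is installed.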


2015-10-15 Tim

You could try pivoting:

import pandas as pd 
import numpy as np 

df = pd.DataFrame({ 
     'label' : ['a', 'b', 'c', 'b','a', 'c', 'b', 'a'], 
     'x1' : np.random.choice(2, 8), 
     'x2' : np.random.choice(2, 8), 
     'x3' : np.random.choice(2, 8)}) 

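# Mean of each column per label (pivot_table's default aggfunc), then count the nonzero means per label
pd.pivot_table(df, index='label').transpose().apply(np.count_nonzero)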

For the DataFrame:

  label  x1  x2  x3
0     a   0   0   0
1     b   0   1   0
2     c   1   0   1
3     b   0   1   0
4     a   1   1   1
5     c   1   0   1
6     b   0   1   0
7     a   1   1   1

the result is:

label 
a 3 
b 1 
c 2 
dtype: int64
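To check the pivot approach deterministically, here is a sketch that hardcodes the DataFrame shown above instead of generating it at random. `pivot_table` averages each column per label by default (`aggfunc='mean'`), so a column's mean is nonzero exactly when that column has at least one 1 in the group; `np.count_nonzero` then counts those columns for each label.

```python
import numpy as np
import pandas as pd

# The DataFrame from this answer, hardcoded so the result is reproducible
df = pd.DataFrame({
    'label': ['a', 'b', 'c', 'b', 'a', 'c', 'b', 'a'],
    'x1': [0, 0, 1, 0, 1, 1, 0, 1],
    'x2': [0, 1, 0, 1, 1, 0, 1, 1],
    'x3': [0, 0, 1, 0, 1, 1, 0, 1],
})

# Per-label mean of every remaining column (pivot_table's default aggfunc is 'mean')
means = pd.pivot_table(df, index='label')

# Transpose so each label becomes a column, then count its nonzero entries
result = means.transpose().apply(np.count_nonzero)
print(result)
```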


2015-10-15 09:25:06

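Awesome solution. I love that it stays within the pandas/numpy environment, and it is very fast too. Thanks for the insight into pivot; I had never used it before. – Tim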

Come to think of it, you can drop the transpose and use axis=1 in apply. Glad I could help. –
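The comment's suggestion, sketched on the same hardcoded data: applying `np.count_nonzero` row-wise with `axis=1` gives the same per-label counts without the transpose. The vectorized form `(means != 0).sum(axis=1)` is included for comparison; it is a swapped-in alternative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same hardcoded DataFrame as in the answer above
df = pd.DataFrame({
    'label': ['a', 'b', 'c', 'b', 'a', 'c', 'b', 'a'],
    'x1': [0, 0, 1, 0, 1, 1, 0, 1],
    'x2': [0, 1, 0, 1, 1, 0, 1, 1],
    'x3': [0, 0, 1, 0, 1, 1, 0, 1],
})

means = pd.pivot_table(df, index='label')

# Row-wise apply: each row is one label, so no transpose is needed
by_row = means.apply(np.count_nonzero, axis=1)

# Vectorized alternative: compare to zero and count the True values per row
vectorized = (means != 0).sum(axis=1)

print(by_row)
```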

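import pandas as pd

# df is the DataFrame from the question
grouped = df.groupby('label')
sums = grouped.sum()  # per-label column sums
for key, row in sums.iterrows():
    # columns that contain at least one 1 for this label
    present = [col for col, vl in row.items() if vl != 0]
    strg = '%s:%s' % (key, len(present))
    for col in present:
        strg += ' %s' % col
    print(strg)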
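If the listing from this answer is wanted as data rather than printed text (for example, to plot the counts), a sketch that collects it into a small summary DataFrame. It uses the question's example table hardcoded, and the `n_columns` and `columns` names are hypothetical choices for illustration.

```python
import pandas as pd

# The question's example table, hardcoded for reproducibility
df = pd.DataFrame({
    'label': ['a', 'b', 'c', 'b', 'a', 'c', 'b', 'a'],
    'x1': [0, 1, 0, 1, 1, 0, 1, 0],
    'x2': [1, 0, 1, 0, 1, 0, 0, 1],
    'x3': [1, 1, 0, 0, 1, 1, 0, 0],
})

# Boolean frame: True where the column sums to more than zero for that label
present = df.groupby('label').sum() > 0

# For each label, the names of the columns that are present
cols = {label: list(row[row].index) for label, row in present.iterrows()}

summary = pd.DataFrame({
    'n_columns': present.sum(axis=1),  # hypothetical column name
    'columns': pd.Series(cols),        # hypothetical column name
})
print(summary)
```

`summary['n_columns'].plot(kind='bar')` would give the histogram from the question (assuming matplotlib is available).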


2015-10-15 09:11:42
