2017-02-14

How can I count how many times each sequence of column values appears in a DataFrame?

I have this DataFrame df:

AA_0   AA_1   AA_2   AA_3
store  cake   mass   visit
store  mass   visit
mass   store
store  cake   mass   visit

I want to count how many times each sequence of values across AA_0 to AA_3 appears in df, and present the result like this:

result = 

    count data 
    2  store/cake/mass/visit 
    1  store/mass/visit 
    1  mass/store 

How can I do this?

Answer

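You can join the non-null values of each row into a single string, then count the strings with value_counts: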

df['data'] = df.apply(lambda x: '/'.join(x.dropna()), axis=1)
print(df)
    AA_0   AA_1   AA_2   AA_3                   data
0  store   cake   mass  visit  store/cake/mass/visit
1  store   mass  visit    NaN       store/mass/visit
2   mass  store    NaN    NaN             mass/store
3  store   cake   mass  visit  store/cake/mass/visit

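# Name the index 'data' and the counts column 'count' so the labels match their contents
result = df['data'].value_counts().rename_axis('data').reset_index(name='count')
print(result)
                    data  count
0  store/cake/mass/visit      2
1       store/mass/visit      1
2             mass/store      1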
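
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The frame is rebuilt from the question on the assumption that the missing cells are NaN, and `count_sequences` is a hypothetical helper name, not something from pandas.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt from the question; NaN marks the missing cells (an assumption).
df = pd.DataFrame({
    'AA_0': ['store', 'store', 'mass', 'store'],
    'AA_1': ['cake', 'mass', 'store', 'cake'],
    'AA_2': ['mass', 'visit', np.nan, 'mass'],
    'AA_3': ['visit', np.nan, np.nan, 'visit'],
})


def count_sequences(frame, sep='/'):
    """Hypothetical helper: join each row's non-null values, then count the joined strings."""
    joined = frame.apply(lambda row: sep.join(row.dropna()), axis=1)
    counts = joined.value_counts()
    # Build the result explicitly so the column names do not depend on the pandas version.
    return pd.DataFrame({'count': counts.to_numpy(), 'data': counts.index.to_numpy()})


result = count_sequences(df)
print(result)
```

Building the result frame by hand puts the columns in the count/data order asked for in the question.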

If the missing values are stored as empty strings or spaces instead of NaN:

df['data'] = df.apply(lambda x: '/'.join(x), axis=1).str.strip('/ ')
print(df)
    AA_0   AA_1   AA_2   AA_3                   data
0  store   cake   mass  visit  store/cake/mass/visit
1  store   mass  visit              store/mass/visit
2   mass  store                           mass/store
3  store   cake   mass  visit  store/cake/mass/visit

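# Name the index 'data' and the counts column 'count' so the labels match their contents
result = df['data'].value_counts().rename_axis('data').reset_index(name='count')
print(result)
                    data  count
0  store/cake/mass/visit      2
1       store/mass/visit      1
2             mass/store      1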
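
Note that `str.strip('/ ')` only trims separators and spaces from the two ends of the joined string, so a blank cell in the middle of a row would still leave a doubled separator. One possible variant, sketched here on the assumption that blank or whitespace-only cells should count as missing, is to convert them to NaN first and then reuse the `dropna` approach. The sample frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Same example, but with '' or ' ' marking the missing cells (an assumption about the data).
df = pd.DataFrame({
    'AA_0': ['store', 'store', 'mass', 'store'],
    'AA_1': ['cake', 'mass', 'store', 'cake'],
    'AA_2': ['mass', 'visit', '', 'mass'],
    'AA_3': ['visit', '', ' ', 'visit'],
})

# Treat empty or whitespace-only cells as missing values.
cleaned = df.replace(r'^\s*$', np.nan, regex=True)

# Join the remaining values of each row and count the resulting strings.
data = cleaned.apply(lambda row: '/'.join(row.dropna()), axis=1)
counts = data.value_counts()
print(counts)
```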