2016-11-11 65 views

Conditional count in a pandas DataFrame: I have a DataFrame, df, of the following form:

index  result1  result2  result3
0      s        u        s
1      u        s        u
2      s
3      s        s        u

I would like to add another column holding the number of times 's' occurs in each row, for example:

index  result1  result2  result3  count
0      s        u        s        2
1      u        s        u        1
2      s                          1
3      s        s        u        2

I have tried the following code:

cols = ['result1', 'result2', 'result3']
df[cols].count(axis=1)

But this returns:

0,3 
1,3 
2,1 
3,3 

So this counts the number of non-missing elements in each row. I then tried:

df[df[cols]=='s'].count(axis=1) 

But this returns the following error: "Could not compare ['s'] with block values"

Any help would be greatly appreciated.

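What are the missing values here? Are they blank strings or `NaN`? What does `df.info()` show? If you have all str or mixed dtypes then `df == 's'` will work, but it won't work if you have any purely numeric columns or rows, which will happen if you have any all-`NaN` rows. You could try `df.fillna('', inplace=True)`, then `(df[cols] == 's').count(axis=1)` should do – EdChum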

@WGP, would `df['count'] = (df[cols].values == 's').sum(1)` be a good alternative?
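
A minimal sketch of the `fillna` suggestion from the comments, for anyone who wants to try it. The frame is a hypothetical reconstruction of the one in the question, and it assumes the blank cells are `NaN` (the question does not say). Note that it sums the boolean mask rather than calling `.count()` on it: `False` is not a missing value, so `.count()` on a mask would count every cell.

```python
import numpy as np
import pandas as pd

# Hypothetical frame modelled on the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "result1": ["s", "u", "s", "s"],
    "result2": ["u", "s", np.nan, "s"],
    "result3": ["s", "u", np.nan, "u"],
})
cols = ["result1", "result2", "result3"]

# count() tallies non-missing cells per row, which explains the 3, 3, 1, 3 output.
non_missing = df[cols].count(axis=1)

# Replace NaN with empty strings, compare, then sum the boolean mask:
# True counts as 1, so each row sum is the number of 's' cells.
s_count = (df[cols].fillna("") == "s").sum(axis=1)

print(non_missing.tolist())
print(s_count.tolist())
```

Assigning `s_count` to `df['count']` would then give the column asked for in the question.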

Answer

For me, casting to `str` with `astype` works. Numeric and all-NaN columns reproduce your error:

print(df)
   index result1 result2  result3  result4
0      0       s       u        7      NaN
1      1       u       s        7      NaN
2      2       s     NaN        8      NaN
3      3       s       s        7      NaN
4      4     NaN     NaN        2      NaN

print(df.dtypes)
index        int64
result1     object
result2     object
result3      int64
result4    float64
dtype: object

cols = ['result1', 'result2', 'result3', 'result4']
df['count'] = df[df[cols].astype(str) == 's'].count(axis=1)
print(df)
   index result1 result2  result3  result4  count
0      0       s       u        7      NaN      1
1      1       u       s        7      NaN      1
2      2       s     NaN        8      NaN      1
3      3       s       s        7      NaN      2
4      4     NaN     NaN        2      NaN      0

Or `sum` the `True` values of the boolean mask:

print(df[cols].astype(str) == 's')

   result1  result2  result3  result4
0     True    False    False    False
1    False     True    False    False
2     True    False    False    False
3     True     True    False    False
4    False    False    False    False

cols = ['result1', 'result2', 'result3', 'result4']
df['count'] = (df[cols].astype(str) == 's').sum(axis=1)
print(df)
   index result1 result2  result3  result4  count
0      0       s       u        7      NaN      1
1      1       u       s        7      NaN      1
2      2       s     NaN        8      NaN      1
3      3       s       s        7      NaN      2
4      4     NaN     NaN        2      NaN      0

Another nice solution, from Nickil Maveli, uses numpy:

df['count'] = (df[cols].values=='s').sum(axis=1)
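
For anyone who wants to run the comparison end to end, here is a self-contained sketch that rebuilds the sample frame from this answer and checks that the `astype(str)` mask and the numpy comparison agree. The construction of `df` is my reconstruction from the printed output, and `to_numpy()` is used as a stand-in for `.values`; treat both as assumptions.

```python
import numpy as np
import pandas as pd

# Reconstruction of the answer's sample frame (assumed from the printed output).
df = pd.DataFrame({
    "index": [0, 1, 2, 3, 4],
    "result1": ["s", "u", "s", "s", np.nan],
    "result2": ["u", "s", np.nan, "s", np.nan],
    "result3": [7, 7, 8, 7, 2],
    "result4": [np.nan] * 5,
})
cols = ["result1", "result2", "result3", "result4"]

# pandas route: cast every cell to str so the comparison is string vs string,
# then sum the True values along each row.
via_sum = (df[cols].astype(str) == "s").sum(axis=1)

# numpy route: the mixed-dtype selection becomes an object array,
# which numpy compares element by element.
via_numpy = (df[cols].to_numpy() == "s").sum(axis=1)

df["count"] = via_sum
print(df)
```

The numpy route skips the string cast, so it avoids building an intermediate frame of strings. Whether that matters will depend on the size of the data.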