2016-09-12 169 views
-2

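I have a pandas df with the columns T max and T min. I want to compute T mean in the next column. I tried df['T mean'] = df[['T max','T min']].mean(axis=1), but it did not work: T mean just comes out equal to T max. Can someone help me? How do I calculate the row mean in a pandas DataFrame?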

+2

Please provide a sample DataFrame to work with. –

+3

Post your raw data, your code, the desired output, and your erroneous output – EdChum

Answers

1

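I think the problem is that the values in the T min column are strings, not numbers, so the column is left out of the mean. You need to cast it with astype: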

Sample:

import pandas as pd

df = pd.DataFrame({'T max': [1, 2, 3], 'T min': ['5', '6', '7']})
print (df) 
    T max T min 
0  1  5 
1  2  6 
2  3  7 

print (type(df.loc[0,'T min']))
<class 'str'> 

df['T mean']= df[['T max','T min']].mean(axis=1) 
print (df) 
    T max T min T mean 
0  1  5  1.0 
1  2  6  2.0 
2  3  7  3.0 

#cast column to int 
df['T min'] = df['T min'].astype(int) 

print (type(df.loc[0,'T min']))
<class 'numpy.int32'> 

df['T mean new']= df[['T max','T min']].mean(axis=1) 
print (df) 
    T max T min T mean T mean new 
0  1  5  1.0   3.0 
1  2  6  2.0   4.0 
2  3  7  3.0   5.0 
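
The same check and conversion can be sketched in a more compact form. This is only a minimal sketch: it reuses the sample data from above, the helper names `cols` and `numeric` are made up for illustration, and it uses `pd.to_numeric` instead of `astype(int)`. How `mean` treats a string column may differ between pandas versions, so the sketch converts first and does not rely on that behaviour.

```python
import pandas as pd

# Sample data from the answer: 'T min' holds strings, not numbers.
df = pd.DataFrame({'T max': [1, 2, 3], 'T min': ['5', '6', '7']})

# A non-numeric dtype for 'T min' (object or a string dtype,
# depending on the pandas version) reveals the problem.
print(df.dtypes)

# Convert both columns to numbers, then average across each row.
cols = ['T max', 'T min']            # hypothetical helper name
numeric = df[cols].apply(pd.to_numeric)
df['T mean'] = numeric.mean(axis=1)
print(df)
```

Here the original columns are left untouched and only the converted copy is averaged, which keeps the raw values available if they need to be inspected later.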

If astype raises an error:

ValueError: invalid literal for int() with base 10: 'aaa'

it means that the T min column contains at least one invalid value.

Sample:

df=pd.DataFrame({'T max':[1,2,3],'T min':['5','6','aaa']}) 
print (df) 
    T max T min 
0  1  5 
1  2  6 
2  3 aaa 

df['T mean']= df[['T max','T min']].mean(axis=1) 
print (df) 
    T max T min T mean 
0  1  5  1.0 
1  2  6  2.0 
2  3 aaa  3.0 

#show the rows where T min holds a value that cannot be parsed as a number
print (df[ pd.to_numeric(df['T min'], errors='coerce').isnull()]) 
    T max T min T mean 
2  3 aaa  3.0 

#replace invalid values with NaN
df['T min'] = pd.to_numeric(df['T min'], errors='coerce') 

df['T mean new']= df[['T max','T min']].mean(axis=1) 
print (df) 
    T max T min T mean T mean new 
0  1 5.0  1.0   3.0 
1  2 6.0  2.0   4.0 
2  3 NaN  3.0   3.0 
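
Note that in the last row the mean is 3.0 because `mean` skips NaN by default, so only T max is averaged. If a row with a missing value should instead produce NaN, the `skipna` parameter can be switched off. A minimal sketch, assuming the same sample data; the column names `T mean skip` and `T mean strict` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'T max': [1, 2, 3], 'T min': ['5', '6', 'aaa']})

# Unparseable entries such as 'aaa' become NaN.
df['T min'] = pd.to_numeric(df['T min'], errors='coerce')

cols = ['T max', 'T min']
# Default behaviour: NaN is ignored, so row 2 averages T max alone.
df['T mean skip'] = df[cols].mean(axis=1)
# skipna=False: any NaN in the row makes the row mean NaN.
df['T mean strict'] = df[cols].mean(axis=1, skipna=False)
print(df)
```

Which variant is appropriate depends on whether a mean based on a single temperature is acceptable for the data at hand.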
+0

I cast the column to int and it worked. Thanks! –