Difference between the results of df.mean() and df['column'].mean()

I run only the following three lines:

df = pd.read_hdf('data.h5') 
print(df.mean()) 
print(df['derived_3'].mean())

The first print lists the individual means for every column, one of which is

derived_3  -5.046012e-01

The second print gives the mean of this column alone, and the result is

-0.504715

Even allowing for the scientific notation, these two values are different. Why is that?

Examples using other methods

Doing the same with sum() gives the following:

derived_3  -7.878262e+05 

-788004.0

Again the results differ slightly, but count() returns the same result both ways:

derived_3   1561285 

1561285

Also, df.head() gives:

id timestamp derived_0 derived_1 derived_2 derived_3 derived_4 \ 
0 10   0 0.370326 -0.006316 0.222831 -0.213030 0.729277 
1 11   0 0.014765 -0.038064 -0.017425 0.320652 -0.034134 
2 12   0 -0.010622 -0.050577 3.379575 -0.157525 -0.068550 
3 25   0  NaN  NaN  NaN  NaN  NaN 
4 26   0 0.176693 -0.025284 -0.057680 0.015100 0.180894 

    fundamental_0 fundamental_1 fundamental_2 ...  technical_36 \ 
0  -0.335633  0.113292  1.621238 ...   0.775208 
1  0.004413  0.114285  -0.210185 ...   0.025590 
2  -0.155937  1.219439  -0.764516 ...   0.151881 
3  0.178495   NaN  -0.007262 ...   1.035936 
4  0.139445  -0.125687  -0.018707 ...   0.630232 

    technical_37 technical_38 technical_39 technical_40 technical_41 \ 
0   NaN   NaN   NaN  -0.414776   NaN 
1   NaN   NaN   NaN  -0.273607   NaN 
2   NaN   NaN   NaN  -0.175710   NaN 
3   NaN   NaN   NaN  -0.211506   NaN 
4   NaN   NaN   NaN  -0.001957   NaN 

    technical_42 technical_43 technical_44   y 
0   NaN   -2.0   NaN -0.011753 
1   NaN   -2.0   NaN -0.001240 
2   NaN   -2.0   NaN -0.020940 
3   NaN   -2.0   NaN -0.015959 
4   NaN   0.0   NaN -0.007338

Source

2017-10-04 KOB

Also, add 'df.dtypes'? – Zero

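Added to my post. This is a very large file and, as far as I know, some of the numbers have 20 decimal places, which are not shown in the pandas output. Could that be causing the problem? – KOB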

Maybe, see https://stackoverflow.com/questions/22107928/numpy-sum-is-not-giving-right-answer-for-float32-type and https://stackoverflow.com/questions/41705764/numpy-sum-giving-strange-results-on-large-arrays – Zero
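
The linked questions are about sums accumulated in float32. As a hedged illustration only: the dtype of the column is not shown in the question, so float32 storage is an assumption, and the array below is synthetic rather than the asker's data. The sketch compares a running sum kept in float32 with the same data accumulated in float64.

```python
import numpy as np

# Synthetic data (an assumption, not the asker's file):
# two million float32 copies of 0.1.
x = np.full(2_000_000, 0.1, dtype=np.float32)

# Reference value: the float32 representation of 0.1, multiplied out in float64.
exact = 2_000_000 * float(x[0])

# Running sum kept in float32: every partial sum is rounded to float32.
seq32 = float(np.cumsum(x)[-1])

# Same data, but accumulated in float64.
acc64 = float(x.sum(dtype=np.float64))

print("reference      :", exact)
print("float32 running:", seq32, "error", abs(seq32 - exact))
print("float64 accum. :", acc64, "error", abs(acc64 - exact))
```

If two code paths accumulate the same float32 values in a different order or precision, small discrepancies of this kind are one plausible explanation for sums and means that do not agree digit for digit.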

pd.DataFrame methods vs. pd.Series methods

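In df.mean(), mean is pd.DataFrame.mean, and it operates on every column as a separate pd.Series. What is returned is a pd.Series in which df.columns is the new index and the mean of each column is the value. In your first example, df has only one column, so the result is a series of length one, where the index is the name of that column and the value is the mean of that column.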

In df['derived_3'].mean(), mean is pd.Series.mean and df['derived_3'] is a pd.Series. The result of pd.Series.mean is a scalar.
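
The two return types described above can be checked directly. This is a minimal sketch on a small hypothetical frame; the column names only echo the question and the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; values are illustrative, not the asker's data.
df = pd.DataFrame({
    "derived_3": [-0.5, 0.25, np.nan, 1.0],
    "y": [0.1, 0.2, 0.3, 0.4],
})

col_means = df.mean()              # pd.DataFrame.mean returns a pd.Series
one_mean = df["derived_3"].mean()  # pd.Series.mean returns a scalar

print(type(col_means).__name__)    # class name of the DataFrame result
print(list(col_means.index))       # the index is built from df.columns
print(np.isscalar(one_mean))       # whether the Series result is a scalar
print(col_means["derived_3"], one_mean)
```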

Display differences

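The difference in display arises because the result of df.mean is a pd.Series, and its float formatting is controlled by pandas. On the other hand, df['derived_3'].mean() is a plain scalar (a Python primitive), so pandas does not control how it is printed.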

import numpy as np 
import pandas as pd

Scalar

np.pi 

3.141592653589793

pd.Series

pd.Series(np.pi) 

0 3.141593 
dtype: float64

With a different format

with pd.option_context('display.float_format', '{:0.15f}'.format): 
    print(pd.Series(np.pi)) 

0 3.141592653589793 
dtype: float64
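
To check that only the display differs, one option is to pull the scalar back out of the Series and compare it with the original value. A small sketch, assuming pandas' default display precision is in effect:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.pi)

# The repr is rounded for display only (default display settings assumed).
shown = repr(s)

# The stored value is not affected by the display settings.
stored = s.iloc[0]

print(shown)
print(repr(float(stored)))
print(stored == np.pi)
```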

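Reduction
It is useful to think of these different methods as either reducing dimensionality or not. Synonymously, they are either aggregations or transformations.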

Reducing a pd.DataFrame results in a pd.Series
Reducing a pd.Series results in a scalar

Methods that reduce

mean
sum
std
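
The drop in dimensionality can be seen by comparing ndim before and after. In this sketch the frame is hypothetical, and cumsum is my own choice of a non-reducing (transforming) method for contrast; it is not named in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 frame.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

reduced_df = df.sum()      # DataFrame (2-D) reduces to a Series (1-D)
reduced_s = df["a"].sum()  # Series (1-D) reduces to a scalar (0-D)
transformed = df.cumsum()  # transformation: the shape is preserved

print(df.ndim, reduced_df.ndim, np.ndim(reduced_s))
print(df.shape, transformed.shape)
```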

Source

2017-10-04 19:36:50 piRSquared

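I see. When you say "display differences", do you mean that the two calculations are actually exactly the same and are only displayed differently, or would swapping the two in my calculations actually affect my results? – KOB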

They are exactly the same. The values inside '3.14159265359' and 'pd.Series(3.14159265359)' are the same. – piRSquared

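@piRSquared One more question about this: I have the operation 'df.ix[:, 2:-1] = df.ix[:, 2:-1] - df.ix[:, 2:-1].mean()', which I expect to normalize all of the indexed columns so that their mean is 0. When I print the means after doing this, they all show up as very small numbers, but not exactly 0. Is there any way I can check whether my equation is correct and these values are effectively zero, or whether my equation is wrong if they should be displayed as 0? – KOB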
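
One way to approach this is to compare the means against zero within a floating-point tolerance, instead of expecting exact zeros. A minimal sketch under stated assumptions: the frame is synthetic, the column names and the tolerance are illustrative, and positional selection on df.columns stands in for .ix, which has been removed from recent pandas versions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (an assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(1000, 5)),
    columns=["id", "timestamp", "derived_0", "derived_1", "y"],
)

# Same column range as 2:-1 in the comment, selected by label.
cols = df.columns[2:-1]

# Subtract each column's mean from that column.
df[cols] = df[cols] - df[cols].mean()

# The means are tiny but usually not exactly 0 because of rounding.
means = df[cols].mean()
print(means)
print(np.allclose(means, 0.0, atol=1e-12))
```

If the check prints True, the centering did what was intended and the leftover values are rounding error; a tolerance this tight may need loosening for float32 data.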
