Difference between the results of df.mean() and df['column'].mean()
I am running just the following three lines:
df = pd.read_hdf('data.h5')
print(df.mean())
print(df['derived_3'].mean())
The first print lists the individual mean of every column, and among them is
derived_3 -5.046012e-01
The second print gives the mean of just this one column, and the result is
-0.504715
Leaving aside the scientific notation, these two values are different. Why is that?
Examples using other methods
Doing the same with sum() gives the following:
derived_3 -7.878262e+05
-788004.0
Again, the results differ slightly, but count() returns the same result in both cases:
derived_3 1561285
1561285
Also, here is the result of df.head():
id timestamp derived_0 derived_1 derived_2 derived_3 derived_4 \
0 10 0 0.370326 -0.006316 0.222831 -0.213030 0.729277
1 11 0 0.014765 -0.038064 -0.017425 0.320652 -0.034134
2 12 0 -0.010622 -0.050577 3.379575 -0.157525 -0.068550
3 25 0 NaN NaN NaN NaN NaN
4 26 0 0.176693 -0.025284 -0.057680 0.015100 0.180894
fundamental_0 fundamental_1 fundamental_2 ... technical_36 \
0 -0.335633 0.113292 1.621238 ... 0.775208
1 0.004413 0.114285 -0.210185 ... 0.025590
2 -0.155937 1.219439 -0.764516 ... 0.151881
3 0.178495 NaN -0.007262 ... 1.035936
4 0.139445 -0.125687 -0.018707 ... 0.630232
technical_37 technical_38 technical_39 technical_40 technical_41 \
0 NaN NaN NaN -0.414776 NaN
1 NaN NaN NaN -0.273607 NaN
2 NaN NaN NaN -0.175710 NaN
3 NaN NaN NaN -0.211506 NaN
4 NaN NaN NaN -0.001957 NaN
technical_42 technical_43 technical_44 y
0 NaN -2.0 NaN -0.011753
1 NaN -2.0 NaN -0.001240
2 NaN -2.0 NaN -0.020940
3 NaN -2.0 NaN -0.015959
4 NaN 0.0 NaN -0.007338
Also, could you add df.dtypes? – Zero
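Added to my post. It is a very large file, and as far as I can tell some of the numbers have 20 decimal places, which are not shown in the pandas output. Could this be causing the problem? – KOB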
Perhaps, see https://stackoverflow.com/questions/22107928/numpy-sum-is-not-giving-right-answer-for-float32-type and https://stackoverflow.com/questions/41705764/numpy-sum-giving-strange-results-on-large-arrays – Zero
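The effect described in those two linked questions can be sketched roughly as follows. This is only an illustration under assumptions: the array is random stand-in data (not the contents of data.h5), and it assumes the column is stored as float32, which the question does not confirm. It compares a float64 accumulator against two float32 ways of adding up the same values.

```python
import numpy as np

# Hypothetical stand-in for the 'derived_3' column: same length as the
# count() reported in the question, but random values and an ASSUMED
# float32 dtype (the question does not show df.dtypes).
rng = np.random.default_rng(0)
values = rng.normal(loc=-0.5, scale=1.0, size=1_561_285).astype(np.float32)

# Reference result: same float32 data, but accumulated in float64.
sum64 = values.sum(dtype=np.float64)

# NumPy's default reduction, with the result kept in float32.
sum32 = values.sum()

# Strictly sequential accumulation: the running total stays in float32,
# so rounding error can build up as the total grows.
sum32_sequential = np.cumsum(values)[-1]

print("float64 accumulator:", sum64)
print("float32 sum():      ", sum32, "diff:", float(sum32) - sum64)
print("float32 sequential: ", sum32_sequential,
      "diff:", float(sum32_sequential) - sum64)
```

If the real column does turn out to be float32, one way to check whether accumulation order explains the gap would be to compare both results after casting, for example df['derived_3'].astype('float64').sum() against df.astype('float64').sum()['derived_3']. The fact that count() agrees while sum() and mean() do not is at least consistent with the same elements being added in a different order or precision.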