I have the following pandas DataFrame in Python 2.7, and the pandas standard deviation returns NaN for some groups.
CODE:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10,6),columns=list('ABCDEF'))
df.insert(0,'Category',['A','C','D','D','B','E','F','F','G','H'])
print df.groupby('Category').std()
Here is df:
Category A B C D E F
A 0.500200 0.791039 0.498083 0.360320 0.965992 0.537068
C 0.295330 0.638823 0.133570 0.272600 0.647285 0.737942
D 0.912966 0.051288 0.055766 0.906490 0.078384 0.928538
D 0.416582 0.441684 0.605967 0.516580 0.458814 0.823692
B 0.714371 0.636975 0.153347 0.936872 0.000649 0.692558
E 0.639271 0.486151 0.860172 0.870838 0.831571 0.404813
F 0.375279 0.555228 0.020599 0.120947 0.896505 0.424233
F 0.952112 0.299520 0.150623 0.341139 0.186734 0.807519
G 0.384157 0.858391 0.278563 0.677627 0.998458 0.829019
H 0.109465 0.085861 0.440557 0.925500 0.767791 0.626924
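I want to perform a groupby and then compute the mean and the standard deviation. After grouping, the standard deviation is sometimes computed over a single row, which means the division by N-1 sometimes becomes a division by 0, and that prints NaN.
Here is the output of the code above: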
OUTPUT:
A B C D E F
Category
A NaN NaN NaN NaN NaN NaN
B NaN NaN NaN NaN NaN NaN
C NaN NaN NaN NaN NaN NaN
D 0.350996 0.276052 0.389051 0.275708 0.269004 0.074137
E NaN NaN NaN NaN NaN NaN
F 0.407882 0.180813 0.091941 0.155699 0.501884 0.271025
G NaN NaN NaN NaN NaN NaN
H NaN NaN NaN NaN NaN NaN
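To illustrate where the NaN rows come from, here is a minimal sketch. It assumes the pandas default of ddof=1 (sample standard deviation, dividing by N-1); the variable names are made up for illustration, and the numbers are taken from column A of the table above. It is written with print() calls that take a single argument, so it should behave the same on Python 2.7 and Python 3.

```python
import numpy as np
import pandas as pd

single = pd.Series([0.500200])          # category A, column A: one row only
pair = pd.Series([0.912966, 0.416582])  # category D, column A: two rows

# ddof=1 (the pandas default) divides by N-1, which is 0 when N == 1.
std_sample_single = single.std()
# ddof=0 divides by N instead, so a single row gives a defined result.
std_pop_single = single.std(ddof=0)
# With two rows the default sample standard deviation is well defined.
std_sample_pair = pair.std()

print("single row, ddof=1: %s" % std_sample_single)
print("single row, ddof=0: %s" % std_pop_single)
print("two rows,   ddof=1: %.6f" % std_sample_pair)
```

The two-row result agrees with the 0.350996 shown for category D, column A, while the single-row group is NaN unless ddof=0 is passed.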
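For the cases where the groupby covers only 1 row, is there a way to skip the standard deviation and just return the value itself? For example, I would like to get something like this: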
DESIRED OUTPUT:
A B C D E F
Category
A 0.500200 0.791039 0.498083 0.360320 0.965992 0.537068
B 0.714371 0.636975 0.153347 0.936872 0.000649 0.692558
C 0.295330 0.638823 0.133570 0.272600 0.647285 0.737942
D 0.350996 0.276052 0.389051 0.275708 0.269004 0.074137
E 0.639271 0.486151 0.860172 0.870838 0.831571 0.404813
F 0.407882 0.180813 0.091941 0.155699 0.501884 0.271025
G 0.384157 0.858391 0.278563 0.677627 0.998458 0.829019
H 0.109465 0.085861 0.440557 0.925500 0.767791 0.626924
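Is it possible to do this with pandas?
EDIT: To recreate the exact DataFrame from the table at the top of this question, select it, copy it to the clipboard, and then use this: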
import pandas as pd
df = pd.read_clipboard()
print df
print df.groupby('Category').std()
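One possible way to get the desired output, sketched here rather than offered as a definitive answer: compute the standard deviation as usual, then fill the NaN cells with the group's only row. The frame below is a small stand-in built from a few rows and two columns of the table in the question. One assumption to note: fillna replaces every NaN, so a multi-row group whose std is NaN for another reason (for example missing data) would also be filled.

```python
import pandas as pd

# Stand-in frame: a subset of the rows and columns shown in the question.
df = pd.DataFrame({
    'Category': ['A', 'D', 'D', 'B'],
    'A': [0.500200, 0.912966, 0.416582, 0.714371],
    'B': [0.791039, 0.051288, 0.441684, 0.636975],
})

grouped = df.groupby('Category')
std = grouped.std()      # NaN for single-row groups (default ddof=1)
first = grouped.first()  # first row of each group, used as the fallback

# Keep the std where it is defined; otherwise use the lone row's value.
# fillna with a DataFrame aligns on both index and columns.
result = std.fillna(first)
print(result)
```

Categories A and B keep their original values, while category D shows the standard deviation of its two rows.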
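How did you calculate the values for categories D and F in your desired output (e.g., 0.403709 for column A of category D)? – Alexander
Note that 'df.groupby('Category').apply(np.std)' returns the std as '0.0', as you would expect. – dhrumeel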