2017-04-05 130 views

How does pandas calculate sem()?

First, import pandas and create a perfectly normally distributed series:

import pandas as pd 

lst = [[5 for x in range(5)], [4 for x in range(4)], [3 for x in range(3)], 
     [2 for x in range(2)], [1 for x in range(1)], [2 for x in range(2)], 
     [3 for x in range(3)], [4 for x in range(4)], [5 for x in range(5)]] 

lst = [item for sublists in lst for item in sublists] 

series = pd.Series(lst) 

Let's check that this distribution is normal:

print(round(sum(series - series.mean())/series.count(), 1) == 0) 
# if distribution is normal we'll see True 

Now let's print sem() for the whole population:

print(series.sem(ddof=0)) 
# 0.21619987017 

And now for a sample:

print(series.sem()) # ddof=1 
# 0.220026713637 

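But I don't understand how pandas calculates the standard error of the mean when it is working with the whole population. Does it simply use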

se_x = sd_x/sqrt(len(x)) 

or does it draw samples? If it draws samples, how many, and how can I set that?

How does pandas calculate sem for a sample if the count is < 30?

Answer

pandas generates the sem method dynamically:

cls.sem = _make_stat_function_ddof(
     cls, 'sem', name, name2, axis_descr, 
     "Return unbiased standard error of the mean over requested " 
     "axis.\n\nNormalized by N-1 by default. This can be changed " 
     "using the ddof argument", 
     nanops.nansem) 

where nanops.nansem() is:

@disallow('M8', 'm8') 
def nansem(values, axis=None, skipna=True, ddof=1): 
    var = nanvar(values, axis, skipna, ddof=ddof) 

    mask = isnull(values) 
    if not is_float_dtype(values.dtype): 
        values = values.astype('f8')
    count, _ = _get_counts_nanvar(mask, axis, ddof, values.dtype) 
    var = nanvar(values, axis, skipna, ddof=ddof) 

    return np.sqrt(var)/np.sqrt(count) 
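
So, going by the source above, no samples are drawn: the result is the square root of the variance (computed with the given ddof) divided by the square root of the number of non-null values, and nothing in the function branches on whether the count is below 30. A minimal sketch that checks this against the series from the question; `manual_sem` is a hypothetical helper written for illustration, not part of pandas:

```python
import numpy as np
import pandas as pd

# Rebuild the series from the question: five 5s, four 4s, ..., one 1, ..., five 5s
values = [v for n in (5, 4, 3, 2, 1, 2, 3, 4, 5) for v in [n] * n]
series = pd.Series(values)


def manual_sem(s, ddof=1):
    """Hypothetical helper: standard deviation with the given ddof over sqrt(count)."""
    n = s.count()  # number of non-null observations (29 for this series)
    return s.std(ddof=ddof) / np.sqrt(n)


# Both ddof settings should agree with Series.sem
assert np.isclose(manual_sem(series, ddof=0), series.sem(ddof=0))
assert np.isclose(manual_sem(series, ddof=1), series.sem(ddof=1))
```

Note that only the variance depends on ddof; the denominator is the plain count in both cases, which is why the two results in the question differ only slightly.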

You may also want to check the methods available in the scipy.stats module.
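
For instance, `scipy.stats.sem` takes the same `ddof` argument (also defaulting to 1), so it should agree with `Series.sem` on the series from the question. A short sketch, assuming SciPy is installed alongside pandas:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Same data as in the question
values = [v for n in (5, 4, 3, 2, 1, 2, 3, 4, 5) for v in [n] * n]
series = pd.Series(values)

# Pass a plain NumPy array so the comparison does not rely on pandas-specific handling
scipy_sample = stats.sem(series.to_numpy())              # ddof=1 by default
scipy_population = stats.sem(series.to_numpy(), ddof=0)  # population version

assert np.isclose(scipy_sample, series.sem())
assert np.isclose(scipy_population, series.sem(ddof=0))
```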