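scipy.stats.mstats.theilslopes: confidence limits are wrong if the data has missing values
If the scipy.stats.mstats.theilslopes routine is used on a dataset that contains missing values (NaN), the lower and upper bounds of the slope estimate are incorrect. The upper bound is usually (perhaps always?) NaN, and the lower bound is simply wrong. This happens because the theilslopes routine computes indices into the sorted array of slopes, and that array still contains the slopes computed from the missing values.
The workaround is to remove the missing values before the analysis, but this is not documented.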
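The cause described above can be pictured with a small NumPy sketch. This is a simplified illustration and not SciPy's actual implementation; the toy data and variable names are assumptions made for the example. It only shows that NaN slopes end up at the tail of the sorted array, so any rank computed from the total count can land on a NaN or on the wrong slope.

```python
import numpy as np

# Toy data (made up for illustration): one missing value in y.
x = np.arange(5, dtype=float)
y = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# All pairwise slopes (y[j] - y[i]) / (x[j] - x[i]) for i < j.
i, j = np.triu_indices(len(x), k=1)
slopes = (y[j] - y[i]) / (x[j] - x[i])

# np.sort places NaN values at the end of the sorted array.
sorted_slopes = np.sort(slopes)
n_total = len(sorted_slopes)
n_valid = np.count_nonzero(~np.isnan(slopes))

print("total slopes:", n_total, "valid slopes:", n_valid)
print("sorted slopes:", sorted_slopes)
```

An upper rank derived from the total number of slopes would point into the NaN tail here, which is consistent with the NaN upper bound reported above.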
To illustrate the problem, here is a simple code snippet:
import numpy as np
from scipy.stats import mstats
x = np.arange(12)
y = np.array([28.9, 26.2, 27.2, 26.5, 28.4, 25.3, 26.1, 24.8, 27.7,
np.nan, np.nan, 29.6])
# With NaN values present, the confidence limits come out wrong.
slope, intercept, lo_slope, up_slope = mstats.theilslopes(y, x,
                                                          alpha=0.1)
print("incorrect: ", slope, lo_slope, up_slope)
# Keep only the positions where y is not NaN (indices 9 and 10 are dropped).
idx = [0, 1, 2, 3, 4, 5, 6, 7, 8, 11]
x = x[idx] # equivalent to pandas series.dropna()
y = y[idx]
slope, intercept, lo_slope, up_slope = mstats.theilslopes(y, x,
                                                          alpha=0.1)
print("correct: ", slope, lo_slope, up_slope)
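The hand-written index list above can be replaced by a boolean mask, so the workaround does not depend on knowing where the gaps are. The following is a minimal sketch under that assumption; theilslopes_dropna is a hypothetical helper name, not part of SciPy.

```python
import numpy as np
from scipy.stats import mstats


def theilslopes_dropna(y, x, alpha=0.95):
    """Hypothetical helper: drop (x, y) pairs containing NaN, then call theilslopes."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Keep only the positions where both x and y are present.
    keep = ~(np.isnan(x) | np.isnan(y))
    return mstats.theilslopes(y[keep], x[keep], alpha=alpha)


x = np.arange(12)
y = np.array([28.9, 26.2, 27.2, 26.5, 28.4, 25.3, 26.1, 24.8, 27.7,
              np.nan, np.nan, 29.6])

slope, intercept, lo_slope, up_slope = theilslopes_dropna(y, x, alpha=0.1)
print("with NaN pairs removed: ", slope, lo_slope, up_slope)
```

The mask is built from both arrays so that a gap in either x or y removes the whole pair, which keeps the two arrays aligned.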