Compare a sample mean vs. random assortments in Python

Given a df:

                 A       B       C
Date
2010-01-17 -0.9304  3.7477  0.0000
2010-01-24 -3.6348  1.5733 -3.6348
2010-01-31 -1.8950  0.4957 -1.8950
2010-02-07 -0.6990 -0.1480 -0.6990
2010-02-14  1.4635 -3.4206  1.4635

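I want to compare the mean of df['C'] with the means of 10,000 random series, each built by picking, for every date, one element from either df['A'] or df['B'], and then see how the mean ranks (1 if it is the highest, 0.95 if it is higher than 9,500 of the randoms, etc.).

I wrote a recipe for this a long time ago, but I can't put it back together again. Maybe this helps: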

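import numpy as np
import numpy.random as npr

def mean_diff(d):
    # d maps each key to a pair (l, t): l is the pool to resample from, t is the observed sample
    result = {}
    for k, (l, t) in d.items():  # .items() on Python 3 (was .iteritems() on Python 2)
        m = np.mean(t)
        len_ = len(t)
        # share of 10000 resamples (drawn with replacement) whose mean is below the sample mean
        result[k] = np.mean([m > np.mean(npr.choice(l, len_, True))
                             for _ in range(10000)])
    return result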

Thanks

** 10,000 because the original data has many more than 5 rows.

UPDATE:

OK, to solve this problem I am going to start by solving a smaller one. See this question

Source

2016-02-22 hernanavella

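Well, there is a shortcut:

Since columns A and B have the same number of elements, we can pool them into a single array, draw 10,000 random samples from that pool, and compare each sample's mean with the mean of C.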

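import numpy as np
import numpy.random as npr

# df is the DataFrame from the question
sample = df['C'].values
a = df['A'].values
b = df['B'].values
population = np.concatenate((a, b), axis=0)  # pool A and B together

def mean_diff(s, p):
    m = np.mean(s)
    len_ = len(s)
    # share of 10000 resamples (drawn with replacement from p) whose mean is below the mean of s
    result = np.mean([m > np.mean(npr.choice(p, len_, True))
                      for _ in range(10000)])
    return result

mean_diff(sample, population)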
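
Note that the shortcut resamples from the pooled values, so a value from one date can end up on any other date. If the pick really has to be made date by date (A or B for each row, as the question words it), something along these lines might work. This is only a sketch: the function name `rank_of_mean`, the `n_draws` and `seed` parameters, and the 50/50 choice between A and B are my assumptions, not something stated in the question.

```python
import numpy as np
import pandas as pd

def rank_of_mean(df, n_draws=10_000, seed=0):
    """Share of random A-or-B series whose mean is below the mean of C.

    Assumption: for each date, A and B are equally likely to be picked.
    """
    rng = np.random.default_rng(seed)
    ab = df[["A", "B"]].to_numpy()        # shape (n_dates, 2)
    n_dates = ab.shape[0]
    # For every draw and every date, choose column 0 (A) or column 1 (B).
    picks = rng.integers(0, 2, size=(n_draws, n_dates))
    # Broadcasted fancy indexing: element [d, i] is ab[i, picks[d, i]].
    random_series = ab[np.arange(n_dates), picks]
    random_means = random_series.mean(axis=1)
    return np.mean(df["C"].mean() > random_means)

# The five rows shown in the question.
df = pd.DataFrame(
    {
        "A": [-0.9304, -3.6348, -1.8950, -0.6990, 1.4635],
        "B": [3.7477, 1.5733, 0.4957, -0.1480, -3.4206],
        "C": [0.0000, -3.6348, -1.8950, -0.6990, 1.4635],
    },
    index=pd.to_datetime(
        ["2010-01-17", "2010-01-24", "2010-01-31", "2010-02-07", "2010-02-14"]
    ),
)
df.index.name = "Date"

rank = rank_of_mean(df)
print(rank)
```

All draws are generated in one array instead of a Python loop, which should keep 10,000 draws cheap even when the frame has many more than 5 rows. With only 5 dates there are just 32 distinct A-or-B series, so the rank is coarse on this toy data.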

Source

2016-02-23 19:11:05 hernanavella
