Python function to get the t-statistic

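I am looking for a Python function (or will write my own if there is not one) to get the t-statistic for use in a confidence interval calculation.

I have found tables that give answers for various probabilities / degrees of freedom, like this one, but I would like to be able to calculate this for any given probability. For anyone not already familiar with this, the degrees of freedom is the number of data points (n) in your sample minus 1, and the column headings at the top are probabilities (p). For example, a 2-tailed significance level of 0.05 is used if you are looking up the t-score for 95% confidence that, if you repeated n tests, the result would fall within the mean +/- the confidence interval.

I have looked into using various functions within scipy.stats, but none that I can see seem to allow for the simple inputs I described above.

Excel has a simple implementation of this, e.g. to get the t-score for a sample of 1000, where I need to be 95% confident, I would use: =TINV(0.05,999) and get the score ~1.96

Here is the code that I have used to implement confidence intervals so far. As you can see, I am using a very crude way of getting the t-score at present (just allowing a few values for perc_conf and warning that it is not accurate for samples < 1000):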

# -*- coding: utf-8 -*-
from __future__ import division
import math

def mean(lst):
    # μ = 1/N Σ(xi)
    return sum(lst) / float(len(lst))

def variance(lst):
    """
    Uses standard variance formula (sum of each (data point - mean) squared)
    all divided by number of data points
    """
    # σ² = 1/N Σ((xi-μ)²)
    mu = mean(lst)
    return 1.0 / len(lst) * sum([(i - mu) ** 2 for i in lst])

def conf_int(lst, perc_conf=95):
    """
    Confidence interval - given a list of values compute the square root of
    the variance of the list (v) divided by the number of entries (n)
    multiplied by a constant factor of (c). This means that I can
    be confident of a result +/- this amount from the mean.
    The constant factor can be looked up from a table, for 95% confidence
    on a reasonable size sample (>=500) 1.96 is used.
    """
    if perc_conf == 95:
        c = 1.96
    elif perc_conf == 90:
        c = 1.64
    elif perc_conf == 99:
        c = 2.58
    else:
        c = 1.96
        print('Only 90, 95 or 99 % are allowed for, using default 95%')
    n, v = len(lst), variance(lst)
    if n < 1000:
        print('WARNING: constant factor may not be accurate for n < ~1000')
    return math.sqrt(v / n) * c

Here is an example call for the above code:

# Example: 1000 coin tosses on a fair coin. What is the range that I can be 95%
#   confident the result will fall within.

# list of 1000 perfectly distributed...
perc_conf_req = 95
n, p = 1000, 0.5 # sample_size, probability of heads for each coin
l = [0 for i in range(int(n*(1-p)))] + [1 for j in range(int(n*p))]
exp_heads = mean(l) * len(l)
c_int = conf_int(l, perc_conf_req)

print('I can be '+str(perc_conf_req)+'% confident that the result of '+str(n)+
      ' coin flips will be within +/- '+str(round(c_int*100,2))+'% of '+
      str(int(exp_heads)))
x = round(n*c_int,0)
print('i.e. between '+str(int(exp_heads-x))+' and '+str(int(exp_heads+x))+
      ' heads (assuming a probability of '+str(p)+' for each flip).')

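The output for this is:

I can be 95% confident that the result of 1000 coin flips will be within +/- 3.1% of 500
i.e. between 469 and 531 heads (assuming a probability of 0.5 for each flip).

I also looked into calculating the t-distribution for a range of values and then returning the t-score whose probability came closest to the one required, but I had issues implementing the formula. Let me know if this is relevant and you want to see the code, but I have assumed not, as there is probably an easier way.

Thanks in advance.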
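The range-search idea mentioned here can be sketched with the standard library alone. The following is only an illustration under stated assumptions: the helper names t_pdf, t_cdf and t_score are made up for this example, the CDF is approximated with Simpson's rule, and the search is a plain bisection. It is reasonable for moderate t values but is not tuned for very small degrees of freedom combined with extreme confidence levels.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2.0 * math.log(1.0 + t * t / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0: 0.5 plus Simpson's rule on [0, t] (steps must be even)."""
    if t == 0:
        return 0.5
    h = t / float(steps)
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3.0

def t_score(alpha, df, tol=1e-6):
    """Two-tailed critical value: the t for which P(|T| > t) = alpha."""
    target = 1.0 - alpha / 2.0
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains the answer
        hi *= 2.0
    while hi - lo > tol:           # bisection on the monotone CDF
        mid = (lo + hi) / 2.0
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(t_score(0.05, 999), 3))  # about 1.962, in line with =TINV(0.05,999)
```

The arguments mirror the Excel call: a two-tailed significance level and the degrees of freedom.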


2013-10-12 ChrisProsser

Have you tried scipy?

You will need to install the scipy library... more about installing it here: http://www.scipy.org/install.html

Once installed, you can replicate the Excel functionality like so:

from scipy import stats
# Student's t, df=999, p<0.05, 2-tail
# equivalent to Excel TINV(0.05,999)
print(stats.t.ppf(1-0.025, 999))

# Student's t, df=999, p<0.05, single tail
# equivalent to Excel TINV(2*0.05,999)
print(stats.t.ppf(1-0.05, 999))

You can also read about installing the library here: how to install scipy for python?


2013-10-12 21:27:50 henderso

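I had tried scipy, but had not worked out how to use it. If I print rv it is an object; how can I get the value from this? – ChrisProsser

Hi Chris.. it looks like printing the ppf of the rv object is what you need to do.. I updated my example... note the alpha/2 – henderso

That's great, thanks very much. Do you know if there is any way to get the two-sided score directly, or do you just have to halve the probability to make it work? – ChrisProsser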
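On the two-sided question: halving the probability is the usual route, and it can be hidden in a small wrapper. The sketch below assumes scipy is installed. The name tinv is a hypothetical helper chosen to mirror the Excel function, not part of scipy. stats.t.isf and stats.t.interval are existing scipy methods, and interval takes the central coverage (0.95 here) and returns both ends of the range.

```python
from scipy import stats

def tinv(p, df):
    """Two-tailed critical value, mirroring Excel's TINV(p, df)."""
    # Split p evenly between the two tails, then read off the upper quantile.
    return stats.t.ppf(1 - p / 2.0, df)

# The same number via the inverse survival function (upper-tail probability).
upper = stats.t.isf(0.05 / 2, 999)

# Both ends at once: the central 95% range of the t distribution with df=999.
low, high = stats.t.interval(0.95, 999)

print(tinv(0.05, 999))
print(upper)
print(low, high)
```

All three routes give the same critical value of about 1.96 for df=999, with low being its negative.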

Try the following code:

from scipy import stats
# Student's t, n=22, 2-tail
# stats.t.ppf(1-0.025, df)
# df=n-1=22-1=21
print(stats.t.ppf(1-0.025, 21))


2017-08-09 09:25:07 javac
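Putting the answers together, the hard-coded constants in the question's conf_int could be replaced by a t-score computed for any confidence level. This is a minimal sketch, assuming scipy is available. It keeps the question's population variance (dividing by n) so the result stays comparable with the original code.

```python
import math
from scipy import stats

def conf_int(lst, perc_conf=95):
    """Half-width of the two-sided confidence interval for the mean of lst."""
    n = len(lst)
    mu = sum(lst) / float(n)
    # Population variance (divide by n), kept to match the question's code.
    v = sum((x - mu) ** 2 for x in lst) / float(n)
    alpha = 1 - perc_conf / 100.0            # e.g. 95 -> 0.05
    c = stats.t.ppf(1 - alpha / 2.0, n - 1)  # two-tailed t-score, df = n - 1
    return math.sqrt(v / n) * c

# Same coin-toss data as in the question: 500 tails and 500 heads.
tosses = [0] * 500 + [1] * 500
print(round(conf_int(tosses, 95) * 100, 2))  # about 3.1 (percent)
```

A t-based interval is more commonly built on the sample variance (dividing by n - 1). For n = 1000 the difference is negligible, but it matters for small samples.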
