2015-05-11

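Predicting future values using OLS regression (Python, StatsModels, Pandas)

I am currently trying to implement an MLR (multiple linear regression) in Python, and I am not sure how to apply the coefficients I found to predict future values.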

import pandas as pd 
import statsmodels.api as sm  # sm.OLS is provided by statsmodels.api
import statsmodels.api as sm2 

TV = [230.1, 44.5, 17.2, 151.5, 180.8] 
Radio = [37.8,39.3,45.9,41.3,10.8] 
Newspaper = [69.2,45.1,69.3,58.5,58.4] 
Sales = [22.1, 10.4, 9.3, 18.5,12.9] 
df = pd.DataFrame({'TV': TV, 
        'Radio': Radio, 
        'Newspaper': Newspaper, 
        'Sales': Sales}) 

Y = df.Sales 
X = df[['TV','Radio','Newspaper']] 
X = sm2.add_constant(X) 
model = sm.OLS(Y, X).fit() 
>>> model.params 
const  -0.141990 
TV   0.070544 
Radio  0.239617 
Newspaper -0.040178 
dtype: float64 
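Applying the coefficients by hand shows what a prediction is. For each new row, the predicted Sales equals the intercept plus each coefficient times the matching feature value. The following is a minimal pure-Python sketch. It assumes the rounded `model.params` values printed above and the three new rows from the table below. `predict_row` is a hypothetical helper, not a statsmodels function.

```python
# Coefficients copied from the model.params output above (rounded as printed).
PARAMS = {"const": -0.141990, "TV": 0.070544, "Radio": 0.239617, "Newspaper": -0.040178}

def predict_row(params, row):
    """Return const + sum(coef * value) for one observation (hypothetical helper)."""
    return params["const"] + sum(params[name] * value for name, value in row.items())

# The out-of-sample rows from the question's table.
new_rows = [
    {"TV": 25, "Radio": 15, "Newspaper": 15},
    {"TV": 30, "Radio": 20, "Newspaper": 22},
    {"TV": 35, "Radio": 22, "Newspaper": 36},
]
predictions = [predict_row(PARAMS, row) for row in new_rows]
print([round(p, 3) for p in predictions])  # [4.613, 5.883, 6.152]
```

These results agree with the `y_predict` values in the answer to about four decimal places. For OLS, `model.predict` performs the same computation using the full-precision coefficients.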

So let's say I want to predict "Sales" for the following DataFrame:

EDIT 

TV     Radio  Newspaper  Sales
230.1  37.8   69.2       22.4
44.5   39.3   45.1       10.1
...    ...    ...        ...
25     15     15         NaN
30     20     22         NaN
35     22     36         NaN
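The rows with no Sales value are the ones to forecast, and the remaining rows are the training data. The following small sketch shows that split in plain Python. It assumes missing values are encoded as `None`; pandas stores them as `NaN`, which is what `notnull()`/`isnull()` detect in the answer below. The rows elided as `...` are left out, and the variable names are illustrative.

```python
# Rows from the table above; None marks a missing Sales value (illustrative encoding).
# The rows elided as "..." in the table are omitted.
rows = [
    {"TV": 230.1, "Radio": 37.8, "Newspaper": 69.2, "Sales": 22.4},
    {"TV": 44.5, "Radio": 39.3, "Newspaper": 45.1, "Sales": 10.1},
    {"TV": 25, "Radio": 15, "Newspaper": 15, "Sales": None},
    {"TV": 30, "Radio": 20, "Newspaper": 22, "Sales": None},
    {"TV": 35, "Radio": 22, "Newspaper": 36, "Sales": None},
]

# Same idea as df[df.Sales.notnull()] (training rows) and df[df.Sales.isnull()] (rows to predict).
train_rows = [r for r in rows if r["Sales"] is not None]
to_predict = [r for r in rows if r["Sales"] is None]
```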

I've been working on this and found the approach described here, but I can't seem to get it to work: Forecasting using Pandas OLS

Thanks!

Answer

Assuming df2 is your new out-of-sample DataFrame:

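model = sm.OLS(Y, X).fit()
# df2 holds only the new rows, so take all of them (a mask built from df does not apply to df2)
new_x = df2[['TV', 'Radio', 'Newspaper']]
# has_constant='add' forces the column of ones even if a column is constant (e.g. a single new row)
new_x = sm2.add_constant(new_x, has_constant='add')  # sm2 = statsmodels.api
y_predict = model.predict(new_x)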

>>> y_predict 
array([ 4.61319034, 5.88274588, 6.15220225]) 
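`sm2.add_constant` prepends a column of ones so that the `const` coefficient has something to multiply. There is one pitfall. With its default `has_constant='skip'`, it returns the data unchanged if a column is already constant and non-zero. That is always true for a single new row, so `model.predict` would then receive one column too few. The sketch below imitates that documented behavior in plain Python. `add_constant_like` is a hypothetical stand-in, not the statsmodels source, and it only models the `'skip'` and `'add'` options.

```python
def add_constant_like(rows, has_constant="skip"):
    """Hypothetical imitation of statsmodels' add_constant with prepend=True.

    If any column is already constant and non-zero, 'skip' returns the rows
    unchanged, while 'add' prepends the column of ones anyway.
    """
    columns = list(zip(*rows))
    already_constant = any(len(set(col)) == 1 and 0 not in col for col in columns)
    if already_constant and has_constant == "skip":
        return [list(r) for r in rows]
    return [[1.0] + list(r) for r in rows]

one_row = [[25, 15, 15]]
print(add_constant_like(one_row))                      # [[25, 15, 15]] (no constant added)
print(add_constant_like(one_row, has_constant="add"))  # [[1.0, 25, 15, 15]]
```

Passing `has_constant='add'` to the real `add_constant` avoids the problem when predicting for one row at a time.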

You can assign the result directly to df2 like this:

df2.loc[:, 'Sales'] = model.predict(new_x) 

To fill the missing Sales values in the original DataFrame with predictions from your regression, try:

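# Fit only on the rows where Sales is known
known = df.Sales.notnull()
X = df.loc[known, ['TV', 'Radio', 'Newspaper']]
X = sm2.add_constant(X)
Y = df.loc[known, 'Sales']

model = sm.OLS(Y, X).fit()

# Predict Sales for the rows where it is missing
new_x = df.loc[df.Sales.isnull(), ['TV', 'Radio', 'Newspaper']]
new_x = sm2.add_constant(new_x, has_constant='add')  # sm2 = statsmodels.api; 'add' keeps const even for one missing row

df.loc[df.Sales.isnull(), 'Sales'] = model.predict(new_x)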
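The fit-then-fill workflow can also be written without any libraries, which makes each step explicit. First, fit OLS on the rows whose target is known by solving the normal equations (XᵀX)β = Xᵀy. Then fill every missing target with the intercept plus the coefficient-weighted features. This is a minimal sketch for illustration only. `solve`, `fit_ols` and `fill_missing` are hypothetical names, and statsmodels itself uses a numerically more careful solver (a pseudoinverse by default), so prefer `sm.OLS` in practice. The training rows are the five from the question's lists, and the missing rows are the three new ones.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(features, target):
    """Return [const, b1, b2, ...] minimizing squared error via the normal equations."""
    design = [[1.0] + list(row) for row in features]  # prepend the constant column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, target)) for i in range(k)]
    return solve(xtx, xty)

def fill_missing(rows, feature_names, target_name):
    """Fit on rows with a known target, then fill rows whose target is None (in place)."""
    known = [r for r in rows if r[target_name] is not None]
    params = fit_ols([[r[f] for f in feature_names] for r in known],
                     [r[target_name] for r in known])
    for r in rows:
        if r[target_name] is None:
            r[target_name] = params[0] + sum(b * r[f] for b, f in zip(params[1:], feature_names))
    return params

rows = [
    {"TV": 230.1, "Radio": 37.8, "Newspaper": 69.2, "Sales": 22.1},
    {"TV": 44.5, "Radio": 39.3, "Newspaper": 45.1, "Sales": 10.4},
    {"TV": 17.2, "Radio": 45.9, "Newspaper": 69.3, "Sales": 9.3},
    {"TV": 151.5, "Radio": 41.3, "Newspaper": 58.5, "Sales": 18.5},
    {"TV": 180.8, "Radio": 10.8, "Newspaper": 58.4, "Sales": 12.9},
    {"TV": 25, "Radio": 15, "Newspaper": 15, "Sales": None},
    {"TV": 30, "Radio": 20, "Newspaper": 22, "Sales": None},
    {"TV": 35, "Radio": 22, "Newspaper": 36, "Sales": None},
]
params = fill_missing(rows, ["TV", "Radio", "Newspaper"], "Sales")
print([round(r["Sales"], 2) for r in rows[5:]])  # approximately [4.61, 5.88, 6.15]
```

The fitted coefficients reproduce the `model.params` shown in the question, and the filled values match `y_predict` above. This is the same result the statsmodels version writes into `df`.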