Python pandas linear regression groupby

I am trying to run a linear regression for each group of a pandas DataFrame.
This is the DataFrame df:
group date value
A 01-02-2016 16
A 01-03-2016 15
A 01-04-2016 14
A 01-05-2016 17
A 01-06-2016 19
A 01-07-2016 20
B 01-02-2016 16
B 01-03-2016 13
B 01-04-2016 13
C 01-02-2016 16
C 01-03-2016 16
#import standard packages
import pandas as pd
import numpy as np
#import ML packages
from sklearn.linear_model import LinearRegression
#First, let's group the data by group
df_group = df.groupby('group')
#Then, we need to change the date to integer
df['date'] = pd.to_datetime(df['date'])
df['date_delta'] = (df['date'] - df['date'].min())/np.timedelta64(1,'D')
Now I want to predict the value for each group for 01-10-2016.
I would like to end up with a new DataFrame like this:
group 01-10-2016
A predicted value
B predicted value
C predicted value
The approach from "How to apply OLS from statsmodels to groupby" does not work:
for group in df_group.groups.keys():
    df = df_group.get_group(group)
    X = df['date_delta']
    y = df['value']
    model = LinearRegression(y, X)
    results = model.fit(X, y)
    print results.summary()
I get the following error:
ValueError: Found arrays with inconsistent numbers of samples: [ 1 52]
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
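The warning is asking for a 2-D feature matrix with one row per sample. A minimal sketch of the difference, using a small hypothetical frame (`frame` and its three rows are made up for illustration, borrowing group B's values):

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame, only for illustrating shapes.
frame = pd.DataFrame({"date_delta": [0.0, 1.0, 2.0], "value": [16, 13, 13]})

# Single brackets give a 1-D Series: this is what triggers the warning.
X_1d = frame["date_delta"]

# Double brackets give a 2-D DataFrame with one column.
X_2d_frame = frame[["date_delta"]]

# reshape(-1, 1) turns the 1-D values into a 2-D column array.
X_2d_array = frame["date_delta"].to_numpy().reshape(-1, 1)

print(X_1d.shape, X_2d_frame.shape, X_2d_array.shape)
```

Either of the two 2-D forms should be acceptable as `X` for `fit`; the target `y` can stay 1-D.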
UPDATE:
I changed it to:
for group in df_group.groups.keys():
    df = df_group.get_group(group)
    X = df[['date_delta']]
    y = df.value
    model = LinearRegression(y, X)
    results = model.fit(X, y)
    print results.summary()
Now I get this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
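pandas raises this message whenever a Series is used where a single True/False is expected. A likely cause here (an assumption about the scikit-learn version in use, not something confirmed in this thread) is that `LinearRegression(y, X)` hands the data to the constructor, where it is taken as option values instead of training data, so the estimator later evaluates a Series in an `if` test. The sketch below reproduces the message with pandas alone; the Series `y` is a made-up stand-in:

```python
import pandas as pd

# Made-up stand-in for the 'value' column of one group.
y = pd.Series([16, 13, 13])

message = None
try:
    if y:  # a multi-element Series has no single truth value
        pass
except ValueError as exc:
    message = str(exc)

print(message)
```

Constructing the estimator with no data, as in `LinearRegression()`, and passing `X` and `y` only to `fit` avoids this.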
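Please explain what you mean by "it doesn't work". Does it raise an error? If so, please include the traceback. If not, what is your expected output, and what output do you get? – ayhan
@ayhan - Done! Thank you – jeangelj
You are clobbering your 'df' inside the loop. – piRSquared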
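Putting the points above together, here is a minimal sketch of one way to get a prediction per group. It is not the only approach, and it rests on several assumptions: the dates are read as month-first (MM-DD-YYYY), the names `data`, `predict_for_group`, `origin` and `target_delta` are introduced only for this illustration, and a reasonably recent scikit-learn is installed. It keeps the full frame under its own name so the loop never rebinds it, constructs `LinearRegression()` without data, and calls `predict` rather than `summary()`, which scikit-learn estimators do not provide (it is a statsmodels results method).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# The sample data from the question.
data = pd.DataFrame({
    "group": list("AAAAAABBBCC"),
    "date": ["01-02-2016", "01-03-2016", "01-04-2016", "01-05-2016",
             "01-06-2016", "01-07-2016", "01-02-2016", "01-03-2016",
             "01-04-2016", "01-02-2016", "01-03-2016"],
    "value": [16, 15, 14, 17, 19, 20, 16, 13, 13, 16, 16],
})

# Assumption: dates are month-first (MM-DD-YYYY).
data["date"] = pd.to_datetime(data["date"], format="%m-%d-%Y")

# Days elapsed since the earliest date in the whole frame.
origin = data["date"].min()
data["date_delta"] = (data["date"] - origin) / pd.Timedelta(days=1)

# The date to predict for, on the same day scale.
target_delta = (pd.Timestamp("2016-01-10") - origin) / pd.Timedelta(days=1)


def predict_for_group(g):
    """Fit value ~ date_delta on one group and predict at target_delta."""
    X = g[["date_delta"]].to_numpy()  # 2-D, shape (n_samples, 1)
    y = g["value"].to_numpy()         # 1-D target
    model = LinearRegression()        # no data passed to the constructor
    model.fit(X, y)
    return float(model.predict(np.array([[target_delta]]))[0])


# Iterate over the groups without reassigning `data`.
predictions = pd.Series(
    {name: predict_for_group(g) for name, g in data.groupby("group")},
    name="01-10-2016",
)
predictions.index.name = "group"

print(predictions)
```

`predictions.reset_index()` turns the result into a two-column DataFrame shaped like the one requested. Groups with very few rows (C has only two) still fit, but the line passes exactly through the points, so the prediction is only as trustworthy as that data allows.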