2016-11-01

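I am trying to build a pipeline that includes a variable transformation. I did the following and got "transform() takes 2 positional arguments but 3 were given".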

import numpy as np 
import pandas as pd 
import sklearn 
from sklearn import linear_model 
from sklearn.base import BaseEstimator, TransformerMixin 
from sklearn.pipeline import Pipeline 

The DataFrame:

df = pd.DataFrame({'y': [4,5,6], 'a':[3,2,3], 'b' : [2,3,4]}) 

I am trying to create a new variable to use as the predictor, using a Complex class.


Then I built a pipeline:

X = df[['a', 'b']] 
y = df['y'] 
regressor = linear_model.SGDRegressor() 
pipeline = Pipeline([ 
     ('transform', Complex(X['a'], X['b'])) , 
     ('model_fitting', regressor) 
    ]) 
pipeline.fit(X, y) 

When I then call predict, I get an error:

pred = pipeline.predict(X) 
pred 
TypeError         Traceback (most recent call last) 
<ipython-input-555-7a07ccb0c38a> in <module>() 
----> 1 pred = pipeline.predict(X) 
     2 pred 

C:\Program Files\Anaconda3\lib\site-packages\sklearn\utils\metaestimators.py in <lambda>(*args, **kwargs) 
    52 
    53   # lambda, but not partial, allows help() to work with update_wrapper 
---> 54   out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs) 
    55   # update the docstring of the returned function 
    56   update_wrapper(out, self.fn) 

C:\Program Files\Anaconda3\lib\site-packages\sklearn\pipeline.py in predict(self, X) 
    324   for name, transform in self.steps[:-1]: 
    325    if transform is not None: 
--> 326     Xt = transform.transform(Xt) 
    327   return self.steps[-1][-1].predict(Xt) 
    328 

TypeError: transform() missing 1 required positional argument: 'X2' 

What am I doing wrong? I have found that the error is in the Complex() class. How can I fix it?

Answer


The problem is that transform expects a single argument: an array of shape [n_samples, n_features].
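To see why the call fails, here is a stripped-down imitation of the pipeline.py loop shown in the traceback: every intermediate step is called as step.transform(Xt), with exactly one argument. TwoArgTransform, OneArgTransform and run_transform_steps are hypothetical names made up for this sketch (the question's own Complex class is not shown), and the loop is a simplification, not scikit-learn's actual Pipeline code.

```python
import pandas as pd


class TwoArgTransform:
    """Hypothetical step whose transform() wants two separate columns."""

    def transform(self, X1, X2):
        return pd.DataFrame(X1 - X2)


class OneArgTransform:
    """Hypothetical step following the convention: one 2-D input X."""

    def transform(self, X):
        return pd.DataFrame(X["a"] - X["b"])


def run_transform_steps(steps, X):
    """Simplified imitation of the loop in Pipeline.predict (not sklearn's code)."""
    Xt = X
    for step in steps:
        Xt = step.transform(Xt)  # one positional argument, as in pipeline.py
    return Xt


X = pd.DataFrame({"a": [3, 2, 3], "b": [2, 3, 4]})

error_message = None
try:
    run_transform_steps([TwoArgTransform()], X)
except TypeError as exc:
    # Exact wording varies by Python version, but it names the missing 'X2'.
    error_message = str(exc)
print(error_message)

result = run_transform_steps([OneArgTransform()], X)
print(result)
```

The two-argument version fails in the same way as the question's pipeline, while the one-argument version receives the whole frame and picks the columns itself.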

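See the example in the documentation of sklearn.pipeline.Pipeline, which uses sklearn.feature_selection.SelectKBest as the transform. Its source shows that X is expected to be a single array, not separate variables such as X1 and X2.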
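A quick way to check this convention without reading the source is to inspect the signature of SelectKBest.transform. This is a minimal sketch using the standard-library inspect module; it assumes a reasonably recent scikit-learn, where the method is declared as transform(self, X).

```python
import inspect

from sklearn.feature_selection import SelectKBest

# List the parameter names of SelectKBest.transform.
# inspect.signature follows functools.wraps, so any wrapper added by
# scikit-learn still reports the original parameters.
transform_params = list(inspect.signature(SelectKBest.transform).parameters)
print(transform_params)  # expected: ['self', 'X']
```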

In short, your code can be fixed like this:


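import pandas as pd
from sklearn import linear_model
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

df = pd.DataFrame({'y': [4,5,6], 'a':[3,2,3], 'b' : [2,3,4]})

class Complex(BaseEstimator, TransformerMixin):
    # A Pipeline step receives the whole feature matrix X, never separate columns.
    def fit(self, X, y=None):
        # Nothing to learn; fit only has to return self.
        # TransformerMixin then provides fit_transform(X, y).
        return self

    def transform(self, Xt):
        # One argument: the 2-D input. Return a one-column DataFrame.
        return pd.DataFrame(Xt['a'] - Xt['b'])

X = df[['a', 'b']]
y = df['y']
regressor = linear_model.SGDRegressor()
pipeline = Pipeline([
    ('transform', Complex()),
    ('model_fitting', regressor)
])
pipeline.fit(X, y)

pred = pipeline.predict(X)
print(pred)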
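If the point of writing Complex(X['a'], X['b']) was to tell the step which columns to combine, the usual scikit-learn pattern is to pass the column names (not the data) to the constructor and let Pipeline hand the data to fit and transform. ColumnDifference and its left/right parameters are hypothetical names for this sketch, not part of scikit-learn.

```python
import pandas as pd
from sklearn import linear_model
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class ColumnDifference(BaseEstimator, TransformerMixin):
    """Hypothetical transformer producing X[left] - X[right] as one feature.

    Column *names* go to the constructor; the data arrives through
    fit/transform, which is how Pipeline passes it.
    """

    def __init__(self, left="a", right="b"):
        self.left = left
        self.right = right

    def fit(self, X, y=None):
        # Stateless: nothing to learn, but fit must return self.
        return self

    def transform(self, X):
        diff = X[self.left] - X[self.right]
        return diff.to_frame(name=f"{self.left}_minus_{self.right}")


df = pd.DataFrame({"y": [4, 5, 6], "a": [3, 2, 3], "b": [2, 3, 4]})
X = df[["a", "b"]]
y = df["y"]

pipeline = Pipeline([
    ("transform", ColumnDifference(left="a", right="b")),
    ("model_fitting", linear_model.SGDRegressor()),
])
pipeline.fit(X, y)
pred = pipeline.predict(X)
print(pred)
```

Because left and right are stored under their constructor names, BaseEstimator.get_params sees them, so the step can be cloned and reconfigured with pipeline.set_params(transform__right='b'). For a purely stateless function, sklearn.preprocessing.FunctionTransformer is a shorter alternative.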