2017-10-15 62 views

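Custom transformer for sklearn_pandas

I am following the sklearn_pandas walkthrough in the sklearn_pandas README on GitHub, and I am trying to modify the DateEncoder() custom transformer example so that it does two additional things:

  • Convert string-type columns to datetime, taking the date format as a parameter
  • Prepend the original column name to the new columns it produces. For example, if the input column is Date1, the output columns are Date1_year, Date1_month, Date1_day.

Here is my attempt (with a fairly basic understanding of sklearn pipelines):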

import pandas as pd 
import numpy as np 
from sklearn.base import TransformerMixin, BaseEstimator 
from sklearn_pandas import DataFrameMapper 

class DateEncoder(TransformerMixin): 

    ''' 
    Specify date format using python strftime formats 
    ''' 

    def __init__(self, date_format='%Y-%m-%d'): 
     self.date_format = date_format 

    def fit(self, X, y=None): 
     self.dt = pd.to_datetime(X, format=self.date_format) 
     return self 

    def transform(self, X): 
     dt = X.dt 
     return pd.concat([dt.year, dt.month, dt.day], axis=1) 


data = pd.DataFrame({'dates1': ['2001-12-20','2002-10-21','2003-08-22','2004-08-23', 
           '2004-07-20','2007-12-21','2006-12-22','2003-04-23'], 
        'dates2' : ['2012-12-20','2009-10-21','2016-08-22','2017-08-23', 
           '2014-07-20','2011-12-21','2014-12-22','2015-04-23']}) 

DATE_COLS = ['dates1', 'dates2'] 

Mapper = DataFrameMapper([(i, DateEncoder(date_format='%Y-%m-%d')) for i in DATE_COLS], input_df=True, df_out=True) 
test = Mapper.fit_transform(data) 

But when I run it, I get the following error:

AttributeError: Can only use .dt accessor with datetimelike values 

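Why am I getting this error, and how do I fix it? Any help with naming the new columns after the original column as described above (Date1_year, Date1_month, Date1_day) would be greatly appreciated!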

In 'fit' you convert 'X' to datetime and store it as 'self.dt', but 'transform()' never uses 'self.dt'. 'X.dt' fails because 'X' is not of datetime type.
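
The point made in the comment can be reproduced with pandas alone. The following is a minimal sketch, assuming a two-row sample taken from the question's data: the .dt accessor is rejected on a column of strings and works once the column has been parsed with pd.to_datetime.

```python
import pandas as pd

# A column of date strings, as in the question (not a datetime dtype).
s = pd.Series(['2001-12-20', '2002-10-21'], name='dates1')

try:
    s.dt.year  # .dt is only available on datetimelike values
except AttributeError as exc:
    print(exc)

# Parse first, then use the .dt accessor on the parsed result.
parsed = pd.to_datetime(s, format='%Y-%m-%d')
years = parsed.dt.year.tolist()
print(years)
```

In the transformer this means the parsing has to happen inside transform() (or transform() has to use what fit() stored), since transform() receives the raw strings again.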

Answer

I was able to split the date-format conversion and the date splitting into two separate transformers, and that worked.

import pandas as pd 
from sklearn.base import TransformerMixin 
from sklearn_pandas import DataFrameMapper 



data2 = pd.DataFrame({'dates1': ['2001-12-20','2002-10-21','2003-08-22','2004-08-23', 
           '2004-07-20','2007-12-21','2006-12-22','2003-04-23'], 
        'dates2' : ['2012-12-20','2009-10-21','2016-08-22','2017-08-23', 
           '2014-07-20','2011-12-21','2014-12-22','2015-04-23']}) 

class DateFormatter(TransformerMixin): 

    def fit(self, X, y=None): 
     # stateless transformer 
     return self 

    def transform(self, X): 
     # X may be a Series or a DataFrame; apply() works on both
     Xdate = X.apply(pd.to_datetime) 
     return Xdate 


class DateEncoder(TransformerMixin): 

    def fit(self, X, y=None): 
     return self 

    def transform(self, X): 
     dt = X.dt 
     return pd.concat([dt.year, dt.month, dt.day], axis=1) 


DATE_COLS = ['dates1', 'dates2'] 

datemult = DataFrameMapper(
      [ (i,[DateFormatter(),DateEncoder()]) for i in DATE_COLS  ] 
      , input_df=True, df_out=True) 

df = datemult.fit_transform(data2) 

This code outputs:

Out[4]: 
    dates1_0 dates1_1 dates1_2 dates2_0 dates2_1 dates2_2 
0  2001  12  20  2012  12  20 
1  2002  10  21  2009  10  21 
2  2003   8  22  2016   8  22 
3  2004   8  23  2017   8  23 
4  2004   7  20  2014   7  20 
5  2007  12  21  2011  12  21 
6  2006  12  22  2014  12  22 
7  2003   4  23  2015   4  23 

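However, I am still looking for a way to name the new columns while applying the DateEncoder() transformer, e.g. dates_1_0 -> dates_1_year and dates_2_2 -> dates_2_month. I would be happy to accept that as the solution.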
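
One possible direction for the naming, sketched outside DataFrameMapper: let the transformer return a DataFrame whose columns are built from the name of the incoming Series. NamedDateEncoder is a hypothetical name, not part of sklearn_pandas, and the sketch assumes each column arrives as a named pandas Series of date strings.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class NamedDateEncoder(TransformerMixin, BaseEstimator):
    """Hypothetical variant of DateEncoder: parse strings, then name the parts."""

    def __init__(self, date_format='%Y-%m-%d'):
        self.date_format = date_format

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the data.
        return self

    def transform(self, X):
        # Assumes X is a named pandas Series of date strings.
        dt = pd.to_datetime(X, format=self.date_format).dt
        return pd.DataFrame({
            f'{X.name}_year': dt.year,
            f'{X.name}_month': dt.month,
            f'{X.name}_day': dt.day,
        })


data = pd.DataFrame({'dates1': ['2001-12-20', '2002-10-21'],
                     'dates2': ['2012-12-20', '2009-10-21']})

# Apply the encoder column by column and join the results side by side.
encoded = pd.concat(
    [NamedDateEncoder().fit_transform(data[col]) for col in data.columns],
    axis=1,
)
print(encoded.columns.tolist())
```

This sidesteps DataFrameMapper rather than configuring it. Whether the mapper with df_out=True keeps column names supplied by a transformer depends on the sklearn_pandas version, so that part is not covered here.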
