2017-04-26

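Tensorflow DNNClassifier and scikit-learn GridSearchCV issue

I have been trying for several hours now to run hyperparameter optimization on a TensorFlow DNN model using GridSearchCV. The latest version of my code is as follows: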

import random 
from tensorflow.contrib.learn.python import learn 
from sklearn import datasets 
from sklearn.model_selection import GridSearchCV 
from sklearn.metrics import accuracy_score 

random.seed(42) 
iris = datasets.load_iris() 
feature_columns = learn.infer_real_valued_columns_from_input(iris.data) 
classifier = learn.DNNClassifier(
      feature_columns=feature_columns, 
      hidden_units=[10, 20, 10], 
      n_classes=3) 
grid_search = GridSearchCV(
      classifier, {'hidden_units': [[5, 5], [10, 10]]}, 
      scoring='accuracy', 
      fit_params={'steps': [50]}) 
grid_search.fit(iris.data, iris.target) 
score = accuracy_score(iris.target, grid_search.predict(iris.data)) 

I actually took it from a test in the tensorflow library itself.

When I run it, I get the following error:

--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-4-dce950001f99> in <module>() 
    16   scoring='accuracy', 
    17   fit_params={'steps': [50]}) 
---> 18 grid_search.fit(iris.data, iris.target) 
    19 score = accuracy_score(iris.target, grid_search.predict(iris.data)) 

/home/nmiotto/Development/upday/hellseher/playground/lib/python3.5/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups) 
    943    train/test set. 
    944   """ 
--> 945   return self._fit(X, y, groups, ParameterGrid(self.param_grid)) 
    946 
    947 

/home/nmiotto/Development/upday/hellseher/playground/lib/python3.5/site-packages/sklearn/model_selection/_search.py in _fit(self, X, y, groups, parameter_iterable) 
    548          n_candidates * n_splits)) 
    549 
--> 550   base_estimator = clone(self.estimator) 
    551   pre_dispatch = self.pre_dispatch 
    552 

/home/nmiotto/Development/upday/hellseher/playground/lib/python3.5/site-packages/sklearn/base.py in clone(estimator, safe) 
    68  for name, param in six.iteritems(new_object_params): 
    69   new_object_params[name] = clone(param, safe=False) 
---> 70  new_object = klass(**new_object_params) 
    71  params_set = new_object.get_params(deep=False) 
    72 

TypeError: __init__() got an unexpected keyword argument 'params' 

I am using Python 3.5.2 and have updated all libraries to their latest versions. More precisely:

$ pip3 freeze 
numpy==1.12.1 
scikit-learn==0.18.1 
scipy==0.19.0 
tensorflow==1.1.0 

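I have run out of ideas and cannot figure out what I am missing. Any help would be appreciated. I am assuming, of course, that I should not have to monkey-patch or hack anything into the existing libraries.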

Answer


The problem comes from cloning the estimator, as the stack trace shows:

new_object = klass(**new_object_params) 

new_object_params is obtained a few lines earlier in clone:

new_object_params = estimator.get_params(deep=False) 

As you can see, the estimator is your DNNClassifier, which GridSearchCV is cloning. But estimator.get_params(deep=False) returns the following:

{'params': {'head': <tensorflow.contrib.learn.python.learn.estimators.head._MultiClassHead object at 0x7f720df04490>, 
'hidden_units': [10, 20, 10], 
'feature_columns': (_RealValuedColumn(column_name='', dimension=4, default_value=None, dtype=tf.float64, normalizer=None),), 
'embedding_lr_multipliers': None, 'optimizer': None, 'dropout': None, 
'gradient_clip_norm': None, 
'activation_fn': <function relu at 0x7f7221aa8b18>, 'input_layer_min_slice_size': None}} 

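As you can see, the only top-level key is named params. clone will now try to pass it as a keyword argument to DNNClassifier's __init__ method to build the new object.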

But in version 1.1.0 of tensorflow, the __init__ parameters are as follows:

def __init__(self, 
       hidden_units, 
       feature_columns, 
       model_dir=None, 
       n_classes=2, 
       weight_column_name=None, 
       optimizer=None, 
       activation_fn=nn.relu, 
       dropout=None, 
       gradient_clip_norm=None, 
       enable_centered_bias=False, 
       config=None, 
       feature_engineering_fn=None, 
       embedding_lr_multipliers=None, 
       input_layer_min_slice_size=None, 
       label_keys=None): 
... 
... 

There is no parameter named params here. Hence the error.
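The mismatch can be reproduced without tensorflow or scikit-learn. The following is only a minimal sketch: WrappedEstimator, naive_clone and unexpected_params are hypothetical names made up for illustration, and naive_clone merely imitates the klass(**new_object_params) line from the traceback, not the full sklearn clone.

```python
import inspect


class WrappedEstimator:
    """Hypothetical stand-in for an estimator that behaves like the one above."""

    def __init__(self, hidden_units, n_classes=2):
        # The constructor takes flat keyword arguments...
        self._params = {"hidden_units": hidden_units, "n_classes": n_classes}

    def get_params(self, deep=False):
        # ...but get_params nests everything under a single 'params' key.
        return {"params": dict(self._params)}


def naive_clone(estimator):
    """Imitates the failing step in the traceback: klass(**get_params())."""
    klass = estimator.__class__
    return klass(**estimator.get_params(deep=False))


def unexpected_params(estimator):
    """Lists the get_params keys that the constructor does not accept."""
    accepted = set(inspect.signature(estimator.__class__.__init__).parameters)
    return sorted(set(estimator.get_params(deep=False)) - accepted)


est = WrappedEstimator(hidden_units=[10, 20, 10], n_classes=3)
print(unexpected_params(est))  # keys that would break cloning

try:
    naive_clone(est)
except TypeError as exc:
    print("clone failed:", exc)
```

A check like unexpected_params could be run against a real estimator before handing it to GridSearchCV, assuming its get_params and __init__ can be inspected the same way.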

But if you look at the __init__() method on the current master branch of tensorflow, it looks like this: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/learn/python/learn/estimators/dnn.py#L327

super(DNNClassifier, self).__init__(
     model_fn=_dnn_model_fn, 
     model_dir=model_dir, 
     config=config, 
     params={ 
      "head": 
       head_lib.multi_class_head(
        n_classes, 
        weight_column_name=weight_column_name, 
        enable_centered_bias=enable_centered_bias, 
        label_keys=label_keys), 
      "hidden_units": hidden_units, 
      "feature_columns": self._feature_columns, 
      "optimizer": optimizer, 
      "activation_fn": activation_fn, 
      "dropout": dropout, 
      "gradient_clip_norm": gradient_clip_norm, 
      "embedding_lr_multipliers": embedding_lr_multipliers, 
      "input_layer_min_slice_size": input_layer_min_slice_size, 
     }, 
     feature_engineering_fn=feature_engineering_fn) 

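So perhaps the test you were looking at on the master branch depends on this code change. You could download the current branch and build the library yourself to get rid of this error.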

Otherwise, look for a way to do a grid search that works with version 1.1.0.


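I tried the current master version, but still no luck (same error). Also, that test does not seem to actually work. It may not be a real test, but rather an exploratory script written a long time ago that is now broken.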


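@NicolaMiotto I have not checked it against the master branch. I will get back to you after testing. In the meantime, you can use a combination of ParameterGrid and cross_val_score to achieve the same effect. If you need it, I can edit the answer to include it.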
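One caveat on that suggestion: as far as I know, cross_val_score in scikit-learn 0.18 also clones the estimator, so it may raise the same TypeError. A way around cloning altogether is to build a fresh estimator for every candidate and fold. The sketch below shows that idea using only the standard library. Everything in it is an assumption for illustration: ThresholdClassifier is a hypothetical toy model standing in for DNNClassifier, param_grid is a hand-rolled stand-in for ParameterGrid, and k_fold_indices is a simple shuffled split rather than sklearn's KFold.

```python
import itertools
import random


class ThresholdClassifier:
    """Hypothetical toy model; stands in for a freshly built DNNClassifier."""

    def __init__(self, threshold):
        self.threshold = threshold

    def fit(self, X, y):
        return self  # nothing to learn in this toy model

    def predict(self, X):
        return [1 if row[0] > self.threshold else 0 for row in X]


def param_grid(grid):
    """Yields one dict per parameter combination."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def k_fold_indices(n_samples, n_splits, seed=42):
    """Shuffles the indices and yields (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i, test in enumerate(folds):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train, test


def grid_search(build_estimator, grid, X, y, n_splits=3):
    """Builds a new estimator per candidate and fold, so nothing is cloned."""
    best_score, best_params = -1.0, None
    for params in param_grid(grid):
        scores = []
        for train, test in k_fold_indices(len(X), n_splits):
            model = build_estimator(**params)
            model.fit([X[i] for i in train], [y[i] for i in train])
            predicted = model.predict([X[i] for i in test])
            hits = sum(p == y[i] for p, i in zip(predicted, test))
            scores.append(hits / len(test))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_score, best_params = mean_score, params
    return best_params, best_score


# Toy data: the label is 1 exactly when the single feature exceeds 5.
X = [[float(v)] for v in range(12)]
y = [1 if v > 5 else 0 for v in range(12)]
best_params, best_score = grid_search(
    ThresholdClassifier, {"threshold": [2.0, 5.0, 8.0]}, X, y)
print(best_params, best_score)
```

For the case in the question, build_estimator would presumably be a small function that takes hidden_units and returns learn.DNNClassifier(feature_columns=feature_columns, hidden_units=hidden_units, n_classes=3), with the fit call adapted to pass steps=50 and the index-based slicing replaced by NumPy indexing on iris.data and iris.target. That adaptation is untested here.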