2017-04-10

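min_samples_split must be at least 2 or in (0, 1], got 1

I have defined a binary classifier as shown below. When I call it with the 'gbc' method (gradient boosting classifier), the last line raises the error "min_samples_split must be at least 2 or in (0, 1], got 1". featuresClasses is a DataFrame, and featureLabels is the list of feature names.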

import numpy as np
from sklearn import svm
from sklearn.ensemble import GradientBoostingClassifier

def Binary_classifier(method, featureLabels, featuresClasses):

    # Split by member, so no membershipId ends up in both train and test
    membershipIds = list(set(featuresClasses['membershipId']))
    n_membershipIds = len(membershipIds)

    index_rand = np.random.permutation(n_membershipIds)
    test_size = int(0.3 * n_membershipIds)

    membershipIds_test = [membershipIds[i] for i in index_rand[:test_size]]
    # Slice from test_size (not test_size + 1) so that no member is dropped
    membershipIds_train = [membershipIds[i] for i in index_rand[test_size:]]

    data_test = featuresClasses[featuresClasses['membershipId'].isin(membershipIds_test)]
    data_train = featuresClasses[featuresClasses['membershipId'].isin(membershipIds_train)]

    # Keep only the rows with a binary label
    data_test = data_test[data_test['standing'].isin([0, 1])]
    data_train = data_train[data_train['standing'].isin([0, 1])]

    # .values replaces the deprecated DataFrame.as_matrix()
    X_test = data_test[featureLabels].values
    y_test = data_test['standing'].values.astype(int)

    X_train = data_train[featureLabels].values
    y_train = data_train['standing'].values.astype(int)

    # -------------------------- Run classifier
    print('Binary classification by', method)

    if method == 'svm':
        classifier = svm.SVC(kernel='linear', probability=True)
        y_score = classifier.fit(X_train, y_train).decision_function(X_test)

    elif method == 'gbc':
        # 'min_samples_split': 1 (an int) is the value that triggers the error
        params = {'n_estimators': 200, 'max_depth': 3, 'min_samples_split': 1, 'learning_rate': 0.1, 'loss': 'deviance'}

        classifier = GradientBoostingClassifier(**params)
        y_score = classifier.fit(X_train, y_train).predict(X_test)

Answer

According to the GradientBoostingClassifier documentation:

min_samples_split : int, float, optional (default=2)

The minimum number of samples required to split an internal node: 

    If int, then consider min_samples_split as the minimum number. 
    If float, then min_samples_split is a percentage and ceil(min_samples_split * n_samples) 
       are the minimum number of samples for each split. 
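The rule quoted above can be sketched in plain Python. This is only an illustration of the documented behaviour, not scikit-learn's actual code: the helper name `effective_min_samples_split` is hypothetical, and the floor of 2 for float inputs is an assumption based on the fact that a split needs at least two samples.

```python
import math


def effective_min_samples_split(min_samples_split, n_samples):
    """Hypothetical helper: resolve min_samples_split to a sample count."""
    if isinstance(min_samples_split, int):
        # An int is taken literally as the minimum number of samples.
        if min_samples_split < 2:
            raise ValueError(
                "min_samples_split must be at least 2 or in (0, 1], got %s"
                % min_samples_split
            )
        return min_samples_split
    # A float is a fraction of n_samples and must lie in (0.0, 1.0].
    if not 0.0 < min_samples_split <= 1.0:
        raise ValueError(
            "min_samples_split must be at least 2 or in (0, 1], got %s"
            % min_samples_split
        )
    # Assumption: never go below 2, since one sample cannot be split.
    return max(2, int(math.ceil(min_samples_split * n_samples)))


print(effective_min_samples_split(2, 100))
print(effective_min_samples_split(0.1, 95))
print(effective_min_samples_split(1.0, 100))
```

The type of the argument, not just its numeric value, decides which branch applies, which is why `1` and `1.0` are treated differently.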

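In your code you specify 'min_samples_split': 1. That is not a valid value: the smallest allowed int is 2. If you want to pass 1 as a float (which means 1 × the number of samples, so only a node containing every training sample could be split), specify it as 'min_samples_split': 1.0. When it is written as 1, it is treated as an integer, hence the error.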
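A minimal sketch of the difference, assuming scikit-learn is installed. The synthetic data from `make_classification` and the small `n_estimators` are stand-ins chosen only to keep the example quick; they are not the asker's data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the asker's X_train / y_train.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# The int 1 is rejected when fit() validates the parameters.
try:
    GradientBoostingClassifier(min_samples_split=1).fit(X, y)
    int_one_rejected = False
except ValueError as err:
    int_one_rejected = True
    print("rejected:", err)

# The int 2 (the default) is the smallest valid sample count.
clf_int = GradientBoostingClassifier(
    n_estimators=20, max_depth=3, min_samples_split=2, random_state=0
).fit(X, y)

# The float 1.0 is accepted and read as a fraction of the training samples.
clf_frac = GradientBoostingClassifier(
    n_estimators=20, max_depth=3, min_samples_split=1.0, random_state=0
).fit(X, y)
```

For the question's code, replacing `'min_samples_split': 1` with `2` is probably the intended fix, since `1.0` makes it very hard for any node below the root to split.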

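The error message shows the range as (0, 1] rather than (0.0, 1.0], which is what causes the confusion. This has also been raised as an issue on scikit-learn's GitHub, and the fix has been implemented for the next release.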

Thanks, @Vivek Kumar. – YNr
