2014-03-28

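Gradient boosting regression: "TypeError: cannot perform reduce with flexible type"

I am new to Python and I am trying to write a program that uses gradient boosting regression. I have two large data sets, a training set and a test set, and both have exactly the same columns. My goal is to predict the SeriousDlqin2yrs column of the test set using the information in the training set.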

Here is the program I wrote:

import numpy as np 
import csv as csv 
import pandas as pd 
from sklearn import ensemble 
from sklearn.ensemble import GradientBoostingRegressor 
from sklearn.utils import shuffle 

# Load data 

csv_file_object = csv.reader(open('cs-training-cleandata2NOLOG.csv', 'rb')) # Load the training csv file
header = csv_file_object.next() # Skip the first line as it is a header
train_data = [] # Create a variable called 'train_data'
for row in csv_file_object: # Loop over each row in the csv file
    train_data.append(row[1:]) # Add each row (without its first column) to train_data
train_data = np.array(train_data) # Then convert from a list to an array

test_file_object = csv.reader(open('cs-test-cleandata2NOLOG.csv', 'rb')) # Load the test csv file
header = test_file_object.next() # Skip the first line as it is a header
test_data = [] # Create a variable called 'test_data'
ids = []
for row in test_file_object: # Loop over each row in the csv file
    ids.append(row[0]) # Keep the Id column for the output file
    test_data.append(row[1:]) # Add each row (without its first column) to test_data
test_data = np.array(test_data) # Then convert from a list to an array

test_data = np.delete(test_data,[0],1) #remove SeriousDlqin2yrs 

print 'Training ' 
# Fit regression model 

clf = GradientBoostingRegressor(n_estimators=1000, min_samples_split=100, learning_rate=0.01) 
clf = clf.fit(train_data[0::,1::],train_data[0::,0]) 

print 'Predicting' 

output=clf.predict(test_data) 

open_file_object = csv.writer(open("GradientBoostedRegression1.1.csv", "wb")) 
open_file_object.writerow(["Id","Probability"]) 
open_file_object.writerows(zip(ids, output)) 

But when I run the program, Python gives me this output:

Traceback (most recent call last): 
  File "C:\Users\Paul HONORE\Dropbox\Research Study\Kaggle\Bank\GradientBoostedRegression1.1.py", line 64, in <module>
    clf = clf.fit(train_data[0::,1::],train_data[0::,0])
  File "C:\Python27\lib\site-packages\sklearn\ensemble\gradient_boosting.py", line 1126, in fit
    return super(GradientBoostingRegressor, self).fit(X, y)
  File "C:\Python27\lib\site-packages\sklearn\ensemble\gradient_boosting.py", line 595, in fit
    self.init_.fit(X, y)
  File "C:\Python27\lib\site-packages\sklearn\ensemble\gradient_boosting.py", line 69, in fit
    self.mean = np.mean(y)
  File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 2716, in mean
    out=out, keepdims=keepdims)
  File "C:\Python27\lib\site-packages\numpy\core\_methods.py", line 62, in _mean
    ret = um.add.reduce(arr, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
TypeError: cannot perform reduce with flexible type 

I don't know where this error comes from. I have read a lot of posts about this problem but never found a way to solve it.

Thank you very much for your help.

Possible duplicate of [Python - "cannot perform reduce with flexible type" when trying to use numpy.mean](http://stackoverflow.com/questions/20061095/)

It is not the same thing. I have already read that thread, but it is not the same problem. – user3471868

Answers

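I think this problem can be solved by specifying a type when calling the array function. csv.reader returns every field as a string, so np.array(train_data) builds an array of strings, and np.mean cannot reduce over strings. For example: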

train_data = np.array(train_data, dtype=float)
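
To see why the conversion helps, here is a minimal sketch. It assumes Python 3 and NumPy, and it uses a made-up two-row sample in place of the real CSV files; the `age` and `income` column names are hypothetical. It shows that rows read with `csv.reader` end up in a string array, that taking the mean of such an array fails with a `TypeError`, and that passing `dtype=float` produces a numeric array that can be split into features and target.

```python
import csv
import io

import numpy as np

# Hypothetical sample standing in for the training CSV.
# Only SeriousDlqin2yrs comes from the question; the other columns are invented.
sample = io.StringIO(
    "Id,SeriousDlqin2yrs,age,income\n"
    "1,1,45,9120\n"
    "2,0,40,2600\n"
)

reader = csv.reader(sample)
next(reader)                         # skip the header row
rows = [row[1:] for row in reader]   # drop the Id column, as in the question

# csv.reader yields strings, so this is an array of strings, not numbers.
as_text = np.array(rows)
print(as_text.dtype.kind)            # 'U' means unicode string

# Taking the mean of a string column fails with a TypeError
# (the exact message depends on the NumPy version).
try:
    np.mean(as_text[:, 0])
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Asking for a float dtype converts every field to float64.
as_float = np.array(rows, dtype=float)
X, y = as_float[:, 1:], as_float[:, 0]   # features, target
print(X.shape, y.mean())
```

The same conversion would presumably be needed for test_data before calling predict, since it is built from strings in the same way.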