2016-12-02

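Unorderable types: str() > float() error in KNN model

I've read a lot about this particular error and haven't been able to find an answer that solves my problem. I have a dataset that I've split into training and test sets, and I'm running a KNeighborsClassifier. My code is below. The problem is that when I look at the dtypes of X_train, I don't see any string-formatted columns at all. My y_train is a single categorical variable. This is my first Stack Overflow post, so I apologize if I've overlooked any formalities, and thanks for your help! :)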

Error:

TypeError: unorderable types: str() > float() 

Dtypes:

X_train.dtypes.value_counts() 
Out[54]: 
int64  2035 
float64  178 
dtype: int64 
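Note that this dtype check covers only the feature columns in X_train. An object-dtype column such as a categorical target can still hold a mix of `str` labels and float `NaN` values, and its `dtype` alone will not show that. A minimal sketch of how to inspect the element types, using a small hypothetical Series in place of the real `Categories` column (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a categorical target column: string labels
# plus one missing value, which pandas stores as the float NaN.
y_train = pd.Series(["Large", "Small", np.nan, "Small"], dtype=object)

# dtype reports only "object"; it does not show which Python types are inside.
print(y_train.dtype)  # object

# Counting the Python type of every element exposes the stray float (NaN).
print(y_train.map(type).value_counts())

# A direct count of missing labels.
print(y_train.isna().sum())  # 1
```

Running the same two checks on the real `y_train` would show whether any non-string labels are present.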

Code (sklearn):

# Import Packages 
import os 
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
from sklearn.dummy import DummyRegressor 
from sklearn.cross_validation import train_test_split, KFold 
from matplotlib.ticker import FormatStrFormatter 
from sklearn import cross_validation 
from sklearn.neighbors import KNeighborsClassifier 
from sklearn.svm import SVC 
import pdb 

# Set Directory Path 
path = "file_path" 
os.chdir(path) 

#Select Import File 
data = 'RawData2.csv' 
delim = ',' 

#Import Data File 
df = pd.read_csv(data, sep = delim) 
print (df.head()) 

df.columns.get_loc('Categories') 

#Model 

#Select/Update Features 
X = df[df.columns[14:2215]] 

#Get Column Index for Target Variable 
df.columns.get_loc('Categories') 

#Select Target and fill na's with "Small" label 
y = df[df.columns[21]] 
print(y.values) 
y.fillna('Small') 

#Training/Test Set 
X_sample = X.loc[X.Var1 < 1279] 
X_valid = X.loc[X.Var1 > 1278] 
# Align the labels with the selected rows by index rather than by position 
y_sample = y.loc[X_sample.index] 
y_valid = y.loc[X_valid.index] 

X_train, X_test, y_train, y_test = train_test_split(X_sample, y_sample, test_size = 0.2) 
cv = KFold(n = X_train.shape[0], n_folds = 5, random_state = 17) 

print(X_train.shape, y_train.shape) 
X_train.dtypes.value_counts() 

from sklearn.metrics import accuracy_score 

knn = KNeighborsClassifier(n_neighbors = 5) 
knn.fit(X_train, y_train)  # <-- This is where the error is flagged 
print(accuracy_score(y_test, knn.predict(X_test))) 
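One likely cause, offered as an assumption rather than a confirmed diagnosis: `Series.fillna` returns a new Series by default, so the line `y.fillna('Small')` leaves `y` unchanged, and any missing labels stay as float `NaN` next to the string labels. sklearn classifiers typically build `classes_` with `np.unique`, which sorts the labels, and ordering a `str` against a `float` raises this kind of `TypeError` in Python 3. The sketch below reproduces the pattern with a hypothetical four-element Series and uses Python's `sorted()` to stand in for the sort:

```python
import numpy as np
import pandas as pd

# Hypothetical target: string labels with one missing value (float NaN).
y = pd.Series(["Large", np.nan, "Small", "Medium"], dtype=object)

# fillna() returns a new Series; calling it without assignment changes nothing.
y.fillna("Small")
print(y.isna().sum())  # 1

# Sorting a mix of str and float fails in Python 3, the same kind of
# comparison a sort-based unique-labels step would attempt.
try:
    sorted(y.tolist())
except TypeError as exc:
    print("TypeError:", exc)

# Assigning the result back leaves only string labels, which sort cleanly.
y = y.fillna("Small")
print(y.isna().sum())  # 0
print(sorted(y.unique()))  # ['Large', 'Medium', 'Small']
```

Assigning the result back (`y = y.fillna('Small')`) or dropping rows whose label is missing would leave only string labels before `fit` is called.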
