2017-06-01 74 views
1

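Invalid literal for float in k-nearest neighbors

I am having the hardest time figuring out why I get this error. I have searched a lot but could not find any solution.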

import numpy as np
import warnings
from collections import Counter
import pandas as pd

def k_nearest_neighbors(data, predict, k=3):
    if len(data) >= k:
        warnings.warn('K is set to a value less than total voting groups!')
    distances = []
    for group in data:
        for features in data[group]:
            euclidean_distance = np.linalg.norm(np.array(features) - np.array(predict))
            distances.append([euclidean_distance, group])
    votes = [i[1] for i in sorted(distances)[:k]]
    vote_result = Counter(votes).most_common(1)[0][0]
    return vote_result

df = pd.read_csv("data.txt")
df.replace('?', -99999, inplace=True)
df.drop(['id'], axis=1, inplace=True)
full_data = df.astype(float).values.tolist()

print(full_data)

After running it, I get this error:

Traceback (most recent call last):
  File "E:\Jazab\Machine Learning\Lec18(Testing K Neatest Nerighbors Classifier)\Lec18(Testing K Neatest Nerighbors Classifier)\Lec18_Testing_K_Neatest_Nerighbors_Classifier_.py", line 25, in <module>
    full_data = df.astype(float).values.tolist()
  File "C:\Python27\lib\site-packages\pandas\util\_decorators.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 3299, in astype
    **kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3224, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3091, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 471, in astype
    **kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 521, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "C:\Python27\lib\site-packages\pandas\core\dtypes\cast.py", line 636, in astype_nansafe
    return arr.astype(dtype)
ValueError: invalid literal for float(): 3) <-----Reappears in Group 8 as:
Press any key to continue . . .

If I remove astype(float), the program runs fine. What do I need to do?

Answers

0

It looks like you have 3) as an entry in your CSV file, and pandas is complaining because it cannot cast that value to a float because of the ).
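To confirm which entries are the problem, one option is to try float() on each value and collect the failures. This is only a minimal sketch: the helper name find_non_numeric and the sample column are made up for illustration and are not taken from the question's data file.

```python
def find_non_numeric(values):
    """Return the entries of `values` that float() cannot parse."""
    bad = []
    for value in values:
        try:
            float(value)
        except (TypeError, ValueError):
            # float() raises ValueError for strings such as '3)'
            bad.append(value)
    return bad


# Hypothetical column contents; only '3)' fails to parse.
column = ['5', '1', '3)', '-99999']
print(find_non_numeric(column))  # prints ['3)']
```

Running something like this over each column of the raw file should point at the cells that astype(float) rejects.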

+0

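My CSV has patients' records – Jazab

+0

Maybe show an example of the file in your question. Anyway, I think jezrael's answer is what you need if you have no control over the source data. – SiHa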

+0

Hello, your suggestion worked. It is working now, thanks – Jazab

1

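There is bad data (3)), so you need to_numeric together with apply, because every column has to be processed.

Non-numeric values are converted to NaN, and fillna then replaces those NaNs with some scalar, e.g. 0.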

full_data = df.apply(pd.to_numeric, errors='coerce').fillna(0).values.tolist() 

Sample:

df = pd.DataFrame({'A':[1,2,7], 'B':['3)',4,5]}) 
print (df) 
   A   B
0  1  3)
1  2   4
2  7   5

full_data = df.apply(pd.to_numeric, errors='coerce').fillna(0).values.tolist() 
print (full_data) 
[[1.0, 0.0], [2.0, 4.0], [7.0, 5.0]]
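
Since fillna(0) silently turns every bad entry into 0, it can be worth checking which cells were coerced before filling them. The following is a rough sketch on the same sample frame; the names numeric, bad_mask and bad_cells are chosen here for illustration only.

```python
import pandas as pd

# Same sample frame as above: column 'B' holds one bad entry.
df = pd.DataFrame({'A': [1, 2, 7], 'B': ['3)', 4, 5]})

# Coerce every column; anything that cannot be parsed becomes NaN.
numeric = df.apply(pd.to_numeric, errors='coerce')

# A cell that had a value originally but is NaN after coercion was bad data.
bad_mask = numeric.isnull() & df.notnull()

# Collect (row label, column name, original value) for each bad cell.
bad_cells = [(row, col, df.at[row, col])
             for col in df.columns
             for row in df.index[bad_mask[col].values]]
print(bad_cells)
```

If 0 is not a sensible stand-in for the data, numeric.dropna() would discard the affected rows instead of filling them.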