2017-08-03 49 views
-1

I want to use Isolation Forest as a classifier in Python 2.7 under Anaconda, but fitting it raises a ValueError. Here is my sample code.

import numpy as np 
import matplotlib.pyplot as plt 
from sklearn.ensemble import IsolationForest 

rng = np.random.RandomState(42) 
import pandas 
from pandas import read_csv 
from numpy import set_printoptions 

filename1 = 'path/Cleanedinput.csv' 
dataframe1 = read_csv(filename1, names=names, low_memory=False) 
Xtrain = dataframe1.values 
Xtrain.shape 
(996405L, 16L) 
Xtrain[0:2] 

array([[1744121620.0, 2590000000.0, '44846', '39770', '6', '100', 1L, '5', '290', 60L, '1', 1L, '-6', '46846', 12.9833, 77.5833], 
[1724121520.0, 2260000000.0, '12337', '31772', '6', '100', 1L, '1', '54', 60L, '1', 1L, '-6', '41637', 23.4833, 24.123]], dtype=object) 

clf = IsolationForest(max_samples=10, random_state=rng) 
clf.fit(X_train) 

My Xtrain array looks like this:

array([[1744121620.0, 2590000000.0, '44846', '39770', '6', '100', 1L, '5', '290', 60L, '1', 1L, '-6', '46846', 12.9833, 77.5833], 
[1724121520.0, 2260000000.0, '12337', '31772', '6', '100', 1L, '1', '54', 60L, '1', 1L, '-6', '41637', 23.4833, 24.123]], dtype=object) 

but I get a ValueError:

--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-21-0a80fca9c379> in <module>() 
----> 1 clf.fit(X_train) 

C:\Anaconda\lib\site-packages\sklearn\ensemble\iforest.pyc in fit(self, X, y, sample_weight) 
    157   # ensure_2d=False because there are actually unit test checking we fail 
    158   # for 1d. 
--> 159   X = check_array(X, accept_sparse=['csc'], ensure_2d=False) 
    160   if issparse(X): 
    161    # Pre-sort indices to avoid that each individual tree of the 

C:\Anaconda\lib\site-packages\sklearn\utils\validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator) 
    380          force_all_finite) 
    381  else: 
--> 382   array = np.array(array, dtype=dtype, order=order, copy=copy) 
    383 
    384   if ensure_2d: 

ValueError: could not convert string to float: - 

Is there something I am missing regarding the data types?

+1

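How many rows does your CSV have? The error says you are trying to convert '-' to a float. You probably have a '-' somewhere in your CSV, though it does not show up in the first two rows. – jacoblaw

+1

I'm not sure how it happened, but it looks like one of your input strings is nothing but a minus sign. – Prune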

Answer

0

Some of the data in your Xtrain variable is stored as strings rather than numeric values.

The Xtrain you provided:

array([[1744121620.0, 2590000000.0, '44846', '39770', '6', '100', 1L, '5', '290', 60L, '1', 1L, '-6', '46846', 12.9833, 77.5833], [1724121520.0, 2260000000.0, '12337', '31772', '6', '100', 1L, '1', '54', 60L, '1', 1L, '-6', '41637', 23.4833, 24.123]], dtype=object) 

'44846', '39770', etc. are string values.
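To see which cells are the problem, one option is to coerce every column to numeric and look at what fails. The sketch below is written for a current Python 3 / pandas install rather than the Python 2.7 setup in the question, and the small DataFrame with columns "a", "b", "c" is a made-up stand-in for the real CSV, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV: floats, numeric strings, and one stray '-'.
df = pd.DataFrame({
    "a": [1744121620.0, 1724121520.0, 1700000000.0],
    "b": ["44846", "12337", "-"],
    "c": ["-6", "-6", "5"],
})

# Try to parse every column as numbers; anything unparseable becomes NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# A cell is "bad" if it held a value originally but turned into NaN after coercion.
bad_mask = numeric.isna() & df.notna()

# Columns and rows that contain at least one bad cell.
col_flags = bad_mask.any(axis=0)
bad_columns = list(col_flags[col_flags].index)
bad_rows = df[bad_mask.any(axis=1)]

print(bad_columns)
print(bad_rows)
```

Note that a string such as '-6' parses fine; only values like a lone '-' are flagged.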

Look at the dtype of Xtrain: it is object. Convert it to float/int and it should work.
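A minimal sketch of that conversion, assuming a current Python 3 / pandas / scikit-learn install and a small made-up object array in place of the real data. Dropping the rows that cannot be parsed (such as one containing a lone '-') is an assumption made here for illustration; filling them with a default value may suit your data better.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical object-dtype array mixing floats, numeric strings, and one stray '-'.
raw = np.array([
    [1744121620.0, '44846', '-6', 12.9833],
    [1724121520.0, '12337', '-6', 23.4833],
    [1700000000.0, '-',     '5',  30.0],
    [1710000000.0, '20000', '-6', 15.5],
    [1720000000.0, '30000', '-6', 18.25],
], dtype=object)

# Parse every column as numbers; unparseable cells become NaN instead of raising.
numeric = pd.DataFrame(raw).apply(pd.to_numeric, errors="coerce")

# Assumption: rows with unparseable values are simply discarded.
clean = numeric.dropna()

# A plain float array is what the estimator's input validation expects.
X = clean.to_numpy(dtype=float)

clf = IsolationForest(max_samples=4, random_state=42)
clf.fit(X)

# predict returns +1 for inliers and -1 for outliers.
labels = clf.predict(X)
print(X.dtype, X.shape, labels)
```

Calling `Xtrain.astype(float)` directly would still fail on the '-' value with the same ValueError, which is why the sketch coerces first and then decides what to do with the resulting NaN entries.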