2016-03-28 32 views

Python: DataFrame error from lambda expression

I wrote the following code to normalize several columns of a DataFrame:

import pandas as pd 

train = pd.read_csv('test1.csv') 
header = train.columns.values 
print(train) 
print(header) 

inputs = header[0:3] 
trainArr = train.as_matrix(inputs) 

print(inputs) 
trainArr[inputs] = trainArr[inputs].apply(lambda x: (x - x.mean())/(x.max() - x.min())) 

The code prints the following:

   v1  v2  v3  result
0  12  31  31       0
1  34  52   4       1
2  32   4   5       1
3   7  89   2       0
['v1' 'v2' 'v3' 'result'] 
['v1' 'v2' 'v3'] 

However, I got the following error:

trainArr[inputs] = trainArr[inputs].apply(lambda x: (x - x.mean())/(x.max() - x.min())) 
IndexError: arrays used as indices must be of integer (or boolean) type 

Does anyone know what I am missing here? Thanks!

What does 'print train.head()' show? – jezrael

I just added more information above. Thanks! – Edamame

Answers

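I think you can first select the first three column labels with [:3], then create a subset DataFrame with train[header]. Finally, you can apply the function to those 3 columns. The original code fails because as_matrix returns a NumPy array, which cannot be indexed by column labels and has no apply method: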

print (train) 
   v1  v2  v3  result
0  12  31  31       0
1  34  52   4       1
2  32   4   5       1
3   7  89   2       0

header = train.columns[:3] 
print(header) 
Index([u'v1', u'v2', u'v3'], dtype='object') 

print (train[header]) 
   v1  v2  v3
0  12  31  31
1  34  52   4
2  32   4   5
3   7  89   2

train[header] = train[header].apply(lambda x: (x - x.mean())/(x.max() - x.min())) 
print (train) 
         v1        v2        v3  result
0 -0.342593 -0.152941  0.706897       0
1  0.472222  0.094118 -0.224138       1
2  0.398148 -0.470588 -0.189655       1
3 -0.527778  0.529412 -0.293103       0
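
For anyone who wants to run this without test1.csv, here is a minimal self-contained sketch of the same idea. The DataFrame is built inline from the values shown in the question, and normalize_columns is a hypothetical helper name introduced only for illustration; it is not part of pandas.

```python
import pandas as pd

# Same data as in the question, built inline instead of read from test1.csv
train = pd.DataFrame({
    "v1": [12, 34, 32, 7],
    "v2": [31, 52, 4, 89],
    "v3": [31, 4, 5, 2],
    "result": [0, 1, 1, 0],
})


def normalize_columns(df, columns):
    """Hypothetical helper: return a copy of df in which each listed column
    is replaced by (x - mean) / (max - min)."""
    out = df.copy()
    subset = out[columns].astype(float)
    # mean(), max() and min() are computed per column, so this is the
    # vectorized equivalent of the lambda passed to apply()
    out[columns] = (subset - subset.mean()) / (subset.max() - subset.min())
    return out


cols = list(train.columns[:3])  # ['v1', 'v2', 'v3']
normalized = normalize_columns(train, cols)
print(normalized)
```

Returning a copy leaves train untouched, which may be handy if the raw values are still needed later. Note that a column whose values are all equal has max - min equal to 0, so this formula would produce NaN for it.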

But I think it is better to use iloc to select the first 3 columns by position:

print (train.iloc[:,:3]) 
   v1  v2  v3
0  12  31  31
1  34  52   4
2  32   4   5
3   7  89   2

train.iloc[:,:3] = train.iloc[:,:3].apply(lambda x: (x - x.mean())/(x.max() - x.min())) 
print (train)
         v1        v2        v3  result
0 -0.342593 -0.152941  0.706897       0
1  0.472222  0.094118 -0.224138       1
2  0.398148 -0.470588 -0.189655       1
3 -0.527778  0.529412 -0.293103       0
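
If a NumPy array is what you ultimately need, as the as_matrix call in the question suggests, the following sketch shows why indexing it with column labels raises the IndexError and how the same normalization can be done on the array itself. It assumes a recent pandas, where as_matrix is no longer available and to_numpy is used instead. The names train_arr, col_mean and col_range are illustrative only.

```python
import pandas as pd

train = pd.DataFrame({
    "v1": [12, 34, 32, 7],
    "v2": [31, 52, 4, 89],
    "v3": [31, 4, 5, 2],
    "result": [0, 1, 1, 0],
})
inputs = train.columns.values[:3]  # array of column labels (strings)

# Select the columns while still in pandas, then convert to a plain array
train_arr = train[inputs].to_numpy(dtype=float)

# A NumPy array only understands integer or boolean indices, so indexing
# it with column labels fails in the same way as in the question
try:
    train_arr[inputs]
    label_indexing_failed = False
except IndexError:
    label_indexing_failed = True

# Column-wise normalization directly on the array (axis=0 runs down rows)
col_mean = train_arr.mean(axis=0)
col_range = train_arr.max(axis=0) - train_arr.min(axis=0)
normalized = (train_arr - col_mean) / col_range
print(normalized)
```

The key point is to do label-based selection on the DataFrame and only then convert to an array, since the array no longer carries the column names.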