2016-01-18 81 views
0

Splitting a NumPy array by a key column

I have a NumPy array that looks like this:

+----+-------+----------------+ 
| id | class | probability | 
+----+-------+----------------+ 
| 0 | 0 | 0.371301944865 | 
| 0 | 1 | 0.317619162391 | 
| 0 | -1 | 0.311078922721 | 
| 1 | 0 | 0.401434454687 | 
| 1 | 1 | 0.316000976419 | 
| 1 | -1 | 0.282564557522 | 
| 2 | 1 | 0.361490456577 | 
| 2 | 0 | 0.324832048066 | 
| 2 | -1 | 0.313677512904 | 
| . | . | .    | 
| . | . | .    | 
| . | . | .    | 
+----+-------+----------------+ 

Or, more formally:

x = numpy.array([[ 0.00000000e+00, 0.00000000e+00, 3.71301945e-01], 
     [ 0.00000000e+00, 1.00000000e+00, 3.17619162e-01], 
     [ 0.00000000e+00, -1.00000000e+00, 3.11078923e-01], 
     [ 1.00000000e+00, 0.00000000e+00, 4.01434455e-01], 
     [ 1.00000000e+00, 1.00000000e+00, 3.16000976e-01], 
     [ 1.00000000e+00, -1.00000000e+00, 2.82564558e-01], 
     [ 2.00000000e+00, 1.00000000e+00, 3.61490457e-01], 
     [ 2.00000000e+00, 0.00000000e+00, 3.24832048e-01], 
     [ 2.00000000e+00, -1.00000000e+00, 3.13677513e-01]]) 

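As you can see, each id has three classes, and each class has its own probability. I would like to convert this into a four-column array like this: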

id/class   -1    0    1 
0    0.311078922721 0.371301944865 0.317619162391 
1    0.282564557522 0.401434454687 0.316000976419 
.    .     .    . 
.    .     .    . 
.    .     .    . 

Is there a fast/clean way to do this?

Answers

1

Concatenate the ids with the data: np.hstack((a[:,0][::3][:,None], a[:,2].reshape(-1,3)))

For example:

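import numpy as np

# Sample data: 5 ids, classes -1, 0, 1 for each id, random probabilities
a = np.array([[i // 3, i % 3 - 1, np.random.random()] for i in range(15)])
# a = a[np.lexsort((a[:, 1], a[:, 0]))]  # if not already sorted by id, then class
print(a)
ids = a[::3, 0][:, None]       # one id per group of three rows, as a column
data = a[:, 2].reshape(-1, 3)  # probabilities, one row per id
print(np.hstack((ids, data)))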

[[ 0.   -1.   0.78556868] 
[ 0.   0.   0.29483601] 
[ 0.   1.   0.74003482] 
[ 1.   -1.   0.00673232] 
[ 1.   0.   0.43262104] 
[ 1.   1.   0.92925208] 
[ 2.   -1.   0.26060377] 
[ 2.   0.   0.21186242] 
[ 2.   1.   0.88388227] 
[ 3.   -1.   0.53816376] 
[ 3.   0.   0.82545746] 
[ 3.   1.   0.53964188] 
[ 4.   -1.   0.63082784] 
[ 4.   0.   0.45693351] 
[ 4.   1.   0.38970428]] 

[[ 0.   0.78556868 0.29483601 0.74003482] 
[ 1.   0.00673232 0.43262104 0.92925208] 
[ 2.   0.26060377 0.21186242 0.88388227] 
[ 3.   0.53816376 0.82545746 0.53964188] 
[ 4.   0.63082784 0.45693351 0.38970428]] 
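The reshape above relies on the rows being sorted by id and class, with exactly three rows per id. As a minimal sketch that does not depend on row order, np.unique with return_inverse=True can map each id and class to a row and column position. The names row, col and table are illustrative only, and filling with NaN for missing (id, class) pairs is an assumption, not something the question asks for.

```python
import numpy as np

x = np.array([[0, 0, 0.371301945], [0, 1, 0.317619162], [0, -1, 0.311078923],
              [1, 0, 0.401434455], [1, 1, 0.316000976], [1, -1, 0.282564558],
              [2, 1, 0.361490457], [2, 0, 0.324832048], [2, -1, 0.313677513]])

# Unique sorted ids/classes, plus the position of every row within them.
ids, row = np.unique(x[:, 0], return_inverse=True)
classes, col = np.unique(x[:, 1], return_inverse=True)

# Start from NaN so a missing (id, class) pair stays visible (assumption).
table = np.full((ids.size, classes.size), np.nan)
table[row, col] = x[:, 2]

# First column is the id, remaining columns follow the order in `classes`.
result = np.column_stack((ids, table))
print(classes)
print(result)
```

If the same (id, class) pair occurred more than once, the assignment would keep only one of the values, so this sketch assumes the pairs are unique.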

pandas can also give you a nice solution.

+0

Thanks, but unfortunately this won't work, because the classes do not appear in the same order for every id (see the example I provided)! – Angelica

+0

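You can easily sort the data with 'x = x[np.argsort(x[:,1])]' followed by 'x = x[np.argsort(x[:,0])]'. Then you have the data sorted by id and class, and can use reshape. Still, if you are willing to use it, I think the pandas solution is more concise and clearer. – kazemakase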

+0

Edit: thanks @kazemakase. I added the sorting line. –
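The sort-then-reshape idea from the comments can be sketched in one step with np.lexsort. This is a hedged variant rather than the commenters' exact code: a two-step argsort only works if the second sort is stable, and the default kind of np.argsort is not guaranteed to be, whereas lexsort orders by both keys at once. The variable names order and xs are illustrative.

```python
import numpy as np

x = np.array([[0, 0, 0.371301945], [0, 1, 0.317619162], [0, -1, 0.311078923],
              [1, 0, 0.401434455], [1, 1, 0.316000976], [1, -1, 0.282564558],
              [2, 1, 0.361490457], [2, 0, 0.324832048], [2, -1, 0.313677513]])

# lexsort uses the LAST key as the primary key: sort by id, then by class.
order = np.lexsort((x[:, 1], x[:, 0]))
xs = x[order]

# Assumes every id has exactly three rows, one per class.
result = np.hstack((xs[::3, :1], xs[:, 2].reshape(-1, 3)))
print(result)
```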

3

Here is a pandas solution:

import pandas as pd 
import numpy as np 

x = np.array([[ 0.00000000e+00, 0.00000000e+00, 3.71301945e-01], 
     [ 0.00000000e+00, 1.00000000e+00, 3.17619162e-01], 
     [ 0.00000000e+00, -1.00000000e+00, 3.11078923e-01], 
     [ 1.00000000e+00, 0.00000000e+00, 4.01434455e-01], 
     [ 1.00000000e+00, 1.00000000e+00, 3.16000976e-01], 
     [ 1.00000000e+00, -1.00000000e+00, 2.82564558e-01], 
     [ 2.00000000e+00, 1.00000000e+00, 3.61490457e-01], 
     [ 2.00000000e+00, 0.00000000e+00, 3.24832048e-01], 
     [ 2.00000000e+00, -1.00000000e+00, 3.13677513e-01]]) 

df = pd.DataFrame(x, columns=["id", "class", "p"]) 
df.pivot(index="id", columns="class", values="p") 

Output:

class  -1   0   1 
id         
0  0.311079 0.371302 0.317619 
1  0.282565 0.401434 0.316001 
2  0.313678 0.324832 0.361490 
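Since the question asks for a four-column array rather than a DataFrame, the pivoted result can be converted back to NumPy. A minimal sketch, assuming a pandas version that provides DataFrame.to_numpy (0.24 or later); the names wide and result are illustrative.

```python
import numpy as np
import pandas as pd

x = np.array([[0, 0, 0.371301945], [0, 1, 0.317619162], [0, -1, 0.311078923],
              [1, 0, 0.401434455], [1, 1, 0.316000976], [1, -1, 0.282564558],
              [2, 1, 0.361490457], [2, 0, 0.324832048], [2, -1, 0.313677513]])

df = pd.DataFrame(x, columns=["id", "class", "p"])
wide = df.pivot(index="id", columns="class", values="p")

# reset_index() turns the id index back into an ordinary first column;
# to_numpy() then drops the labels and returns a plain float array.
result = wide.reset_index().to_numpy()
print(result)
```

pivot raises an error if an (id, class) pair appears more than once, which doubles as a check that the data really has one probability per pair.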
0

You can also use unstack in pandas.

Using the same df as @HYRY, add:

df.set_index(["id","class"]).unstack("class").reset_index() 

Result:

 id   p      
class   -1.0  0.0  1.0 
0  0 0.311079 0.371302 0.317619 
1  1 0.282565 0.401434 0.316001 
2  2 0.313678 0.324832 0.361490
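The result above carries a two-level column header (p over the class values). If a flat header is preferred, one possible variation is to select the p column before unstacking, so that unstack operates on a Series. This is a sketch under that assumption; the names wide and flat are illustrative.

```python
import numpy as np
import pandas as pd

x = np.array([[0, 0, 0.371301945], [0, 1, 0.317619162], [0, -1, 0.311078923],
              [1, 0, 0.401434455], [1, 1, 0.316000976], [1, -1, 0.282564558],
              [2, 1, 0.361490457], [2, 0, 0.324832048], [2, -1, 0.313677513]])

df = pd.DataFrame(x, columns=["id", "class", "p"])

# Selecting "p" first yields a Series, so unstack() returns a frame whose
# columns are just the class values, without an extra "p" level.
wide = df.set_index(["id", "class"])["p"].unstack("class")
flat = wide.reset_index()
print(flat)
```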