2017-03-16

Using index values as category values in a pandas DataFrame

I have the following DataFrame:

                  beat1   beat2   beat3   beat4   beat5   beat6   beat7
filename
M40_HC_503d.dat  0.7456  0.8574  0.7695  0.8698  0.8315  0.7908  0.8823
M30_HC_461d.dat  0.7672  0.6682  0.7452  0.6853  0.7488  0.6782  0.6648
M24_HC_459d.dat  0.6041  0.6439  0.5870  0.7452  0.6714  0.6684  0.6198
M48_HC_543d.dat  0.8949  0.8570  0.9338  1.0545  1.0681  1.0775  0.8425
M40_HC_506d.dat  0.7862  0.8917  0.9357  0.8250  0.8521  0.7146  0.7125

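I want to reshape this DataFrame so that, instead of the columns beat1 to beat7 with filename as the index, there is a new index and just two columns. The first column holds all the values from beat1 through beat7, and the second holds the filename each value came from. Something like this: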

     values         filename
ind
0    0.7456  M40_HC_503d.dat
1    0.8574  M40_HC_503d.dat
2    0.7695  M40_HC_503d.dat
3    0.8698  M40_HC_503d.dat
4    0.8315  M40_HC_503d.dat
5    0.7908  M40_HC_503d.dat
6    0.8823  M40_HC_503d.dat
7    0.7672  M30_HC_461d.dat
8    0.6682  M30_HC_461d.dat
9    0.7452  M30_HC_461d.dat
10   0.6853  M30_HC_461d.dat
11   0.7488  M30_HC_461d.dat
12   0.6782  M30_HC_461d.dat
13   0.6648  M30_HC_461d.dat

I have tried many things, including taking the transpose, but nothing has worked for me. Any ideas?
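The answers below assume a DataFrame named df that holds the sample data above. A minimal sketch for rebuilding it, assuming the filenames are the index (named filename, as the header row suggests) and every beat value is a float:

```python
import pandas as pd

# Sample data from the question: one row per file, one column per beat.
data = {
    'M40_HC_503d.dat': [0.7456, 0.8574, 0.7695, 0.8698, 0.8315, 0.7908, 0.8823],
    'M30_HC_461d.dat': [0.7672, 0.6682, 0.7452, 0.6853, 0.7488, 0.6782, 0.6648],
    'M24_HC_459d.dat': [0.6041, 0.6439, 0.5870, 0.7452, 0.6714, 0.6684, 0.6198],
    'M48_HC_543d.dat': [0.8949, 0.8570, 0.9338, 1.0545, 1.0681, 1.0775, 0.8425],
    'M40_HC_506d.dat': [0.7862, 0.8917, 0.9357, 0.8250, 0.8521, 0.7146, 0.7125],
}

# orient='index' turns each dict key into a row label.
df = pd.DataFrame.from_dict(
    data,
    orient='index',
    columns=[f'beat{k}' for k in range(1, 8)],
)
df.index.name = 'filename'
print(df)
```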

Answers

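import numpy as np
import pandas as pd

v = df.values          # 2-D array of shape (n_files, n_beats)
i = df.index.values    # the filenames

# Flatten the values row by row and repeat each filename once per beat.
# hstack of floats and strings gives an object array, so cast 'values' back to float.
pd.DataFrame(
    np.hstack([v.reshape(-1, 1), i.repeat(v.shape[1])[:, None]]),
    columns=['values', 'filename']
).astype({'values': float})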

   values         filename
0  0.7456  M40_HC_503d.dat
1  0.8574  M40_HC_503d.dat
2  0.7695  M40_HC_503d.dat
3  0.8698  M40_HC_503d.dat
4  0.8315  M40_HC_503d.dat
5  0.7908  M40_HC_503d.dat
6  0.8823  M40_HC_503d.dat
7  0.7672  M30_HC_461d.dat
8  0.6682  M30_HC_461d.dat
9  0.7452  M30_HC_461d.dat
...
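The same row-by-row flattening can also be written without hstack, which keeps the values column as float64 instead of mixing numbers and strings in one object array. A sketch on a trimmed, hypothetical sample of two files and three beats (not the full data):

```python
import numpy as np
import pandas as pd

# Hypothetical trimmed sample: 2 files x 3 beats.
df = pd.DataFrame(
    [[0.7456, 0.8574, 0.7695], [0.7672, 0.6682, 0.7452]],
    index=pd.Index(['M40_HC_503d.dat', 'M30_HC_461d.dat'], name='filename'),
    columns=['beat1', 'beat2', 'beat3'],
)

v = df.to_numpy()      # shape (n_files, n_beats)
n_beats = v.shape[1]

# ravel() reads row by row (C order), so each file's beats stay together;
# repeat() emits each filename n_beats times in the same order.
out = pd.DataFrame({
    'values': v.ravel(),
    'filename': df.index.to_numpy().repeat(n_beats),
})
print(out)
```

Building the frame from a dict of separate arrays lets each column keep its own dtype.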

I think you need stack:

df = df.stack().reset_index(0, name='values') 
print (df) 
              filename  values
beat1  M40_HC_503d.dat  0.7456
beat2  M40_HC_503d.dat  0.8574
beat3  M40_HC_503d.dat  0.7695
beat4  M40_HC_503d.dat  0.8698
beat5  M40_HC_503d.dat  0.8315
beat6  M40_HC_503d.dat  0.7908
beat7  M40_HC_503d.dat  0.8823
beat1  M30_HC_461d.dat  0.7672
beat2  M30_HC_461d.dat  0.6682
beat3  M30_HC_461d.dat  0.7452
beat4  M30_HC_461d.dat  0.6853
beat5  M30_HC_461d.dat  0.7488
beat6  M30_HC_461d.dat  0.6782
...
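To see why this works, it helps to look at the intermediate result. A small sketch on a hypothetical 2 x 2 sample: stack() returns a Series with a two-level index (filename, beat), and reset_index(0, name='values') moves only level 0 back into a column:

```python
import pandas as pd

# Hypothetical trimmed sample: 2 files x 2 beats.
df = pd.DataFrame(
    [[0.7456, 0.8574], [0.7672, 0.6682]],
    index=pd.Index(['M40_HC_503d.dat', 'M30_HC_461d.dat'], name='filename'),
    columns=['beat1', 'beat2'],
)

# stack() moves the column labels into a second index level,
# giving a Series indexed by (filename, beat).
s = df.stack()
print(s.index.nlevels)

# Level 0 (filename) becomes a column; the beat labels stay as the index,
# and name= labels the column that holds the stacked values.
out = s.reset_index(0, name='values')
print(out)
```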

Or, with a default integer index:

df = df.stack().reset_index(0, name='values').reset_index(drop=True) 
print (df) 
           filename  values
0   M40_HC_503d.dat  0.7456
1   M40_HC_503d.dat  0.8574
2   M40_HC_503d.dat  0.7695
3   M40_HC_503d.dat  0.8698
4   M40_HC_503d.dat  0.8315
5   M40_HC_503d.dat  0.7908
6   M40_HC_503d.dat  0.8823
7   M30_HC_461d.dat  0.7672
8   M30_HC_461d.dat  0.6682
9   M30_HC_461d.dat  0.7452
10  M30_HC_461d.dat  0.6853
...
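DataFrame.melt is another common reshaping tool, swapped in here as an alternative rather than taken from the answers. Unlike stack, it emits the columns one at a time (every file's beat1, then every file's beat2), so a stable sort on the original file order is needed to get the row-by-row layout from the question. A sketch on a hypothetical 2 x 2 sample:

```python
import pandas as pd

# Hypothetical trimmed sample: 2 files x 2 beats.
df = pd.DataFrame(
    [[0.7456, 0.8574], [0.7672, 0.6682]],
    index=pd.Index(['M40_HC_503d.dat', 'M30_HC_461d.dat'], name='filename'),
    columns=['beat1', 'beat2'],
)

# melt() goes column by column: all beat1 rows first, then all beat2 rows.
long = df.reset_index().melt(id_vars='filename', var_name='beat', value_name='values')

# Map each filename to its original row position, then sort with a stable
# algorithm so the beats keep their order within each file.
order = {name: pos for pos, name in enumerate(df.index)}
long = (
    long.sort_values('filename', key=lambda col: col.map(order), kind='mergesort')
        .reset_index(drop=True)
)
print(long)
```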

If you need to change the index to the beat numbers:

df = df.stack().reset_index(0, name='values')
df.index = df.index.str.extract(r'(\d+)', expand=False)
print (df) 
          filename  values
1  M40_HC_503d.dat  0.7456
2  M40_HC_503d.dat  0.8574
3  M40_HC_503d.dat  0.7695
4  M40_HC_503d.dat  0.8698
5  M40_HC_503d.dat  0.8315
6  M40_HC_503d.dat  0.7908
7  M40_HC_503d.dat  0.8823
1  M30_HC_461d.dat  0.7672
2  M30_HC_461d.dat  0.6682
...
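Note that str.extract returns the captured digits as strings ('1', '2', ...). If numeric beat numbers are wanted, one option, assumed here rather than taken from the answer, is to cast the extracted index with astype(int). Because the pattern is \d+, multi-digit labels such as beat10 would also be captured whole. A sketch on a hypothetical 2 x 2 sample:

```python
import pandas as pd

# Hypothetical trimmed sample: 2 files x 2 beats.
df = pd.DataFrame(
    [[0.7456, 0.8574], [0.7672, 0.6682]],
    index=pd.Index(['M40_HC_503d.dat', 'M30_HC_461d.dat'], name='filename'),
    columns=['beat1', 'beat2'],
)

out = df.stack().reset_index(0, name='values')

# With one capture group and expand=False, extract returns an Index of strings;
# astype(int) converts those digit strings into integer beat numbers.
out.index = out.index.str.extract(r'(\d+)', expand=False).astype(int)
print(out)
```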

Worked for me. Thanks – Peaceful