2013-05-16

Pivoting repeated columns into rows

This is the input I get from reading a CSV file:

Sample Info  D3S1358 1  D3S1358 2  TH01 1  TH01 2  D21S11 1  D21S11 2  D21S11 3 
TEST_646   17   17     9  9.3   28     28   nan 
TEST_647   18   18     7  7   29     30   30.2 
TEST_648   16   16     9  9   31.2    31.2  nan 

I want to convert it into a form like this:

Sample_name Marker  mrk  value 
TEST_646  D3S1358  1  17 
TEST_646  D3S1358  2  17 
TEST_646  TH01  1  9 
TEST_646  TH01  2  9.3 
TEST_646  D21S11  1  28.0 
TEST_646  D21S11  2  28.0 
TEST_646  D21S11  3  nan 

P.S. Here are the values in comma-separated form, for your convenience:

Sample Info, D3S1358 1, D3S1358 2, TH01 1, TH01 2, D21S11 1, D21S11 2, D21S11 3 
TEST_646, 17, 17, 9, 9.3, 28, 28, nan 
TEST_647, 18, 18, 7, 7, 29, 30, 30.2 
TEST_648, 16, 16, 9, 9, 31.2, 31.2, nan 

My solution so far is:

samples = xls.parse(sheet).set_index('Sample Info') 
cols = list(set(filter(None, [i[:-2] if i!="Sample Info" else None for i in samples.columns]))) 
sample_df_d= {'1' : pd.Series(len(cols)*[''], index=cols), '2' : pd.Series(len(cols)*[''], index=cols), '3' : pd.Series(len(cols)*[''], index=cols)} 
sample_df_ = pd.DataFrame(sample_df_d) 
sample_ser = sample_df_.stack() 
sample_df = pd.DataFrame(sample_ser, columns=['value']) 
#print sample_df 

for i,j in samples.iterrows(): 
    for i2,j2 in j.iteritems(): 
      print j[0], i2[:-2], "\t", i2[-2:],"\t", j2 

This produces:

17 D3S1358 1 17 
17 D3S1358 2 17 
17 TH01  1 9 
17 TH01  2 9.3 
17 D21S11 1 28.0 
What is the problem? What have you tried?

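Oh, I tried a lot of things, such as putting a Series in place of each of them, but before Andy's MultiIndex solution nothing worked for me. Sorry that I did not post a rough solution with the question. I have now updated the question with my "bad" solution.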

Answer

Here is one way to do it with stack. First, clean up the columns into a MultiIndex:

In [11]: df_1 = df0.set_index('Sample Info') 

In [12]: df_1.columns = pd.MultiIndex.from_arrays(zip(*df_1.columns.map(str.split)), 
                names=['Marker', 'mrk']) 

In [13]: df_1 
Out[13]: 
Marker  D3S1358  TH01  D21S11 
mrk    1 2  1 2  1  2  3 
Sample Info 
TEST_646   17 17  9 9.3 28.0 28.0 NaN 
TEST_647   18 18  7 7.0 29.0 30.0 30.2 
TEST_648   16 16  9 9.0 31.2 31.2 NaN 
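
For anyone who wants to reproduce the steps above without the original file, here is a minimal sketch. It assumes the data is loaded from the comma-separated sample in the question (whitespace trimmed, held in a hypothetical `CSV_TEXT` string) rather than from the asker's spreadsheet, and it builds the MultiIndex with `from_tuples` instead of the `zip`-based `from_arrays` call, which should give an equivalent result:

```python
import io

import pandas as pd

# Sample data copied from the question, with padding whitespace removed.
# CSV_TEXT is a stand-in for the real input file, which is not shown.
CSV_TEXT = """Sample Info,D3S1358 1,D3S1358 2,TH01 1,TH01 2,D21S11 1,D21S11 2,D21S11 3
TEST_646,17,17,9,9.3,28,28,nan
TEST_647,18,18,7,7,29,30,30.2
TEST_648,16,16,9,9,31.2,31.2,nan
"""

df0 = pd.read_csv(io.StringIO(CSV_TEXT))
df_1 = df0.set_index("Sample Info")

# Split each "MARKER N" label on whitespace into a (marker, number) tuple.
pairs = [tuple(label.split()) for label in df_1.columns]
df_1.columns = pd.MultiIndex.from_tuples(pairs, names=["Marker", "mrk"])

print(df_1)
```

Note that the `mrk` level holds strings ("1", "2", "3") here, because it comes from splitting the column labels.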

Then you can stack (first by 'Marker', then by 'mrk'):

In [14]: df_2 = df_1.stack(level=['Marker', 'mrk']) 

In [15]: df_2 
Sample Info Marker mrk 
TEST_646  D21S11 1  28.0 
         2  28.0 
      D3S1358 1  17.0 
         2  17.0 
      TH01  1  9.0 
         2  9.3 
TEST_647  D21S11 1  29.0 
         2  30.0 
         3  30.2 
      D3S1358 1  18.0 
         2  18.0 
      TH01  1  7.0 
         2  7.0 
TEST_648  D21S11 1  31.2 
         2  31.2 
      D3S1358 1  16.0 
         2  16.0 
      TH01  1  9.0 
         2  9.0 
dtype: float64 

Then you can call reset_index if you want the index levels back as columns:

df_2.reset_index() 
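
The stacked output shown above has no `D21S11 3` row for TEST_646 or TEST_648, while the layout requested in the question keeps that row with a `nan` value. If those rows matter, one possible alternative (a different technique from the stack approach in this answer) is `melt` followed by a string split. The sketch below assumes the same sample data, held in a hypothetical `CSV_TEXT` string, and uses the column names from the question's desired output:

```python
import io

import pandas as pd

# Sample data copied from the question; CSV_TEXT stands in for the real file.
CSV_TEXT = """Sample Info,D3S1358 1,D3S1358 2,TH01 1,TH01 2,D21S11 1,D21S11 2,D21S11 3
TEST_646,17,17,9,9.3,28,28,nan
TEST_647,18,18,7,7,29,30,30.2
TEST_648,16,16,9,9,31.2,31.2,nan
"""

df0 = pd.read_csv(io.StringIO(CSV_TEXT))

# melt yields one row per (sample, original column), NaN values included.
long_df = df0.melt(id_vars="Sample Info", var_name="column", value_name="value")

# Split the "MARKER N" label into two separate columns.
long_df[["Marker", "mrk"]] = long_df["column"].str.split(expand=True)

# A stable sort on the sample name groups rows per sample while keeping
# the markers in their original column order.
result = (
    long_df.rename(columns={"Sample Info": "Sample_name"})
    .sort_values("Sample_name", kind="mergesort")
    .reset_index(drop=True)[["Sample_name", "Marker", "mrk", "value"]]
)

print(result)
```

The result has one row per sample and original column, so the missing `D21S11 3` measurements appear as NaN rather than being dropped.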
If I could, I would upvote this again! Nice solution. – bdiamante