2017-08-14 44 views
0

我有一個.csv我試圖讀入一個有多列列標題的熊貓數據框,但第一行的標籤是稀疏的。使用熊貓閱讀帶稀疏標籤的列標題的CSV

例如:

Binned_average_and_predicted_H2O_spectra_sorted_by_RH-class.,,,,,,,, 
,RH=0.8,,,,RH=0.9,,, 
,n_=_60,,,,n_=_29,,, 
nat_freq,avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o),avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o) 
6.10E-04,8.40E-02,0.117551351,0.117550357,8.64E-02,0.128696811,0.163304381,0.163304015,0.127552704 
1.22E-03,7.49E-02,0.126467592,0.126465605,7.70E-02,9.05E-02,0.200350295,0.200349563,8.97E-02 
1.83E-03,7.54E-02,0.124370072,0.124367091,7.76E-02,8.54E-02,0.121274897,0.121273799,8.46E-02 
2.44E-03,7.76E-02,0.136590839,0.136586865,7.99E-02,5.45E-02,0.100995665,0.100994202,5.40E-02 
3.05E-03,8.73E-02,0.141422799,0.141417832,8.98E-02,7.57E-02,0.170033442,0.170031614,7.50E-02 
3.66E-03,7.29E-02,0.143599074,0.143593115,7.50E-02,0.10001777,0.165468366,0.165466173,9.91E-02 

當我讀了CSV,

Cosp2 = pd.read_csv(DPath,index_col=0, header=[1,3]) 
print(Cosp2) 

我結束了無名:對所有的頭第一級標頭都沒有明確標註#_level_0標籤。

   RH=0.8 Unnamed: 2_level_0 Unnamed: 3_level_0 \ 
nat_freq avrg_sp(T)  avrg_sp(h2o) denoised_avrg_sp(h2o) 
0.00061  0.0840   0.117551    0.117550 
0.00122  0.0749   0.126468    0.126466 
0.00183  0.0754   0.124370    0.124367 
0.00244  0.0776   0.136591    0.136587 
0.00305  0.0873   0.141423    0.141418 
0.00366  0.0729   0.143599    0.143593 

     Unnamed: 4_level_0  RH=0.9 Unnamed: 6_level_0 \ 
nat_freq  pred_sp(h2o) avrg_sp(T)  avrg_sp(h2o) 
0.00061    0.0864 0.128697   0.163304 
0.00122    0.0770 0.090500   0.200350 
0.00183    0.0776 0.085400   0.121275 
0.00244    0.0799 0.054500   0.100996 
0.00305    0.0898 0.075700   0.170033 
0.00366    0.0750 0.100018   0.165468 

      Unnamed: 7_level_0 Unnamed: 8_level_0 
nat_freq denoised_avrg_sp(h2o)  pred_sp(h2o) 
0.00061    0.163304   0.127553 
0.00122    0.200350   0.089700 
0.00183    0.121274   0.084600 
0.00244    0.100994   0.054000 
0.00305    0.170032   0.075000 
0.00366    0.165466   0.099100 

有沒有辦法讓熊貓在整個未標記的列上傳播0級標籤?我想的東西,看起來像這樣:

   RH=0.8             \ 
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o) 
0.00061  0.0840  0.117551    0.117550  0.0864 
0.00122  0.0749  0.126468    0.126466  0.0770 
0.00183  0.0754  0.124370    0.124367  0.0776 
0.00244  0.0776  0.136591    0.136587  0.0799 
0.00305  0.0873  0.141423    0.141418  0.0898 
0.00366  0.0729  0.143599    0.143593  0.0750 

      RH=0.9             
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o) 
0.00061 0.128697  0.163304    0.163304  0.127553 
0.00122 0.090500  0.200350    0.200350  0.089700 
0.00183 0.085400  0.121275    0.121274  0.084600 
0.00244 0.054500  0.100996    0.100994  0.054000 
0.00305 0.075700  0.170033    0.170032  0.075000 
0.00366 0.100018  0.165468    0.165466  0.099100 

回答

1

您可以使用get_level_valuesto_seriesSeries第一:

a = Cosp2.columns.get_level_values(0).to_series() 
print (a) 
RH=0.8       RH=0.8 
Unnamed: 2_level_0 Unnamed: 2_level_0 
Unnamed: 3_level_0 Unnamed: 3_level_0 
Unnamed: 4_level_0 Unnamed: 4_level_0 
RH=0.9       RH=0.9 
Unnamed: 6_level_0 Unnamed: 6_level_0 
Unnamed: 7_level_0 Unnamed: 7_level_0 
Unnamed: 8_level_0 Unnamed: 8_level_0 
dtype: object 

如果startswithUnnamed然後使用maskNaN s,不ffill更換NaNfillnamethod='ffill'

b = a.mask(a.str.startswith('Unnamed')).ffill() 
print (b) 
RH=0.8    RH=0.8 
Unnamed: 2_level_0 RH=0.8 
Unnamed: 3_level_0 RH=0.8 
Unnamed: 4_level_0 RH=0.8 
RH=0.9    RH=0.9 
Unnamed: 6_level_0 RH=0.9 
Unnamed: 7_level_0 RH=0.9 
Unnamed: 8_level_0 RH=0.9 
dtype: object 

末創建新MultiIndex通過from_arrays

Cosp2.columns = pd.MultiIndex.from_arrays([b, Cosp2.columns.get_level_values(1)]) 
print (Cosp2) 
      RH=0.8             \ 
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o) 
0.00061  0.0840  0.117551    0.117550  0.0864 
0.00122  0.0749  0.126468    0.126466  0.0770 
0.00183  0.0754  0.124370    0.124367  0.0776 
0.00244  0.0776  0.136591    0.136587  0.0799 
0.00305  0.0873  0.141423    0.141418  0.0898 
0.00366  0.0729  0.143599    0.143593  0.0750 

      RH=0.9             
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o) 
0.00061 0.128697  0.163304    0.163304  0.127553 
0.00122 0.090500  0.200350    0.200350  0.089700 
0.00183 0.085400  0.121275    0.121274  0.084600 
0.00244 0.054500  0.100996    0.100994  0.054000 
0.00305 0.075700  0.170033    0.170032  0.075000 
0.00366 0.100018  0.165468    0.165466  0.099100