2016-06-18 37 views
2

Converting a list of values into a time series in Python

I want to convert the following data:

jan_1 jan_15 feb_1 feb_15 mar_1 mar_15 apr_1 apr_15 may_1 may_15 jun_1 jun_15 jul_1 jul_15 aug_1 aug_15 sep_1 sep_15 oct_1 oct_15 nov_1 nov_15 dec_1 dec_15 
0  0  0  0  0  1  1  2  2  2  2  2  2  3  3  3  3  3  0  0  0  0  0  0 

into an array of length 365, where each element is repeated until the next date, e.g. 0 is repeated from January 1 to January 15 ...

I could do something like numpy.repeat, but it is not date-aware, so it would not account for the fact that fewer than 15 days fall between feb_15 and mar_1.

Is there a pythonic solution?
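One way to keep numpy.repeat while making it date-aware is to derive the repeat counts from the dates themselves. The sketch below is only an illustration: the names `labels`, `values`, `counts` and `daily` are hypothetical, and it assumes a non-leap year (pandas falls back to 1900 when the format has no year), so the year ends at day-of-year 365.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (labels and values are illustrative names).
months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
labels = [f"{m}_{d}" for m in months for d in (1, 15)]
values = [0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2,
          2, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0]

# Parse the labels; with no year in the format the year defaults to 1900,
# which is not a leap year.
dates = pd.to_datetime(labels, format="%b_%d")

# Day-of-year of each breakpoint, plus 366 as the end-of-year boundary.
day_of_year = dates.dayofyear.to_numpy()
counts = np.diff(np.append(day_of_year, 366))

# Repeat each value for the number of days until the next breakpoint.
daily = np.repeat(values, counts)

print(len(daily))   # number of days covered
print(counts[:4])   # days covered by jan_1, jan_15, feb_1, feb_15
```

Because the counts come from calendar differences, the short gap between feb_15 and mar_1 is handled without special cases.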

+0

Your question is unclear. How are 0, 1, 2 determined? Please also show what you have tried. – Merlin

Answers

1

You can use resample:

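import pandas as pd

# add a closing value for 31 Dec, copied from the last column of df
df['dec_31'] = df.iloc[:,-1]

# convert the column labels to datetime - see http://strftime.org/
# (the format has no year, so the year defaults to 1900)
df.columns = pd.to_datetime(df.columns, format='%b_%d')

# transpose and resample by days, forward-filling each value
df1 = df.T.resample('D').ffill()
df1.columns = ['col']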
print (df1) 
  col 
1900-01-01 0 
1900-01-02 0 
1900-01-03 0 
1900-01-04 0 
1900-01-05 0 
1900-01-06 0 
1900-01-07 0 
1900-01-08 0 
1900-01-09 0 
1900-01-10 0 
1900-01-11 0 
1900-01-12 0 
1900-01-13 0 
1900-01-14 0 
1900-01-15 0 
1900-01-16 0 
1900-01-17 0 
1900-01-18 0 
1900-01-19 0 
1900-01-20 0 
1900-01-21 0 
1900-01-22 0 
1900-01-23 0 
1900-01-24 0 
1900-01-25 0 
1900-01-26 0 
1900-01-27 0 
1900-01-28 0 
1900-01-29 0 
1900-01-30 0 
     .. 
1900-12-02 0 
1900-12-03 0 
1900-12-04 0 
1900-12-05 0 
1900-12-06 0 
1900-12-07 0 
1900-12-08 0 
1900-12-09 0 
1900-12-10 0 
1900-12-11 0 
1900-12-12 0 
1900-12-13 0 
1900-12-14 0 
1900-12-15 0 
1900-12-16 0 
1900-12-17 0 
1900-12-18 0 
1900-12-19 0 
1900-12-20 0 
1900-12-21 0 
1900-12-22 0 
1900-12-23 0 
1900-12-24 0 
1900-12-25 0 
1900-12-26 0 
1900-12-27 0 
1900-12-28 0 
1900-12-29 0 
1900-12-30 0 
1900-12-31 0 

[365 rows x 1 columns] 
# if you need a Series
print (df1.col) 
1900-01-01 0 
1900-01-02 0 
1900-01-03 0 
1900-01-04 0 
1900-01-05 0 
1900-01-06 0 
1900-01-07 0 
1900-01-08 0 
1900-01-09 0 
1900-01-10 0 
1900-01-11 0 
1900-01-12 0 
1900-01-13 0 
1900-01-14 0 
1900-01-15 0 
1900-01-16 0 
1900-01-17 0 
1900-01-18 0 
1900-01-19 0 
1900-01-20 0 
1900-01-21 0 
1900-01-22 0 
1900-01-23 0 
1900-01-24 0 
1900-01-25 0 
1900-01-26 0 
1900-01-27 0 
1900-01-28 0 
1900-01-29 0 
1900-01-30 0 
      .. 
1900-12-02 0 
1900-12-03 0 
1900-12-04 0 
1900-12-05 0 
1900-12-06 0 
1900-12-07 0 
1900-12-08 0 
1900-12-09 0 
1900-12-10 0 
1900-12-11 0 
1900-12-12 0 
1900-12-13 0 
1900-12-14 0 
1900-12-15 0 
1900-12-16 0 
1900-12-17 0 
1900-12-18 0 
1900-12-19 0 
1900-12-20 0 
1900-12-21 0 
1900-12-22 0 
1900-12-23 0 
1900-12-24 0 
1900-12-25 0 
1900-12-26 0 
1900-12-27 0 
1900-12-28 0 
1900-12-29 0 
1900-12-30 0 
1900-12-31 0 
Freq: D, Name: col, dtype: int64 
#transpose and convert to numpy array 
print (df1.T.values) 
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 
    2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
    2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
    2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 
    3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 
    3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]] 
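As a possible variation on the resample approach above, the values can be put in a Series and reindexed onto a full daily range with forward fill, which avoids appending a dec_31 entry. This is a sketch under stated assumptions, not the answerer's method: the names `labels`, `values`, `s`, `full_year` and `daily` are hypothetical, and the year 1900 is simply the default pandas uses when the format has no year.

```python
import pandas as pd

# Rebuild the question's data (labels and values are illustrative names).
months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
labels = [f"{m}_{d}" for m in months for d in (1, 15)]
values = [0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2,
          2, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0]

# Series indexed by the parsed breakpoint dates (year defaults to 1900).
s = pd.Series(values, index=pd.to_datetime(labels, format="%b_%d"))

# Every day of that year; reindex forward-fills from the last breakpoint,
# so the days after dec_15 take the dec_15 value.
full_year = pd.date_range("1900-01-01", "1900-12-31", freq="D")
daily = s.reindex(full_year, method="ffill")

print(len(daily))
print(daily.to_numpy())
```

The result is a daily Series; `daily.to_numpy()` gives the plain array the question asks for.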
2

IIUC, you can do it this way:

In [194]: %paste 
# transpose DF, rename columns 
x = df.T.reset_index().rename(columns={'index':'date', 0:'val'}) 
# parse dates 
x['date'] = pd.to_datetime(x['date'], format='%b_%d') 
# group resampled DF by the month and resample(`D`) each group 
result = (x.groupby(x['date'].dt.month) 
      .apply(lambda x: x.set_index('date').resample('1D').ffill())) 
# rename index names 
result.index.names = ['month','date'] 
## -- End pasted text -- 

In [212]: result 
Out[212]: 
        val 
month date 
1  1900-01-01 0 
     1900-01-02 0 
     1900-01-03 0 
     1900-01-04 0 
     1900-01-05 0 
     1900-01-06 0 
     1900-01-07 0 
     1900-01-08 0 
     1900-01-09 0 
     1900-01-10 0 
     1900-01-11 0 
     1900-01-12 0 
     1900-01-13 0 
     1900-01-14 0 
     1900-01-15 0 
2  1900-02-01 0 
     1900-02-02 0 
     1900-02-03 0 
     1900-02-04 0 
     1900-02-05 0 
     1900-02-06 0 
     1900-02-07 0 
     1900-02-08 0 
     1900-02-09 0 
     1900-02-10 0 
     1900-02-11 0 
     1900-02-12 0 
     1900-02-13 0 
     1900-02-14 0 
     1900-02-15 0 
...    ... 
11 1900-11-01 0 
     1900-11-02 0 
     1900-11-03 0 
     1900-11-04 0 
     1900-11-05 0 
     1900-11-06 0 
     1900-11-07 0 
     1900-11-08 0 
     1900-11-09 0 
     1900-11-10 0 
     1900-11-11 0 
     1900-11-12 0 
     1900-11-13 0 
     1900-11-14 0 
     1900-11-15 0 
12 1900-12-01 0 
     1900-12-02 0 
     1900-12-03 0 
     1900-12-04 0 
     1900-12-05 0 
     1900-12-06 0 
     1900-12-07 0 
     1900-12-08 0 
     1900-12-09 0 
     1900-12-10 0 
     1900-12-11 0 
     1900-12-12 0 
     1900-12-13 0 
     1900-12-14 0 
     1900-12-15 0 

[180 rows x 1 columns] 

Or using reset_index():

In [213]: result.reset_index().head(20) 
Out[213]: 
    month  date val 
0  1 1900-01-01 0 
1  1 1900-01-02 0 
2  1 1900-01-03 0 
3  1 1900-01-04 0 
4  1 1900-01-05 0 
5  1 1900-01-06 0 
6  1 1900-01-07 0 
7  1 1900-01-08 0 
8  1 1900-01-09 0 
9  1 1900-01-10 0 
10  1 1900-01-11 0 
11  1 1900-01-12 0 
12  1 1900-01-13 0 
13  1 1900-01-14 0 
14  1 1900-01-15 0 
15  2 1900-02-01 0 
16  2 1900-02-02 0 
17  2 1900-02-03 0 
18  2 1900-02-04 0 
19  2 1900-02-05 0 
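To make the per-month behaviour easy to check, here is a self-contained sketch of the same idea. It is an assumption-laden variant rather than the code above: it builds the frame directly from hypothetical `labels` and `values` lists, and it uses a dict comprehension with pd.concat in place of groupby(...).apply(...). Each month is resampled on its own, so only the days from the 1st to the 15th appear.

```python
import pandas as pd

# Rebuild the question's data (labels and values are illustrative names).
months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
labels = [f"{m}_{d}" for m in months for d in (1, 15)]
values = [0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2,
          2, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0]

x = pd.DataFrame({"date": pd.to_datetime(labels, format="%b_%d"),
                  "val": values})

# Resample each month separately: only the span between that month's
# first and last breakpoint (the 1st to the 15th) is filled.
parts = {
    month: group.set_index("date")["val"].resample("D").ffill()
    for month, group in x.groupby(x["date"].dt.month)
}

# Stack the monthly pieces under a (month, date) MultiIndex.
result = pd.concat(parts, names=["month", "date"])

print(len(result))     # 12 months x 15 days
print(result.loc[3])   # March: 1st through 15th
```

Grouping by month means the 16th through the end of each month is never generated, which is why the length differs from a full year.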
+0

I think the length of the output is not `365`, which is what the OP needs. – jezrael

+0

@jezrael, I don't think so... OP said: _where each element is repeated until the next date, e.g. '0 is repeated from January 1 to January 15'_ – MaxU

+0

OK, but that sentence starts with `into an array of length 365`... No problem, if you are right, your answer will be accepted. – jezrael
