2017-09-19 51 views

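Getting the matching elements when resampling by day. I am trying to keep all the other columns when I apply a daily min resample to a DataFrame.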

                                   ts  value                 date  diff
date
2017-09-18 05:40:00  1505706000000000     71  2017-09-18 05:40:00   NaN
2017-09-18 05:30:00  1505705400000000     72  2017-09-18 05:30:00   1.0
2017-09-18 05:20:00  1505704800000000     71  2017-09-18 05:20:00  -1.0
2017-09-18 05:10:00  1505704200000000     73  2017-09-18 05:10:00   2.0
2017-09-18 05:00:00  1505703600000000     72  2017-09-18 05:00:00  -1.0
2017-09-18 04:50:00  1505703000000000     72  2017-09-18 04:50:00   0.0
2017-09-18 04:40:00  1505702400000000     71  2017-09-18 04:40:00  -1.0
2017-09-18 04:30:00  1505701800000000     71  2017-09-18 04:30:00   0.0

What I want is, for each day, the minimum diff together with its full-precision date (the original timestamp, not the resampled one).

But when I do this:

df['diff'].resample('D').min() 

I get this result:

date 
2016-06-16  9.0 
2016-06-17 11.0 
2016-06-18 10.0 
2016-06-19  NaN 
2016-06-20 18.0 
2016-06-21  3.0 
2016-06-22  NaN 
2016-06-23  NaN 
2016-06-24  NaN 
2016-06-25  NaN 
2016-06-26  NaN 
2016-06-27 14.0 
2016-06-28  9.0 

Desired result:

date 
2016-06-16  9.0 2016-06-16 07:10:00 
2016-06-17 11.0 2016-06-17 08:30:00 

Any idea how to get the result above?


Your "result" is not in the input you provided. – asongtoruin


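Yes, because this is only a sample. I can't post all the data here, since it covers a year at 10-minute precision. What I want to point out is that I need the date with hour and minute precision alongside the resampled date. – azelix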

Answer


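Consider a merge of the resampled min diff series (cast to a DataFrame) with the original DataFrame, using a date_only field to match the index produced by the resample.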

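As the run on your posted example below shows, this yields multiple records for a day when several hours/minutes within that day tie for the same minimum diff value.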

from io import StringIO 
import pandas as pd 

txt = '''  
date   ts value date2 diff  
"2017-09-18 05:40:00" 1505706000000000 71 "2017-09-18 05:40:00" None 
"2017-09-18 05:30:00" 1505705400000000 72 "2017-09-18 05:30:00" 1.0 
"2017-09-18 05:20:00" 1505704800000000 71 "2017-09-18 05:20:00" -1.0 
"2017-09-18 05:10:00" 1505704200000000 73 "2017-09-18 05:10:00" 2.0 
"2017-09-18 05:00:00" 1505703600000000 72 "2017-09-18 05:00:00" -1.0 
"2017-09-18 04:50:00" 1505703000000000 72 "2017-09-18 04:50:00" 0.0 
"2017-09-18 04:40:00" 1505702400000000 71 "2017-09-18 04:40:00" -1.0 
"2017-09-18 04:30:00" 1505701800000000 71 "2017-09-18 04:30:00" 0.0 
''' 

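# Column 0 becomes the DatetimeIndex; column 3 (date2) keeps the full timestamp.
# The raw string avoids an invalid-escape warning; "None" is read as NaN.
df = pd.read_table(StringIO(txt), sep=r"\s+", index_col=0, parse_dates=[0, 3],
                   na_values=["None"])\
       .rename(columns={'date2': 'date'})
# Day-level key (midnight of each timestamp) that lines up with the resample('D') labels.
df['date_only'] = df.index.normalize()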

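# Daily minimum of diff, turned back into a frame so it can be merged.
daily_min = df['diff'].resample('D').min().to_frame().reset_index()

# Match each day's minimum back to the original rows on (day, diff);
# the full timestamp column from the right-hand frame comes out as 'date_'.
new_df = daily_min.merge(df, left_on=['date', 'diff'], right_on=['date_only', 'diff'],
                         suffixes=('', '_'))[['date', 'diff', 'date_']]\
                  .set_index('date')\
                  .rename(columns={'date_': 'date'})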

print(new_df) 
#             diff                date
# date
# 2017-09-18  -1.0 2017-09-18 05:20:00
# 2017-09-18  -1.0 2017-09-18 05:00:00
# 2017-09-18  -1.0 2017-09-18 04:40:00
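
If only one row per day is wanted instead of every tied row, a groupby with idxmin may be a simpler route than the merge. The following is a minimal sketch of that alternative, not the approach used above; the two-day sample data and the names valid, min_times and result are made up for illustration.

```python
import pandas as pd

# Hypothetical sample spanning two days (not the asker's real data).
idx = pd.to_datetime([
    "2017-09-17 23:40:00",
    "2017-09-17 23:50:00",
    "2017-09-18 04:30:00",
    "2017-09-18 04:40:00",
    "2017-09-18 05:00:00",
])
df = pd.DataFrame({"diff": [3.0, 1.0, 0.0, -1.0, -1.0]}, index=idx)
df.index.name = "date"

# Drop NaN first so that days holding only NaN never reach idxmin.
valid = df["diff"].dropna()

# For each calendar day, the full timestamp at which diff is smallest.
# Ties resolve to the first occurrence in row order.
min_times = valid.groupby(valid.index.normalize()).idxmin()

# Look the winning rows up in the original frame, keeping every column.
result = df.loc[min_times.to_list()]
print(result)
```

On this sample the lookup should return the 23:50 row for 2017-09-17 and the 04:40 row for 2017-09-18. Days with no data simply do not appear, unlike resample('D'), which emits a NaN row for every calendar day in the range.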