Python pandas DataFrame: filtering around key dates

I have a pandas DataFrame df, indexed by a daily DatetimeIndex, with an additional column historical_sales.

If we want to filter for past days where historical_sales was above some amount, say 200, that is simple enough:

df.loc[df['historical_sales'] > 200]

But what if we want to explore the sales pattern in the 5 days before and the 5 days after each day where sales were > 200?

Many thanks.

Source

2017-06-13 Federico Han

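I think you need a list comprehension to collect all the index values, and then select with loc.

You also need numpy.concatenate to join all the indexes together, along with numpy.unique to remove duplicates.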

import numpy as np
import pandas as pd

np.random.seed(100)
rng = pd.date_range('2017-04-03', periods=20)
df = pd.DataFrame({'historical_sales': np.random.choice([100,200,300], size=20)}, index=rng)
print(df)
      historical_sales 
2017-04-03    100 
2017-04-04    100 
2017-04-05    100 
2017-04-06    300 
2017-04-07    300 
2017-04-08    100 
2017-04-09    300 
2017-04-10    200 
2017-04-11    300 
2017-04-12    300 
2017-04-13    300 
2017-04-14    300 
2017-04-15    200 
2017-04-16    100 
2017-04-17    100 
2017-04-18    100 
2017-04-19    100 
2017-04-20    300 
2017-04-21    100 
2017-04-22    200

idxmask = df.index[df['historical_sales']>200] 
print (idxmask) 
DatetimeIndex(['2017-04-06', '2017-04-07', '2017-04-09', '2017-04-11', 
       '2017-04-12', '2017-04-13', '2017-04-14', '2017-04-20'], 
       dtype='datetime64[ns]', freq=None) 

# on real data, change 1 to 5 for a 5-day window
temp_index = [df.loc[timestamp - pd.Timedelta(1, unit='d'):
                     timestamp + pd.Timedelta(1, unit='d')].index
              for timestamp in idxmask]
# merge the per-date windows and drop dates that fall in more than one
idx = np.unique(np.concatenate(temp_index))

df1 = df.loc[idx] 
print (df1) 
      historical_sales 
2017-04-05    100 
2017-04-06    300 
2017-04-07    300 
2017-04-08    100 
2017-04-09    300 
2017-04-10    200 
2017-04-11    300 
2017-04-12    300 
2017-04-13    300 
2017-04-14    300 
2017-04-15    200 
2017-04-19    100 
2017-04-20    300 
2017-04-21    100
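
The steps above can be wrapped in a small helper so that the window size becomes a parameter. This is only a sketch: the function name `rows_around`, its `days` argument, and the sample data are made up for illustration, and it assumes a sorted DatetimeIndex without duplicate dates.

```python
import numpy as np
import pandas as pd


def rows_around(df, mask, days=5):
    """Return the rows within `days` days of any row where `mask` is True.

    Hypothetical helper for illustration; not part of pandas.
    """
    delta = pd.Timedelta(days, unit='d')
    hits = df.index[mask]
    if len(hits) == 0:
        # nothing matched: return an empty frame with the same columns
        return df.iloc[0:0]
    # one label-based slice per key date; both endpoints are included
    windows = [df.loc[ts - delta: ts + delta].index for ts in hits]
    # merge the windows and drop dates that appear more than once
    idx = np.unique(np.concatenate(windows))
    return df.loc[idx]


rng = pd.date_range('2017-04-03', periods=10)
demo = pd.DataFrame(
    {'historical_sales': [100, 100, 100, 300, 100, 100, 100, 100, 100, 100]},
    index=rng,
)
around = rows_around(demo, demo['historical_sales'] > 200, days=2)
print(around)
```

Here the only key date is 2017-04-06, so the result covers 2017-04-04 through 2017-04-08.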

Source

2017-06-13 05:36:21 jezrael

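Works like a charm, many thanks! There is just one typo, I think: mask should be idxmask.

Sorry, I corrected it. If my answer was helpful, please don't forget to [accept](http://meta.stackexchange.com/a/5235/295067) it. Thanks. – jezrael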

You will want to do range slicing: http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-position

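It should look something like this (a rough sketch):

great_sales_df = df.loc[df['historical_sales'] > 200]
# the dates live in the index, so iterate over it directly
for sales_date in great_sales_df.index:
    sales_before = sales_date - pd.DateOffset(days=5)
    sales_after = sales_date + pd.DateOffset(days=5)
    # label-based slicing with loc includes both endpoints
    pattern_df = df.loc[sales_before:sales_after]

This is only a rough sketch, but I think the direction is right.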
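
To compare the pattern around each key date, it can help to keep the windows separate instead of overwriting one variable on every pass. A minimal sketch, assuming the same daily DatetimeIndex; the sample data and the names `windows`, `offset_days`, and `key_date` are invented for illustration:

```python
import pandas as pd

rng = pd.date_range('2017-04-03', periods=12)
df = pd.DataFrame(
    {'historical_sales': [100, 100, 300, 100, 100, 100,
                          100, 100, 300, 100, 100, 100]},
    index=rng,
)

days = 2  # half-width of the window; use 5 on real data
windows = {}
for sales_date in df.index[df['historical_sales'] > 200]:
    start = sales_date - pd.DateOffset(days=days)
    end = sales_date + pd.DateOffset(days=days)
    window = df.loc[start:end].copy()
    # signed distance in days from the key date (negative means before)
    window['offset_days'] = (window.index - sales_date).days
    windows[sales_date] = window

# one frame whose outer index level names the key date of each window
pattern_df = pd.concat(windows, names=['key_date', 'date'])
print(pattern_df)
```

Grouping `pattern_df` by `offset_days` then gives, for example, the average sales on each day relative to a key date.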

Source

2017-06-13 05:21:58

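For clarity, I flag the rows of interest by setting a column named new to 1. To make the result easy to verify, the window in the code below is kept at 1 day rather than 5.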

import pandas as pd
import numpy as np
from datetime import timedelta

df = pd.DataFrame(data=np.random.rand(51), index=pd.date_range('2015-04-20', '2015-06-09'), columns=['A'])
idx = df[df.A > 0.5].index

df["new"] = 0

for date in idx:
    current_date = date.to_pydatetime()
    start = current_date - timedelta(days=1)
    end = current_date + timedelta(days=1)

    # assign through a single .loc call; chained indexing may write to a copy
    df.loc[start:end, "new"] = 1

print(df)
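
On a gap-free daily index, the same flag column can also be computed without a loop by taking a centered rolling maximum of the condition. This is a different technique from the loop above, and it counts rows rather than calendar days, so the two only agree when no dates are missing. The sample values are invented so the result is easy to check by eye:

```python
import pandas as pd

rng = pd.date_range('2015-04-20', periods=8)
df = pd.DataFrame({'A': [0.1, 0.9, 0.2, 0.3, 0.1, 0.2, 0.7, 0.4]}, index=rng)

days = 1  # half-width of the window; use 5 on real data
hit = (df['A'] > 0.5).astype(int)
# centered window of 2*days + 1 rows; min_periods=1 keeps the edge rows
df['new'] = hit.rolling(2 * days + 1, center=True, min_periods=1).max().astype(int)
print(df)
```

A row gets `new == 1` whenever any row within `days` positions of it satisfies the condition.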

Source

2017-06-13 06:02:17 Abhishek

When I need to work with the rows before and after, I just use shift.

df['preceeding_5th_day'] = df['historical_sales'].shift(5) 
df['following_5th_day'] = df['historical_sales'].shift(-5)

Then you can simply run your check:

df.loc[df['historical_sales'] > 200]

The selected rows will then also carry the columns for the preceding and following 5th day. Very simple this way.
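
A small worked example of what `shift` does here, assuming a gap-free daily index (`shift` moves by rows, not by calendar days). The data, the column names, and the 2-day offset are invented to keep the output short:

```python
import pandas as pd

rng = pd.date_range('2017-04-03', periods=6)
df = pd.DataFrame({'historical_sales': [100, 150, 300, 120, 90, 250]}, index=rng)

# shift(2) brings in the value from 2 rows earlier, shift(-2) from 2 rows later
df['preceding_2nd_day'] = df['historical_sales'].shift(2)
df['following_2nd_day'] = df['historical_sales'].shift(-2)

selected = df.loc[df['historical_sales'] > 200]
print(selected)
```

Rows too close to either end of the data get NaN in the shifted column, as happens for 2017-04-08 here. Note that this gives only the single value at a fixed offset, not every day in between.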

Source

2017-06-13 06:06:04
