Get business days between consecutive rows in pandas

I know that

df.diff()

gives me the calendar days, and I know I could do some magic with

df.loc[df.Date.dt.weekday == 4, 'Diff'] = 1

but this is not optimal. I've tried

np.busday_count()

but I get an error I don't quite understand. Here is sample code that reproduces it:

In [36]: df = pd.DataFrame.from_dict({1: {'Date': '2016-01-01'}, 2: {'Date': '2016-01-02'}, 3: {'Date': '2016-01-03'}}, orient='index') 

In [37]: df['Date'] = df.Date.astype('<M8[D]') 

In [38]: np.busday_count(df.Date, df.Date.shift(1)) 
--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-38-07a4ae9a16f6> in <module>() 
----> 1 np.busday_count(df.Date, df.Date.shift(1)) 

TypeError: Iterator operand 0 dtype could not be cast from dtype('<M8[ns]') to dtype('<M8[D]') according to the rule 'safe' 

In [39]: df = pd.DataFrame.from_dict({1: {'Date': '2016-01-01'}, 2: {'Date': '2016-01-02'}, 3: {'Date': '2016-01-03'}}, orient='index') 

In [40]: np.busday_count(df.Date, df.Date.shift(1)) 
--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-40-07a4ae9a16f6> in <module>() 
----> 1 np.busday_count(df.Date, df.Date.shift(1)) 

TypeError: Iterator operand or requested dtype holds references, but the REFS_OK flag was not enabled
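Both tracebacks come down to what np.busday_count receives. As far as I can tell from NumPy's behaviour, an ndarray argument is used as-is, while a list is parsed into datetime64[D]. The datetime column's values are datetime64[ns] (the first traceback shows the astype to '<M8[D]' did not stick), and NumPy will not drop the sub-day precision under the 'safe' rule. The string column, on the other hand, is an object array of Python references, which the function's iterator is not set up to handle. A minimal diagnostic sketch using the three dates from the question:

```python
import numpy as np
import pandas as pd

# The three dates from the question; dtype=object mirrors the In [39] setup
dates = pd.Series(["2016-01-01", "2016-01-02", "2016-01-03"], dtype=object)

# In [37]/[38]: a real datetime column keeps sub-day precision, and NumPy
# will not narrow it to whole days under the 'safe' casting rule
as_datetime = pd.to_datetime(dates).values
print(as_datetime.dtype, np.can_cast(as_datetime.dtype, "datetime64[D]", casting="safe"))

# In [39]/[40]: the raw strings sit in an object array of Python references
as_object = dates.values
print(as_object.dtype)

# An explicit cast to whole days gives busday_count what it expects
as_days = as_datetime.astype("datetime64[D]")
# Business days in [previous, current) for each consecutive pair of rows
counts = np.busday_count(as_days[:-1], as_days[1:])
print(counts)
```

Slicing the day-cast array against itself (as_days[:-1] vs. as_days[1:]) is an alternative to shift(1) that never produces a NaT, which busday_count rejects, at the cost of returning one fewer value than there are rows.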


2016-04-08 user1610719

Got it!

I'm not sure this fits everyone's needs, but this works:

np.busday_count(df.Date.values.tolist(), df.Date.shift(1).fillna(df.Date).values.tolist())

Both the added .tolist() and the .fillna() part are necessary!
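A minimal sketch of this recipe, assuming the Date column still holds the plain strings from In [39] (with a real datetime64[ns] column, .values.tolist() produces integer nanosecond timestamps rather than dates). .tolist() hands busday_count plain Python strings, which it parses into whole days, and .fillna(df.Date) replaces the NaN left by shift(1) with the row's own date, so the first row counts as 0. Note the argument order: busday_count(begin, end) counts from begin toward end, so passing the current date first makes every count zero or negative. Swapping the arguments gives positive counts:

```python
import numpy as np
import pandas as pd

# String dates, as in In [39]; dtype=object mirrors the question's setup
df = pd.DataFrame({"Date": ["2016-01-01", "2016-01-02", "2016-01-03"]}, dtype=object)

# Previous row's date; the first row has none, so reuse the row's own date
prev = df.Date.shift(1).fillna(df.Date)

# .tolist() passes plain Python strings, which NumPy parses as datetime64[D]
answer_order = np.busday_count(df.Date.values.tolist(), prev.values.tolist())
swapped_order = np.busday_count(prev.values.tolist(), df.Date.values.tolist())
print(answer_order, swapped_order)
```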


2016-04-08 17:55:04 user1610719

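A cleaner solution with an explicit conversion to the NumPy dtype 'datetime64[D]' works without converting to lists: `np.busday_count(df.Date.values.astype('datetime64[D]'), df.Date.shift(1).fillna(df.Date).values.astype('datetime64[D]'))`. It should therefore be faster, but it is less readable. I would probably write a function `my_busday_count` in which all these conversions work consistently on pd.Series/np vectors and correctly output NaT where appropriate. – hynekcer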
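The comment only describes my_busday_count. The sketch below is one hypothetical way to write it, not hynekcer's code. It normalises both inputs with pd.to_datetime, casts them to datetime64[D] as the comment suggests, and skips rows where either date is missing. Because the result is a count rather than a date, those rows are marked with NaN (pandas' numeric missing value) instead of NaT. It also takes the previous date as begin, so forward gaps come out positive.

```python
import numpy as np
import pandas as pd


def my_busday_count(begin: pd.Series, end: pd.Series) -> pd.Series:
    """Hypothetical helper: business days in [begin, end), row by row.

    Accepts datetime-like or string Series; rows where either date is
    missing get NaN instead of raising.
    """
    # Normalise to NumPy datetime64[D]; missing values become NaT
    b = pd.to_datetime(begin).values.astype("datetime64[D]")
    e = pd.to_datetime(end).values.astype("datetime64[D]")
    valid = ~(np.isnat(b) | np.isnat(e))
    counts = np.full(len(b), np.nan)
    # busday_count rejects NaT, so only the complete pairs are passed in
    counts[valid] = np.busday_count(b[valid], e[valid])
    return pd.Series(counts, index=begin.index)


# Friday, Monday, Friday
df = pd.DataFrame({"Date": ["2016-01-01", "2016-01-04", "2016-01-08"]})
df["BusDays"] = my_busday_count(df.Date.shift(1), df.Date)
print(df)
```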

With np.busday_count you could also try:

x3 = [x.strftime('%Y-%m-%d') for x in df.Date] 
x4 = [x.strftime('%Y-%m-%d') for x in df.Date.shift(1).fillna(0)] 
np.busday_count(x4,x3) 
array([12001,  1,  0]) 

%timeit np.busday_count(x4,x3) 
The slowest run took 4.58 times longer than the fastest. This could mean that an intermediate result is being cached. 
100000 loops, best of 3: 12.5 µs per loop
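The 12001 in the first position is not a real gap. fillna(0) evidently filled the missing previous date with 1970-01-01 (the Unix epoch), and there are 12001 business days from 1970-01-01 up to 2016-01-01. A sketch of the same strftime approach that fills the gap with the row's own date instead, so the first row counts as 0 (same three dates as the question):

```python
import numpy as np
import pandas as pd

# What fillna(0) effectively measured for the first row: a count from the epoch
from_epoch = np.busday_count("1970-01-01", "2016-01-01")

df = pd.DataFrame({"Date": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03"])})

# Fill the missing previous date with the row's own date instead of 0
prev = df.Date.shift(1).fillna(df.Date)
x3 = [d.strftime("%Y-%m-%d") for d in df.Date]
x4 = [d.strftime("%Y-%m-%d") for d in prev]
counts = np.busday_count(x4, x3)
print(from_epoch, counts)
```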

Or, if you prefer:

x1 = [x.date() for x in df.Date] 
x2 = [x.date() for x in df.Date.shift(1).fillna(0)] 
np.busday_count(x2,x1) 
array([12001,  1,  0]) 

%timeit np.busday_count(x2,x1) 
10000 loops, best of 3: 43.4 µs per loop
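The two variants above differ only in which Python objects reach busday_count: strings in one, datetime.date objects in the other. Both are parsed into datetime64[D], so they count the same days as an array already in that dtype, which is what the hynekcer comment builds directly without a per-element Python loop. A quick consistency check with made-up dates:

```python
import datetime as dt

import numpy as np

# Made-up ranges: Fri -> Mon, and Sat -> the following Fri
begin = ["2016-01-01", "2016-01-02"]
end = ["2016-01-04", "2016-01-08"]

as_strings = np.busday_count(begin, end)
as_dates = np.busday_count(
    [dt.date.fromisoformat(s) for s in begin],
    [dt.date.fromisoformat(s) for s in end],
)
as_days = np.busday_count(
    np.array(begin, dtype="datetime64[D]"),
    np.array(end, dtype="datetime64[D]"),
)
print(as_strings, as_dates, as_days)
```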


2016-04-08 18:15:25
