2017-02-10 64 views

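Pandas logical indexing with multiple conditions

I have a pandas DataFrame, data_ask_bid, indexed by datetime timestamps, and I only want to keep the rows that fall within the range Monday 00:00 to Friday 21:59. For this, I wrote the following line: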

data_ask_bid = data_ask_bid[((0 <= data_ask_bid.index.weekday <= 3) | (data_ask_bid.index.weekday == 4 & data_ask_bid.index.hour < 22))] 

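However, there seems to be a problem with the logical indexing, because it raises the error "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()". Where does the code go wrong?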


Add extra parentheses: '(data_ask_bid.index. ((data_ask_bid.index.weekday == 4) & (data_ask_bid.index.hour < 22))' – EdChum

Answers


I think you can use numpy.in1d to check the weekday values and build the mask from separate conditions:

mask1 = np.in1d(data_ask_bid.index.weekday, [0,1,2,3]) 
mask2 = data_ask_bid.index.weekday == 4 
mask3 = data_ask_bid.index.hour < 22 

mask = mask1 | (mask2 & mask3) 

data_ask_bid = data_ask_bid[mask] 
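
The same masks can be wrapped in a small helper. This is only a minimal sketch, not part of the original answer: the function name keep_trading_week and the hourly sample data are hypothetical, and it assumes a NumPy version that provides np.isin, which newer releases recommend over np.in1d.

```python
import numpy as np
import pandas as pd


def keep_trading_week(df):
    """Keep rows from Monday 00:00 through Friday 21:59 (hypothetical helper)."""
    weekday = df.index.weekday  # Monday=0 ... Sunday=6
    mon_to_thu = np.isin(weekday, [0, 1, 2, 3])
    friday_before_22 = (weekday == 4) & (df.index.hour < 22)
    return df[mon_to_thu | friday_before_22]


# Illustrative data: hourly rows from Friday 2017-02-10 00:00 to Monday 23:00
rng = pd.date_range("2017-02-10", periods=96, freq="h")
sample = pd.DataFrame({"a": range(len(rng))}, index=rng)

filtered = keep_trading_week(sample)
print(len(sample), "rows before,", len(filtered), "rows after")
print(filtered.index.min(), "to", filtered.index.max())
```

Naming each mask makes it easier to see which part of the week a condition covers, and every comparison stays inside its own parentheses.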

Sample:

import numpy as np
import pandas as pd

start = pd.to_datetime('2017-02-10 15:00:00') 
rng = pd.date_range(start, periods=20, freq='7h') 

data_ask_bid = pd.DataFrame({'a': range(20)}, index=rng) 
#print (data_ask_bid) 

w = data_ask_bid.index.weekday 
mask1 = np.in1d(w, [0,1,2,3]) 
mask2 = w == 4 
mask3 = data_ask_bid.index.hour < 22 

mask = mask1 | (mask2 & mask3) 
print (mask) 
[ True False False False False False False False False True True True 
    True True True True True True True True] 

data_ask_bid = data_ask_bid[mask] 
print (data_ask_bid) 
         a 
2017-02-10 15:00:00 0 
2017-02-13 06:00:00 9 
2017-02-13 13:00:00 10 
2017-02-13 20:00:00 11 
2017-02-14 03:00:00 12 
2017-02-14 10:00:00 13 
2017-02-14 17:00:00 14 
2017-02-15 00:00:00 15 
2017-02-15 07:00:00 16 
2017-02-15 14:00:00 17 
2017-02-15 21:00:00 18 
2017-02-16 04:00:00 19 

Timings:

start = pd.to_datetime('2017-02-10 15:00:00') 
N = 1000000 
rng = pd.date_range(start, periods=N, freq='H') 

data_ask_bid = pd.DataFrame({'a': range(N)}, index=rng) 
print (data_ask_bid) 

def jez(data_ask_bid): 
    w = data_ask_bid.index.weekday 
    mask1 = np.in1d(w, [0,1,2,3]) 
    mask2 = w == 4 
    mask3 = data_ask_bid.index.hour < 22 
    data_ask_bid = data_ask_bid[mask1 | (mask2 & mask3)] 
    return (data_ask_bid) 

print (jez(data_ask_bid)) 

print (data_ask_bid[(((data_ask_bid.index.weekday >= 0) & (data_ask_bid.index.weekday <= 3)) | ((data_ask_bid.index.weekday == 4) & (data_ask_bid.index.hour < 22)))]) 
In [273]: %timeit (jez(data_ask_bid)) 
10 loops, best of 3: 142 ms per loop 

In [274]: %timeit (data_ask_bid[(((data_ask_bid.index.weekday >= 0) & (data_ask_bid.index.weekday <= 3)) | ((data_ask_bid.index.weekday == 4) & (data_ask_bid.index.hour < 22)))]) 
1 loop, best of 3: 267 ms per loop 
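
Outside IPython, a comparison like this can be reproduced with the standard timeit module. The sketch below is an illustration under assumptions: the helper names with_isin and with_comparisons are made up here, N is smaller than the 1,000,000 rows used above, and both helpers store the weekday in a variable (unlike the one-liner timed above, which recomputes it). Timings depend on the machine and library versions, so no particular result is implied.

```python
import timeit

import numpy as np
import pandas as pd

N = 100_000  # reduced size so the script finishes quickly
rng = pd.date_range("2017-02-10 15:00:00", periods=N, freq="h")
df = pd.DataFrame({"a": range(N)}, index=rng)


def with_isin(df):
    # Membership test for Monday-Thursday, plus Friday before 22:00
    w = df.index.weekday
    return df[np.isin(w, [0, 1, 2, 3]) | ((w == 4) & (df.index.hour < 22))]


def with_comparisons(df):
    # Same selection written with explicit, parenthesized comparisons
    w = df.index.weekday
    h = df.index.hour
    return df[((w >= 0) & (w <= 3)) | ((w == 4) & (h < 22))]


# Both variants should select exactly the same rows.
same_rows = with_isin(df).equals(with_comparisons(df))
print("same rows selected:", same_rows)

for fn in (with_isin, with_comparisons):
    seconds = timeit.timeit(lambda: fn(df), number=5) / 5
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```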

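I just found out that pandas does not work with chained clauses of the form 0 <= data_ask_bid.index.weekday <= 3, so I needed to split it into two separate clauses, and then it works: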

data_ask_bid = data_ask_bid[(((data_ask_bid.index.weekday >= 0) & (data_ask_bid.index.weekday <= 3)) | ((data_ask_bid.index.weekday == 4) & (data_ask_bid.index.hour < 22)))] 
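
To illustrate why the original line raised the error, here is a small sketch with plain NumPy arrays. The arrays weekday and hour are made-up stand-ins for data_ask_bid.index.weekday and data_ask_bid.index.hour. It shows two separate issues: Python expands a chained comparison into an "and" of two comparisons, and & binds more tightly than == and <.

```python
import numpy as np

# Made-up stand-ins for index.weekday and index.hour
weekday = np.array([0, 3, 4, 4, 6])
hour = np.array([10, 23, 21, 22, 5])

# A chained comparison is evaluated as (0 <= weekday) and (weekday <= 3).
# `and` needs a single True/False, so NumPy raises ValueError for arrays.
try:
    0 <= weekday <= 3
except ValueError as exc:
    print("chained comparison failed:", exc)

# `&` binds tighter than `==` and `<`, so without parentheses
# weekday == 4 & hour < 22 is read as weekday == (4 & hour) < 22,
# which is again a chained comparison and fails the same way.
try:
    weekday == 4 & hour < 22
except ValueError as exc:
    print("unparenthesized & failed:", exc)

# Element-wise version: each comparison sits in its own parentheses.
mask = ((weekday >= 0) & (weekday <= 3)) | ((weekday == 4) & (hour < 22))
print(mask)
```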

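Your solution also works perfectly, but it is somewhat slower on larger DataFrames. Also, in my opinion it is better to create the masks separately, which makes the code cleaner and more readable. Good luck! – jezrael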