Python pandas .filter() using a boolean mask

I have a DataFrame (z) that looks like this:

timestamp     source price 
2004-01-05 14:55:09+00:00 Bank1 420.975 
2004-01-05 14:55:10+00:00 Bank2 421.0 
2004-01-05 14:55:22+00:00 Bank1 421.075 
2004-01-05 14:55:34+00:00 Bank1 420.975 
2004-01-05 14:55:39+00:00 Bank1 421.175 
2004-01-05 14:55:45+00:00 Bank1 421.075 
2004-01-05 14:55:52+00:00 Bank1 421.175 
2004-01-05 14:56:12+00:00 Bank2 421.1 
2004-01-05 14:56:33+00:00 Bank1 421.275

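Sometimes there are time windows in which Bank2 submits only 1 quote. I need to throw those out, because I need 2 or more quotes from Bank2. If Bank2 appears once or less in a day, throw out the whole day.

I did this by creating a boolean mask, which I plan to use to filter out every day that fails the criterion, like this: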

r = z.groupby([z.index.date, z['source']]).size() > 1
    # True for each day/source pair that appears more than once
r = r.groupby(level=0).all() == True
    # i.e. if every source under a given date (index level 0) is True, return True; otherwise False (meaning one source failed the criterion)

This produces:

2004-01-05 True 
2004-01-06 True 
2004-01-07 True 
2004-01-08 False 
2004-01-09 True

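Perfect. Now I just need to apply it to the original DataFrame z while keeping the original structure (i.e. second-level frequency, not daily). That means using the df.filter() method.

My original DataFrame, grouped by day, has the same structure (and the .shape[0] of both is the same):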

2004-01-05 94 
2004-01-06 24 
2004-01-07 62 
2004-01-08 30 
2004-01-09 36

Great.

Here is where I get confused. I run:

t = y.groupby(y.index.date).filter(lambda x: [x for x in r])

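and receive TypeError: filter function returned a list, but expected a scalar bool.

Basically, I need the lambda function to simply return each x (a boolean) in r.

I did solve this, but in a very convoluted way: I took my earlier solution to the whole thing and, instead of storing it in a variable r, made it part of the lambda function.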

t = y.groupby(y.index.date).filter(lambda x: (x.groupby([x.index.date, x['source']]).size() > 1).groupby(level=0).all() == True) # ie. the datetime 0th-level index

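This is super messy. There must be a basic way to say: here is my DataFrame z, now groupby(z.index.date), then .filter() based on the boolean mask r.

Edit: This is what I found in the pandas tutorial, but for some reason the .between_time() part does not work. It filters out everything with length <= 1, not only when the .between_time() condition is true.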

t = y.groupby([y.index.date, y['source']]).filter(lambda x: len(x.between_time('14:00', '15:00')) > 1)
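One possible explanation for the behaviour described in the edit: a day/source group with no rows at all between 14:00 and 15:00 has length 0 inside the window, so a `> 1` test drops it together with the single-quote groups. The sketch below uses made-up toy data (the Bank3 source and all prices are hypothetical) and assumes the intent is to drop only groups with exactly one quote inside the window. That intent is an assumption, not something the question states.

```python
import pandas as pd

# Toy data, invented for illustration only.
idx = pd.to_datetime([
    "2004-01-05 14:55:09",  # Bank1, inside the window
    "2004-01-05 14:55:10",  # Bank2, inside the window (its only quote)
    "2004-01-05 14:55:22",  # Bank1, inside the window
    "2004-01-05 16:10:00",  # Bank3, outside the window
    "2004-01-05 16:20:00",  # Bank3, outside the window
])
y = pd.DataFrame(
    {
        "source": ["Bank1", "Bank2", "Bank1", "Bank3", "Bank3"],
        "price": [420.975, 421.0, 421.075, 421.1, 421.2],
    },
    index=idx,
)


def quotes_in_window(group):
    """Number of rows of this group that fall between 14:00 and 15:00."""
    return len(group.between_time("14:00", "15:00"))


grouped = y.groupby([y.index.date, y["source"]])

# Drops every group with 0 or 1 quotes in the window (Bank2 and Bank3 here).
strict = grouped.filter(lambda g: quotes_in_window(g) > 1)

# Drops only groups with exactly 1 quote in the window (Bank2 here);
# groups that never quote inside the window are left alone.
lenient = grouped.filter(lambda g: quotes_in_window(g) != 1)
```

Which of the two conditions is right depends on what should happen to sources that never quote inside the window.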

Source

2015-04-16 Alex Petralia

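The approach you originally proposed is correct, but you have to use a transform on the groups (grouped by date and source) rather than an apply. transform returns the per-group information with the same structure as the original DataFrame.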

grp = z.groupby([z.index.date, z.source])
counts = grp['price'].transform('count')  # number of records in each group, as a Series indexed like z

filtered_z = z[counts > 1]  # final filtering: keep rows whose day/source group has more than one record
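The transform idea can be taken one step further so that a whole day is dropped when any source has too few quotes on it, which is what the question asks for. This is only a sketch on made-up toy data (the prices and the second day are invented), and the names `per_group` and `day_ok` are hypothetical.

```python
import pandas as pd

# Toy data, invented for illustration only.
idx = pd.to_datetime([
    "2004-01-05 14:55:09", "2004-01-05 14:55:10",
    "2004-01-05 14:55:22", "2004-01-05 14:56:12",
    "2004-01-08 09:00:00", "2004-01-08 09:00:05", "2004-01-08 09:00:09",
])
z = pd.DataFrame(
    {
        "source": ["Bank1", "Bank2", "Bank1", "Bank2", "Bank1", "Bank1", "Bank2"],
        "price": [420.975, 421.0, 421.075, 421.1, 421.2, 421.3, 421.4],
    },
    index=idx,
)

dates = z.index.date  # one datetime.date per row

# Size of each (day, source) group, broadcast back onto every row of z.
per_group = z.groupby([dates, z["source"]])["price"].transform("count")

# A day passes only if every one of its rows belongs to a group with more
# than one record; transform("all") broadcasts that verdict to each row.
day_ok = (per_group > 1).groupby(dates).transform("all")

filtered_z = z[day_ok]  # second-level rows, failing days removed
```

Here 2004-01-08 is dropped because Bank2 quotes only once that day. A day on which a source does not appear at all has no group for that source, so this check would not catch it.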

Source

2015-04-16 21:41:58 Acorbe

I'm not sure how to use 'count' on a boolean Series like 'r'. I just need the quotes in 'z' for the dates given in 'r'.

I think I figured this out:

First, create a new column in the DataFrame z that holds only the date:

z['date'] = z.index.date

Then keep only the days that are True in the boolean Series r:

z[z['date'].isin(r[r].index)]
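If .filter() is still preferred, one way to hand it the mask is to look up each day's entry in r inside the lambda, so that the function returns a single bool per group instead of a list. A minimal sketch on made-up toy data follows (prices and the second day are invented). It assumes r is indexed by datetime.date values, as it is when built from z.index.date.

```python
import pandas as pd

# Toy data, invented for illustration only.
idx = pd.to_datetime([
    "2004-01-05 14:55:09", "2004-01-05 14:55:10",
    "2004-01-05 14:55:22", "2004-01-05 14:56:12",
    "2004-01-08 09:00:00", "2004-01-08 09:00:05", "2004-01-08 09:00:09",
])
z = pd.DataFrame(
    {
        "source": ["Bank1", "Bank2", "Bank1", "Bank2", "Bank1", "Bank1", "Bank2"],
        "price": [420.975, 421.0, 421.075, 421.1, 421.2, 421.3, 421.4],
    },
    index=idx,
)

dates = z.index.date

# Daily boolean mask, built the same way as in the question.
r = (z.groupby([dates, z["source"]]).size() > 1).groupby(level=0).all()

# Every row of a daily group shares one date, so the first timestamp is
# enough to find that day's entry in r. bool() yields the scalar that
# filter expects.
t = z.groupby(dates).filter(lambda day: bool(r.loc[day.index[0].date()]))
```

The result keeps the original second-level rows and drops 2004-01-08, where Bank2 quotes only once.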

Source

2015-04-20 18:27:23
