Selecting rows in a given range from a grouped object

I have a DataFrame that looks similar to this:

+---------+---------------------+--------+
| action  | ts                  | uid    |
+---------+---------------------+--------+
| action1 | 2013-01-01 00:00:00 | 543534 |
| action2 | 2013-01-01 00:00:00 | 543544 |
| action1 | 2013-01-01 00:00:02 | 543542 |
| action2 | 2013-01-01 00:00:03 | 543541 |
| ....    | ....                | ...    |
+---------+---------------------+--------+

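I want to count, for each user, how many actions of each type they performed within a specific time range, so the expected output is something like this: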

uid     action1  action2
543534  10       1
543534  0        2
...

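I thought of solving this by first calling .groupby('uid'), then iterating over the grouped object, selecting the rows whose ts lies within the given range, and concatenating the resulting DataFrames into the result DataFrame.

So, something like this: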

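import pandas as pd

df = ...
start_date = ...
end_date = ...
result = {}

grouped = df.groupby('uid')
grouped_dict = dict(list(grouped))

for item in grouped_dict:
    group = grouped_dict[item]
    # Element-wise conditions need & (not `and`), with each comparison parenthesized
    result[item] = len(group[(group.ts > start_date) & (group.ts < end_date)])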

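I haven't run this code yet, but I think that even if it works it is very inefficient. Just converting the grouped object to a dictionary takes a long time. What is a more efficient approach in this case?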
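For comparison, here is a minimal sketch of the per-group loop from the question with its bugs fixed and extended to count each action type rather than total rows. The function name `count_actions_loop` is hypothetical, and it assumes `ts` is a datetime64 column. It still does Python-level work once per uid, which is what makes this approach slow on large frames.

```python
import pandas as pd

def count_actions_loop(df, start_date, end_date):
    """Hypothetical helper: the question's per-group loop, counting each action type.

    Assumes `df` has 'action', 'ts' (datetime64) and 'uid' columns.
    """
    result = {}
    # Iterating a GroupBy yields (key, sub-DataFrame) pairs, so no dict is needed
    for uid, group in df.groupby('uid'):
        # Element-wise comparisons must be combined with &, not `and`
        mask = (group['ts'] > start_date) & (group['ts'] < end_date)
        # value_counts gives the number of rows per action type for this uid
        result[uid] = group.loc[mask, 'action'].value_counts()
    # One row per uid, one column per action; missing combinations become 0
    return pd.DataFrame(result).T.fillna(0).astype(int)
```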


2014-02-21 Timofey

Have you noticed that you can actually group by multiple keys? –

If you can enumerate the time ranges, then you can group by both. –
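One way to "enumerate the time ranges" and group by them together with uid and action is to bin `ts` into fixed-length windows with `pd.Grouper`. This is a sketch under the assumption that equal-width windows are acceptable; the helper name `count_actions_per_window` and the `freq` value are illustrative choices, not from the thread.

```python
import pandas as pd

def count_actions_per_window(df, freq):
    """Hypothetical helper: count actions per uid within consecutive time windows.

    Assumes `df` has 'action', 'ts' (datetime64) and 'uid' columns; `freq` is a
    pandas offset alias such as '5s' or '1h'.
    """
    return (
        df.groupby(['uid', pd.Grouper(key='ts', freq=freq), 'action'])
          .size()                              # rows per (uid, window, action)
          .unstack('action', fill_value=0)     # one column per action type
    )
```

Each window is labelled by its left edge, so the result is indexed by (uid, window start).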

You can group by both uid and action:

import pandas as pd

# df has the columns 'action', 'ts' (datetime64) and 'uid'
start_date = pd.to_datetime('2013-01-01 00:00:00')
end_date = pd.to_datetime('2013-01-01 00:00:07')
print(df)
print(df[(df.ts > start_date) & (df.ts < end_date)].groupby(['uid', 'action'])['ts'].count().unstack('action').fillna(0))

Output:

    action                  ts  uid
0  action1 2013-01-01 00:00:00    1
1  action2 2013-01-01 00:00:00    2
2  action1 2013-01-01 00:00:02    2
3  action2 2013-01-01 00:00:03    1
4  action2 2013-01-01 00:00:04    2
5  action2 2013-01-01 00:00:05    1
6  action1 2013-01-01 00:00:06    1
action  action1  action2
uid
1             1        2
2             1        1
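To reproduce the output above, the sample frame can be rebuilt from the printed rows. The sketch below filters once and then uses `pd.crosstab`, a swapped-in alternative to `groupby(...).count().unstack(...)` that builds the same uid by action count table directly and fills absent pairs with 0.

```python
import pandas as pd

# Sample frame rebuilt from the rows printed above
df = pd.DataFrame({
    'action': ['action1', 'action2', 'action1', 'action2',
               'action2', 'action2', 'action1'],
    'ts': pd.to_datetime(['2013-01-01 00:00:00', '2013-01-01 00:00:00',
                          '2013-01-01 00:00:02', '2013-01-01 00:00:03',
                          '2013-01-01 00:00:04', '2013-01-01 00:00:05',
                          '2013-01-01 00:00:06']),
    'uid': [1, 2, 2, 1, 2, 1, 1],
})
start_date = pd.to_datetime('2013-01-01 00:00:00')
end_date = pd.to_datetime('2013-01-01 00:00:07')

# Filter once with strict bounds, as in the answer above
in_range = df[(df.ts > start_date) & (df.ts < end_date)]

# crosstab counts rows per (uid, action) pair and returns a uid x action table
counts = pd.crosstab(in_range['uid'], in_range['action'])
print(counts)
```

Because both bounds are strict, the two rows at exactly 00:00:00 are excluded, which is why each uid has only three or two counted actions.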


2014-02-21 17:19:23

If you have 0.13, you can do this by running 'df.query('@start_date

which is slightly more readable. –

Oh, that's obvious! Shame on me. Thanks! – Timofey
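The comment above is cut off, so its exact expression is unknown. As a sketch of the same idea in current pandas (not necessarily what the commenter wrote), `DataFrame.query` accepts a chained comparison, and the `@` prefix refers to variables in the calling scope. The helper name `count_actions_query` is hypothetical.

```python
import pandas as pd

def count_actions_query(df, start_date, end_date):
    """Hypothetical helper: same counts as the boolean-mask version, filtered via query.

    Assumes `df` has 'action', 'ts' (datetime64) and 'uid' columns.
    """
    # '@name' looks up start_date/end_date in this function's local scope;
    # the chained comparison keeps both bounds strict
    in_range = df.query('@start_date < ts < @end_date')
    return in_range.groupby(['uid', 'action']).size().unstack('action', fill_value=0)
```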

Looking at the pandas.DataFrame interface, I would select the data like this:

# Select the interesting date range (parenthesize each comparison, combine with &)
bydate = df[(df['ts'] > start_date) & (df['ts'] < end_date)]
# Now this will group by uid, *then* by action (keys passed as a list)
grouped = bydate.groupby(['uid', 'action'])

Now let's just print the number of times each uid performed each action:

for indices, data in grouped:
    # indices is the (uid, action) key; len(data) is the number of matching rows
    print("Uid {}, Action '{}': {}".format(indices[0], indices[1], len(data)))
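The loop above builds every sub-DataFrame only to take its length. A sketch of the same counts without that Python-level iteration uses `GroupBy.size()`, which returns one count per (uid, action) key; the helper names `action_counts` and `print_counts` are hypothetical.

```python
import pandas as pd

def action_counts(df, start_date, end_date):
    """Hypothetical helper: rows per (uid, action) strictly inside (start_date, end_date).

    Assumes `df` has 'action', 'ts' (datetime64) and 'uid' columns.
    """
    bydate = df[(df['ts'] > start_date) & (df['ts'] < end_date)]
    # size() counts rows in each group; the result is indexed by (uid, action)
    return bydate.groupby(['uid', 'action']).size()

def print_counts(sizes):
    # Same message format as the loop above, one line per (uid, action) pair
    for (uid, action), n in sizes.items():
        print("Uid {}, Action '{}': {}".format(uid, action, n))
```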


2014-02-21 17:18:29

'and' doesn't work with NumPy arrays –

Oops... copied it wrong. I had '&' in my test code –

If you change it, I'll remove my downvote. –
