Can I group by a column and resample dates?

I have a DataFrame that looks like this:

CustomerID InvoiceDate 
13654.0  2011-07-17 13:29:00 
14841.0  2010-12-16 10:28:00 
19543.0  2011-10-18 16:58:00 
12877.0  2011-06-15 13:34:00 
15073.0  2011-06-06 12:33:00

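I have some consumer purchase data, and I'm interested in the rate at which each customer makes purchases. I want to group by customer and then determine how many purchases were made in each quarter (assume each quarter is a three-month period, with the first one starting in January).

I could define the start and end of each quarter and make another column. I'm wondering whether I can use groupby to achieve the same thing instead.

Currently, this is what I do: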
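For reference, the "extra column" idea could look roughly like the sketch below. It assumes the sample rows shown above are loaded into a DataFrame named `df` with `InvoiceDate` already parsed as datetimes; the column names `Quarter` and `Count` are made up for illustration.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "CustomerID": [13654.0, 14841.0, 19543.0, 12877.0, 15073.0],
    "InvoiceDate": pd.to_datetime([
        "2011-07-17 13:29:00",
        "2010-12-16 10:28:00",
        "2011-10-18 16:58:00",
        "2011-06-15 13:34:00",
        "2011-06-06 12:33:00",
    ]),
})

# Label each invoice with its calendar quarter (Q1 = January to March)
df["Quarter"] = df["InvoiceDate"].dt.to_period("Q")

# Count invoices per customer and quarter
per_quarter = (
    df.groupby(["CustomerID", "Quarter"])
      .size()
      .reset_index(name="Count")
)
print(per_quarter)
```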

r = data.groupby('CustomerID')

frames = []
for name, frame in r:
    f = frame.set_index('InvoiceDate').resample('QS').count()
    f['CustomerID'] = name
    frames.append(f)

g = pd.concat(frames)

2017-04-13 Demetri P

UPDATE:

In [43]: df.groupby(['CustomerID', pd.Grouper(key='InvoiceDate', freq='QS')]) \
           .size() \
           .reset_index(name='Count')
Out[43]:
   CustomerID InvoiceDate  Count
0     12877.0  2011-04-01      1
1     13654.0  2011-07-01      1
2     14841.0  2010-10-01      1
3     15073.0  2011-04-01      1
4     19543.0  2011-10-01      1
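To try this without the original data set, here is a self-contained sketch that rebuilds the five sample rows from the question and runs the same grouping. The only assumption is that `InvoiceDate` is parsed to a datetime dtype first, since `pd.Grouper` with `freq` needs a datetime-like key.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "CustomerID": [13654.0, 14841.0, 19543.0, 12877.0, 15073.0],
    "InvoiceDate": pd.to_datetime([
        "2011-07-17 13:29:00",
        "2010-12-16 10:28:00",
        "2011-10-18 16:58:00",
        "2011-06-15 13:34:00",
        "2011-06-06 12:33:00",
    ]),
})

# Group by customer and by quarter start ('QS' = Jan/Apr/Jul/Oct),
# then count the rows in each (customer, quarter) group
counts = (
    df.groupby(["CustomerID", pd.Grouper(key="InvoiceDate", freq="QS")])
      .size()
      .reset_index(name="Count")
)
print(counts)
```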

Is this what you want?

In [39]: df.groupby(pd.Grouper(key='InvoiceDate', freq='QS')).count()
Out[39]:
             CustomerID
InvoiceDate
2010-10-01            1
2011-01-01            0
2011-04-01            2
2011-07-01            1
2011-10-01            1
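Note that grouping by quarter alone produces a row for the empty quarter 2011-01-01, whereas grouping by customer and quarter together lists only the combinations that actually occur. If zero counts are wanted for every customer and every quarter, one possible sketch is to pivot the quarters into columns and reindex them over the full quarterly range. The names `wide` and `all_quarters` are illustrative, and the data is again the sample from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "CustomerID": [13654.0, 14841.0, 19543.0, 12877.0, 15073.0],
    "InvoiceDate": pd.to_datetime([
        "2011-07-17 13:29:00",
        "2010-12-16 10:28:00",
        "2011-10-18 16:58:00",
        "2011-06-15 13:34:00",
        "2011-06-06 12:33:00",
    ]),
})

# One row per customer, one column per observed quarter, 0 where no purchase
wide = (
    df.groupby(["CustomerID", pd.Grouper(key="InvoiceDate", freq="QS")])
      .size()
      .unstack(fill_value=0)
)

# Add columns for quarters in which nobody purchased at all
all_quarters = pd.date_range(wide.columns.min(), wide.columns.max(), freq="QS")
wide = wide.reindex(columns=all_quarters, fill_value=0)
print(wide)
```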

2017-04-13 16:56:06 MaxU

Close. I'd also like to group by CustomerID. I found a way, thanks.

@DemetriP, I've updated the answer, please check. – MaxU

Much better. Thanks!

I think this is the best I can do:

data.groupby('CustomerID').apply(lambda x: x.set_index('InvoiceDate').resample('QS').count())
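One behavioural difference worth knowing: resampling each customer separately fills the quarters between that customer's first and last purchase with zeros, while a two-key groupby with `pd.Grouper` only lists quarters that contain at least one purchase. The sketch below illustrates this with made-up data (customer IDs 1 and 2 are hypothetical, not from the question). It uses an explicit loop over the groups rather than `apply`, and `.size()` rather than `.count()`, so treat it as a variant of the line above, not a drop-in equivalent.

```python
import pandas as pd

# Hypothetical data: customer 1 skips the second quarter of 2011
df = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceDate": pd.to_datetime(["2011-01-15", "2011-07-20", "2011-04-05"]),
})

# Per-customer resample: gaps inside a customer's date range become 0
per_customer = {
    cid: frame.set_index("InvoiceDate").resample("QS").size()
    for cid, frame in df.groupby("CustomerID")
}
resampled = pd.concat(per_customer, names=["CustomerID", "InvoiceDate"])

# Two-key groupby: only (customer, quarter) pairs that actually occur
grouped = df.groupby(
    ["CustomerID", pd.Grouper(key="InvoiceDate", freq="QS")]
).size()

print(resampled)
print(grouped)
```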

2017-04-13 16:53:10

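Use pd.TimeGrouper (it was later deprecated and then removed from pandas; in current versions pd.Grouper(freq='QS') is the replacement):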

df = df.set_index('InvoiceDate') 
df.index = pd.to_datetime(df.index) 
df.groupby(['CustomerID',pd.TimeGrouper(freq='QS')]).size().reset_index().rename(columns={0:'Num_Invoices'}) 

   CustomerID InvoiceDate  Num_Invoices
0     12877.0  2011-04-01             1
1     13654.0  2011-07-01             1
2     14841.0  2010-10-01             1
3     15073.0  2011-04-01             1
4     19543.0  2011-10-01             1
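On recent pandas versions, where `pd.TimeGrouper` is no longer available, the same idea can be sketched with `pd.Grouper(freq='QS')`, which groups on the index when no `key` is given. The sample rows are taken from the question and start out as strings here, to show why the `pd.to_datetime` step matters.

```python
import pandas as pd

# Sample rows from the question, with dates still stored as strings
df = pd.DataFrame({
    "CustomerID": [13654.0, 14841.0, 19543.0, 12877.0, 15073.0],
    "InvoiceDate": [
        "2011-07-17 13:29:00",
        "2010-12-16 10:28:00",
        "2011-10-18 16:58:00",
        "2011-06-15 13:34:00",
        "2011-06-06 12:33:00",
    ],
})

df = df.set_index("InvoiceDate")
df.index = pd.to_datetime(df.index)  # Grouper(freq=...) needs a DatetimeIndex

# With no key, pd.Grouper applies the quarterly frequency to the index
num_invoices = (
    df.groupby(["CustomerID", pd.Grouper(freq="QS")])
      .size()
      .reset_index(name="Num_Invoices")
)
print(num_invoices)
```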

2017-04-13 17:03:14
