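Create a groupby column based on a rank condition in Python

I am working with an event database in Python, and I need to write a function that determines whether a particular event is followed (AT ANY POINT) by another particular event.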

import pandas as pd

df = pd.DataFrame({'User':[1,1,1,2,2,2], 
       'Product':['A','A','A','B','B','B'], 
       'Updated_At':['2015-01-01', 
          '2015-02-01', 
          '2015-03-01', 
          '2015-04-01', 
          '2015-05-01', 
          '2015-06-01'], 
        'Event':[1,1,2,1,3,2]})

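For each product a user has, does Event 2 follow Event 1 at any point? If it does, and it happens before the next Event 1 appears, keep the row where Event = 1.

The answer (the 'Updated_Event' column marks the rows I want to keep):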

df = pd.DataFrame({'User':[1,1,1,2,2,2], 
       'Product':['A','A','A','B','B','B'], 
       'Updated_At':['2015-01-01', 
          '2015-02-01', 
          '2015-03-01', 
          '2015-04-01', 
          '2015-05-01', 
          '2015-06-01'], 
       'Event':[1,1,2,1,3,2], 
       'Updated_Event':['no', 'yes', 'no', 'yes', 'no', 'no']})

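The logical step seems to be to use a groupby (on ['User', 'Product']) and create a dummy column to add to the groupby, then check whether each instance of User, Product and Event type 1 also contains a row with Event = 2. Something like the 'Event_Dummy' column below: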

df = pd.DataFrame({'User':[1,1,1,2,2,2], 
       'Product':['A','A','A','B','B','B'], 
       'Updated_At':['2015-01-01', 
          '2015-02-01', 
          '2015-03-01', 
          '2015-04-01', 
          '2015-05-01', 
          '2015-06-01'], 
       'Event':[1,1,2,1,3,2], 
       'Event_Dummy': [1,2,2,3,3,3], 
       'Updated_Event':['no', 'yes', 'no', 'yes', 'no', 'no']})

The statement would then be something along the lines of:

Check whether df.groupby(['User', 'Product', 'Event_Dummy']) contains a 2.
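The dummy-column idea above can be sketched as follows. This is only one possible reading of the question, not a confirmed solution: it assumes the rows are already in chronological order within each User/Product, and it builds Event_Dummy as a running count of the Event = 1 rows, which happens to reproduce the values [1, 2, 2, 3, 3, 3] shown above. The name block_has_2 is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'User': [1, 1, 1, 2, 2, 2],
                   'Product': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'Updated_At': ['2015-01-01', '2015-02-01', '2015-03-01',
                                  '2015-04-01', '2015-05-01', '2015-06-01'],
                   'Event': [1, 1, 2, 1, 3, 2]})

# Assumption: rows are already sorted by time within each User/Product.
# Every Event == 1 row opens a new block, so a running count of those rows
# gives one label per block.
df['Event_Dummy'] = df['Event'].eq(1).cumsum()

# For each (User, Product, Event_Dummy) block: does any row have Event == 2?
block_has_2 = (
    df['Event'].eq(2)
    .groupby([df['User'], df['Product'], df['Event_Dummy']])
    .transform('any')
)

# Flag only the Event == 1 row that opens a block containing a 2.
df['Updated_Event'] = np.where(df['Event'].eq(1) & block_has_2, 'yes', 'no')
print(df)
```

Grouping on User and Product in addition to Event_Dummy keeps a block from spilling over into the next user's or product's rows when the first row of that group is not an Event = 1.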

Please let me know if I can help clarify the question.

Source

2015-12-08 user3892921

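I don't think I understand. Do you want to create the column 'Updated_Event'? Or something else? I don't understand the second 'yes' in the 'Updated_Event' column. Is the first 'yes' there because it is the second occurrence, or not? Maybe [this](http://stackoverflow.com/help/mcve) helps. – jezrael

Sorry about that. Yes, I want to create the 'Updated_Event' column. Updated_Event should evaluate to true only if 'Event' = 1 and that event is followed at some point by 'Event' = 2 (before another Event = 1). The first 'yes' is because the event is followed by Event 2. The second 'yes' is because the event is followed by Event 2 (even though Event 2 does not come directly after Event = 1). – user3892921

I added a new column, Updated_Event_new, to make the comparison with the Updated_Event column easier: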

print(df)
    Event Product Updated_At Updated_Event User 
0  1  A 2015-01-01   no  1 
1  1  A 2015-02-01   yes  1 
2  2  A 2015-03-01   no  1 
3  1  B 2015-04-01   yes  2 
4  3  B 2015-05-01   no  2 
5  2  B 2015-06-01   no  2

# subset all rows with 1 or 2 in column Event
# (.copy() avoids a SettingWithCopyWarning when a column is added below)
df1 = df[(df['Event'] == 1) | (df['Event'] == 2)].copy()
print(df1)
    Event Product Updated_At Updated_Event User 
0  1  A 2015-01-01   no  1 
1  1  A 2015-02-01   yes  1 
2  2  A 2015-03-01   no  1 
3  1  B 2015-04-01   yes  2 
5  2  B 2015-06-01   no  2

# select rows where Event is 1 and the next row's Event is 2, and
# set the new column Updated_Event_new to 'yes' for them
df1.loc[(df1['Event'] == 1) & (df1['Event'].shift(-1) == 2), 'Updated_Event_new'] = 'yes'
print(df1)
    Event Product Updated_At Updated_Event User Updated_Event_new 
0  1  A 2015-01-01   no  1    NaN 
1  1  A 2015-02-01   yes  1    yes 
2  2  A 2015-03-01   no  1    NaN 
3  1  B 2015-04-01   yes  2    yes 
5  2  B 2015-06-01   no  2    NaN

# subset the remaining rows (Event is neither 1 nor 2)
df2 = df[~((df['Event'] == 1) | (df['Event'] == 2))]
print(df2)
    Event Product Updated_At Updated_Event User 
4  3  B 2015-05-01   no  2

# concatenate both subsets, df1 and df2, back into df
df = pd.concat([df1, df2])

# sort index
df = df.sort_index()

# fill NaN in Updated_Event_new with 'no'
df['Updated_Event_new'] = df['Updated_Event_new'].fillna('no')
print(df)
    Event Product Updated_At Updated_Event Updated_Event_new User 
0  1  A 2015-01-01   no    no  1 
1  1  A 2015-02-01   yes    yes  1 
2  2  A 2015-03-01   no    no  1 
3  1  B 2015-04-01   yes    yes  2 
4  3  B 2015-05-01   no    no  2 
5  2  B 2015-06-01   no    no  2
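One caveat with the steps above: shift(-1) runs over the whole filtered frame, so the last Event = 1 row of one user/product could be compared with the first row of the next one. A possible variant, assuming the rows are in chronological order within each group, does the shift per (User, Product) group instead. The function name flag_followed_events is hypothetical and only used for this sketch.

```python
import pandas as pd

def flag_followed_events(df):
    """Return 'yes' for Event == 1 rows whose next 1-or-2 event in the same
    (User, Product) group is a 2, and 'no' for every other row."""
    # Keep only the rows that matter for the rule (Event is 1 or 2).
    mask = df['Event'].isin([1, 2])
    # Next Event value inside the same User/Product group, ignoring other events.
    next_event = df.loc[mask].groupby(['User', 'Product'])['Event'].shift(-1)
    # Align back to the full index; rows that were filtered out become NaN.
    followed = next_event.reindex(df.index).eq(2)
    flag = df['Event'].eq(1) & followed
    return flag.map({True: 'yes', False: 'no'})

df = pd.DataFrame({'User': [1, 1, 1, 2, 2, 2],
                   'Product': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'Updated_At': ['2015-01-01', '2015-02-01', '2015-03-01',
                                  '2015-04-01', '2015-05-01', '2015-06-01'],
                   'Event': [1, 1, 2, 1, 3, 2]})

df['Updated_Event_new'] = flag_followed_events(df)
print(df)
```

On the sample data this gives the same flags as the version above; the two only differ when an Event = 1 row is the last 1-or-2 row of its group.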

Source

2015-12-08 22:20:15 jezrael
