2016-11-10 56 views
0

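I want to slice the sample data at the bottom... How do I select a range of rows between two values using pandas?

Each session (a session runs from a login event through the last action before the next login) looks like this: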

login,4,2016-11-10 05:28:30.396,hbhimani,11/10/2016 
getUserPreferences,179,2016-11-10 05:28:30.575,hbhimani,11/10/2016 
getPreference,3,2016-11-10 05:28:55.686,hbhimani,11/10/2016 
getPreference,4,2016-11-10 05:28:55.961,hbhimani,11/10/2016 
constructFromSession,4,2016-11-10 05:28:56.108,hbhimani,11/10/2016 
getUserPreferences,4,2016-11-10 05:28:56.112,hbhimani,11/10/2016 
getUserPreferences,3,2016-11-10 05:28:56.116,hbhimani,11/10/2016 
setBooleanPreference,4,2016-11-10 05:28:56.238,hbhimani,11/10/2016 
setBooleanPreference,4,2016-11-10 05:28:56.513,hbhimani,11/10/2016 
getQuickSearchInitInfo,3,2016-11-10 05:28:58.936,hbhimani,11/10/2016 
getQuickSearchInitInfo2,4,2016-11-10 05:28:59.315,hbhimani,11/10/2016 

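I want to count the number of records and the occurrences of the getPreference action. Each session would be displayed as a single record like this: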

day,User,session_duration(min),getPreference_count,total_session_actions 
11/10/2016,hbhimani, 180, 2, 11 

My challenge arises when there is more than one session. I don't know how to slice dynamically on the index.

Sample data:

Action,Duration,_time,User,day 
login,4,2016-11-10 05:28:30.396,hbhimani,11/10/2016 
getUserPreferences,179,2016-11-10 05:28:30.575,hbhimani,11/10/2016 
getPreference,3,2016-11-10 05:28:55.686,hbhimani,11/10/2016 
getPreference,4,2016-11-10 05:28:55.961,hbhimani,11/10/2016 
constructFromSession,4,2016-11-10 05:28:56.108,hbhimani,11/10/2016 
getUserPreferences,4,2016-11-10 05:28:56.112,hbhimani,11/10/2016 
getUserPreferences,3,2016-11-10 05:28:56.116,hbhimani,11/10/2016 
setBooleanPreference,4,2016-11-10 05:28:56.238,hbhimani,11/10/2016 
setBooleanPreference,4,2016-11-10 05:28:56.513,hbhimani,11/10/2016 
getQuickSearchInitInfo,3,2016-11-10 05:28:58.936,hbhimani,11/10/2016 
getQuickSearchInitInfo2,4,2016-11-10 05:28:59.315,hbhimani,11/10/2016 
login,3,2016-11-10 05:29:29.202,hbhimani,11/10/2016 
getSummary,4042,2016-11-10 05:29:33.246,hbhimani,11/10/2016 
getEnclosures,457,2016-11-10 05:29:34.372,hbhimani,11/10/2016 
getAuditTrail,1061,2016-11-10 05:29:36.034,hbhimani,11/10/2016 
getRelatedDefects,5,2016-11-10 05:29:36.586,hbhimani,11/10/2016 
getServiceRequests,5,2016-11-10 05:29:36.864,hbhimani,11/10/2016 
getForeignBugs,270,2016-11-10 05:29:37.408,hbhimani,11/10/2016 
getEnclosures,455,2016-11-10 05:29:50.087,hbhimani,11/10/2016 
getSummary,5505,2016-11-10 05:32:26.584,hbhimani,11/10/2016 
getEnclosures,459,2016-11-10 05:32:27.940,hbhimani,11/10/2016 
login,997,2016-11-10 05:32:29.480,anshanno,11/10/2016 
getRelatedDefects,5,2016-11-10 05:32:30.027,anshanno,11/10/2016 
getServiceRequests,5,2016-11-10 05:32:30.306,anshanno,11/10/2016 
getForeignBugs,6,2016-11-10 05:32:30.585,anshanno,11/10/2016 
+0

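Because it isn't designed to work that way? What exactly is your question? Read up on slicing, in particular how slices work with `pandas` data structures. –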

+0

Your question looks like an [XY problem](http://meta.stackexchange.com/a/66378). Could you post the desired data set (`day,User,session_duration(min),getPreference_count,total_session_actions`) based on your sample data set? – MaxU

+0

@juanpa.arrivillaga I looked and didn't see anything similar to what I'm doing. – anshanno

Answers

1

IIUC, you can group your data as follows:

Original DF:

In [62]: df 
Out[62]: 
        Action Duration     _time  User  day 
0      login   4 2016-11-10 05:28:30.396 hbhimani 2016-11-10 
1  getUserPreferences  179 2016-11-10 05:28:30.575 hbhimani 2016-11-10 
2    getPreference   3 2016-11-10 05:28:55.686 hbhimani 2016-11-10 
3    getPreference   4 2016-11-10 05:28:55.961 hbhimani 2016-11-10 
4  constructFromSession   4 2016-11-10 05:28:56.108 hbhimani 2016-11-10 
5  getUserPreferences   4 2016-11-10 05:28:56.112 hbhimani 2016-11-10 
6  getUserPreferences   3 2016-11-10 05:28:56.116 hbhimani 2016-11-10 
7  setBooleanPreference   4 2016-11-10 05:28:56.238 hbhimani 2016-11-10 
8  setBooleanPreference   4 2016-11-10 05:28:56.513 hbhimani 2016-11-10 
9 getQuickSearchInitInfo   3 2016-11-10 05:28:58.936 hbhimani 2016-11-10 
10 getQuickSearchInitInfo2   4 2016-11-10 05:28:59.315 hbhimani 2016-11-10 
11     login   3 2016-11-10 05:29:29.202 hbhimani 2016-11-10 
12    getSummary  4042 2016-11-10 05:29:33.246 hbhimani 2016-11-10 
13   getEnclosures  457 2016-11-10 05:29:34.372 hbhimani 2016-11-10 
14   getAuditTrail  1061 2016-11-10 05:29:36.034 hbhimani 2016-11-10 
15  getRelatedDefects   5 2016-11-10 05:29:36.586 hbhimani 2016-11-10 
16  getServiceRequests   5 2016-11-10 05:29:36.864 hbhimani 2016-11-10 
17   getForeignBugs  270 2016-11-10 05:29:37.408 hbhimani 2016-11-10 
18   getEnclosures  455 2016-11-10 05:29:50.087 hbhimani 2016-11-10 
19    getSummary  5505 2016-11-10 05:32:26.584 hbhimani 2016-11-10 
20   getEnclosures  459 2016-11-10 05:32:27.940 hbhimani 2016-11-10 
21     login  997 2016-11-10 05:32:29.480 anshanno 2016-11-10 
22  getRelatedDefects   5 2016-11-10 05:32:30.027 anshanno 2016-11-10 
23  getServiceRequests   5 2016-11-10 05:32:30.306 anshanno 2016-11-10 
24   getForeignBugs   6 2016-11-10 05:32:30.585 anshanno 2016-11-10 

Grouping:

In [63]: grp = df.groupby(['User', df.Action.eq('login').cumsum()]) 

Print all groups:

In [64]: for g, x in grp: 
    ...:  print(x) 
    ...: 
       Action Duration     _time  User  day 
21    login  997 2016-11-10 05:32:29.480 anshanno 2016-11-10 
22 getRelatedDefects   5 2016-11-10 05:32:30.027 anshanno 2016-11-10 
23 getServiceRequests   5 2016-11-10 05:32:30.306 anshanno 2016-11-10 
24  getForeignBugs   6 2016-11-10 05:32:30.585 anshanno 2016-11-10 
        Action Duration     _time  User  day 
0      login   4 2016-11-10 05:28:30.396 hbhimani 2016-11-10 
1  getUserPreferences  179 2016-11-10 05:28:30.575 hbhimani 2016-11-10 
2    getPreference   3 2016-11-10 05:28:55.686 hbhimani 2016-11-10 
3    getPreference   4 2016-11-10 05:28:55.961 hbhimani 2016-11-10 
4  constructFromSession   4 2016-11-10 05:28:56.108 hbhimani 2016-11-10 
5  getUserPreferences   4 2016-11-10 05:28:56.112 hbhimani 2016-11-10 
6  getUserPreferences   3 2016-11-10 05:28:56.116 hbhimani 2016-11-10 
7  setBooleanPreference   4 2016-11-10 05:28:56.238 hbhimani 2016-11-10 
8  setBooleanPreference   4 2016-11-10 05:28:56.513 hbhimani 2016-11-10 
9 getQuickSearchInitInfo   3 2016-11-10 05:28:58.936 hbhimani 2016-11-10 
10 getQuickSearchInitInfo2   4 2016-11-10 05:28:59.315 hbhimani 2016-11-10 
       Action Duration     _time  User  day 
11    login   3 2016-11-10 05:29:29.202 hbhimani 2016-11-10 
12   getSummary  4042 2016-11-10 05:29:33.246 hbhimani 2016-11-10 
13  getEnclosures  457 2016-11-10 05:29:34.372 hbhimani 2016-11-10 
14  getAuditTrail  1061 2016-11-10 05:29:36.034 hbhimani 2016-11-10 
15 getRelatedDefects   5 2016-11-10 05:29:36.586 hbhimani 2016-11-10 
16 getServiceRequests   5 2016-11-10 05:29:36.864 hbhimani 2016-11-10 
17  getForeignBugs  270 2016-11-10 05:29:37.408 hbhimani 2016-11-10 
18  getEnclosures  455 2016-11-10 05:29:50.087 hbhimani 2016-11-10 
19   getSummary  5505 2016-11-10 05:32:26.584 hbhimani 2016-11-10 
20  getEnclosures  459 2016-11-10 05:32:27.940 hbhimani 2016-11-10 

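Explanation: `df.Action.eq('login')` is True on every login row, so its cumulative sum increases by one at each login and labels every row with its session number: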

In [71]: df['grp_id'] = df.Action.eq('login').cumsum() 

In [72]: df[['Action','User','grp_id']] 
Out[72]: 
        Action  User grp_id 
0      login hbhimani  1 
1  getUserPreferences hbhimani  1 
2    getPreference hbhimani  1 
3    getPreference hbhimani  1 
4  constructFromSession hbhimani  1 
5  getUserPreferences hbhimani  1 
6  getUserPreferences hbhimani  1 
7  setBooleanPreference hbhimani  1 
8  setBooleanPreference hbhimani  1 
9 getQuickSearchInitInfo hbhimani  1 
10 getQuickSearchInitInfo2 hbhimani  1 
11     login hbhimani  2 
12    getSummary hbhimani  2 
13   getEnclosures hbhimani  2 
14   getAuditTrail hbhimani  2 
15  getRelatedDefects hbhimani  2 
16  getServiceRequests hbhimani  2 
17   getForeignBugs hbhimani  2 
18   getEnclosures hbhimani  2 
19    getSummary hbhimani  2 
20   getEnclosures hbhimani  2 
21     login anshanno  3 
22  getRelatedDefects anshanno  3 
23  getServiceRequests anshanno  3 
24   getForeignBugs anshanno  3 
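
With the session key in place, the one-row-per-session summary asked for in the question could be built with a single aggregation. The following is only a sketch on a shortened copy of the sample data. It assumes pandas 0.25 or newer (for named aggregation) and assumes that a session's duration is the last timestamp minus the first one, which the question does not actually specify. The names `is_get_pref`, `session_start`, `session_end` and `session_duration_s` are made up for this illustration.

```python
import io
import pandas as pd

# Shortened copy of the sample data from the question (two users, three sessions).
csv = io.StringIO("""Action,Duration,_time,User,day
login,4,2016-11-10 05:28:30.396,hbhimani,11/10/2016
getPreference,3,2016-11-10 05:28:55.686,hbhimani,11/10/2016
getPreference,4,2016-11-10 05:28:55.961,hbhimani,11/10/2016
getQuickSearchInitInfo2,4,2016-11-10 05:28:59.315,hbhimani,11/10/2016
login,3,2016-11-10 05:29:29.202,hbhimani,11/10/2016
getSummary,4042,2016-11-10 05:29:33.246,hbhimani,11/10/2016
login,997,2016-11-10 05:32:29.480,anshanno,11/10/2016
getForeignBugs,6,2016-11-10 05:32:30.585,anshanno,11/10/2016
""")
df = pd.read_csv(csv, parse_dates=["_time"])

# Same session key as above: the counter increases at every 'login' row.
df["grp_id"] = df["Action"].eq("login").cumsum()
# Hypothetical helper column so the getPreference count becomes a plain sum.
df["is_get_pref"] = df["Action"].eq("getPreference")

summary = (
    df.groupby(["day", "User", "grp_id"], sort=False)
      .agg(
          session_start=("_time", "min"),
          session_end=("_time", "max"),
          getPreference_count=("is_get_pref", "sum"),
          total_session_actions=("Action", "count"),  # non-null Action values
      )
      .reset_index()
)

# Assumption: duration = last timestamp minus first timestamp of the session.
summary["session_duration_s"] = (
    summary["session_end"] - summary["session_start"]
).dt.total_seconds()

print(summary)
```

Dividing `session_duration_s` by 60 gives minutes if that is the unit you need.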
+0

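Thanks! I have a quick follow-up question, which may or may not be best asked here... In the for loop, I perform my calculations, then append each df to a list and concatenate them back into a single df. Is this scalable? – anshanno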

+1

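@anshanno, well, it depends... The first thing to check when using pandas is whether you can avoid loops. If your task can be solved without a loop (a vectorized approach), that is the best option. Collecting DFs in a list and concatenating them with `pd.concat(list_of_DFs)` is a standard, idiomatic pattern, unless the data doesn't fit in your RAM – MaxU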
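
To make the comparison in the comment above concrete, here is a small sketch with made-up toy data (not the data from the question). It computes the same per-session row count twice: once with the loop-and-`pd.concat` pattern, and once with a single vectorized `groupby` call. The frame contents and the names `pieces`, `looped` and `vectorized` are assumptions for illustration only.

```python
import pandas as pd

# Toy data: user "a" has two sessions, user "b" has one.
df = pd.DataFrame({
    "User": ["a", "a", "a", "a", "b", "b"],
    "Action": ["login", "getPreference", "login", "getSummary",
               "login", "getPreference"],
})
df["grp_id"] = df["Action"].eq("login").cumsum()

# Loop-and-concat pattern: one small frame per session, joined once at the end.
pieces = []
for (user, gid), g in df.groupby(["User", "grp_id"], sort=False):
    pieces.append(pd.DataFrame({
        "User": [user],
        "grp_id": [gid],
        "total_session_actions": [len(g)],
    }))
looped = pd.concat(pieces, ignore_index=True)

# Vectorized equivalent: one groupby call, no Python-level loop.
vectorized = (
    df.groupby(["User", "grp_id"], sort=False)
      .size()
      .reset_index(name="total_session_actions")
)

print(looped)
print(vectorized)
```

Both frames hold the same values. The vectorized version avoids building one DataFrame per session, which tends to matter once there are many sessions.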
