2017-10-10 153 views
-1

Drop rows that do not satisfy a given condition

I have this data frame, with its dput given below:

lf3 = structure(list(session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L, 6L, 
6L, 7L), userId = c(1, 1, 1, 2, 2, 4, 4, 5, 5, 5), datetime = 
structure(c(1457029336, 
1457029337, 1457029340, 1457029596, 1457313569, 1457030783, 1457030784, 
1457030918, 1457030920, 1457370365), class = c("POSIXct", "POSIXt" 
), tzone = "UTC"), referer = c(22, 2, 7, 5, 23, 20, 7, 24, 18, 
22), request = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 5)), .Names = c("session_id", 
"userId", "datetime", "referer", "request"), row.names = c(NA, 
10L), class = "data.frame") 

Now I want to drop the sessions that do not meet a specified minimum criterion/value. I tried this code:

library(dplyr)
lf3 %>% group_by(session_id) %>% tally(sort = TRUE) %>% filter(n > 2)

But I want to get back the same data frame, containing only the sessions that pass this condition, like below:

session_id userId   datetime referer request 
1   1  1 2016-03-03 18:22:16  22  1 
2   1  1 2016-03-03 18:22:17  2  2 
3   1  1 2016-03-03 18:22:20  7  3 

How should I go about this?

+0

Update your question with the expected output. –

+0

So it will only give the rows with session_id = 1, whose frequency is greater than 2. The desired output will look like this: 'structure(list(session_id = c(1L, 1L, 1L), userId = c(1, 1, 1), datetime = structure(c(1457029336, 1457029337, 1457029340), class = c("POSIXct", "POSIXt"), tzone = "UTC"), referer = c(22, 2, 7), request = c(1, 2, 3)), .Names = c("session_id", "userId", "datetime", "referer", "request"), row.names = c(NA, 3L), class = "data.frame")' – SumitArya

+0

I prefer base R, with 'ave': 'lf3[ave(lf3$userId, lf3$session_id, FUN = length) > 2, ]' –
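
For anyone who wants to try the 'ave' one-liner from the comment above, here is a minimal, self-contained sketch. It rebuilds a shortened copy of lf3 without the datetime column (an assumption made only to keep the example short), and the names 'group_size' and 'kept' are made up for illustration.

```r
# Shortened copy of the question's data (datetime column omitted for brevity).
lf3 <- data.frame(
  session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L, 6L, 6L, 7L),
  userId     = c(1, 1, 1, 2, 2, 4, 4, 5, 5, 5),
  referer    = c(22, 2, 7, 5, 23, 20, 7, 24, 18, 22),
  request    = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 5)
)

# ave() applies length() within each session_id and returns a vector
# aligned with the rows of lf3, so every row carries its group size.
group_size <- ave(lf3$userId, lf3$session_id, FUN = length)

# Keep only the rows whose session has more than 2 rows.
kept <- lf3[group_size > 2, ]
print(kept)
```

The result stays a plain data.frame, so no conversion step is needed afterwards.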

Answers

4

You probably need group_by %>% filter:

lf3 %>% group_by(session_id) %>% filter(n() > 2) 

# A tibble: 3 x 5 
# Groups: session_id [1] 
# session_id userId   datetime referer request 
#  <int> <dbl>    <dttm> <dbl> <dbl> 
#1   1  1 2016-03-03 18:22:16  22  1 
#2   1  1 2016-03-03 18:22:17  2  2 
#3   1  1 2016-03-03 18:22:20  7  3 
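
To make the difference from the question's attempt concrete, here is a rough sketch, assuming dplyr is installed and using a shortened copy of lf3 (datetime omitted, purely for brevity). The names 'counts' and 'kept' are illustrative. It also shows one possible way to get back an ungrouped, plain data.frame, if that is what is wanted.

```r
library(dplyr)

# Shortened copy of the question's data (datetime column omitted for brevity).
lf3 <- data.frame(
  session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L, 6L, 6L, 7L),
  userId     = c(1, 1, 1, 2, 2, 4, 4, 5, 5, 5),
  referer    = c(22, 2, 7, 5, 23, 20, 7, 24, 18, 22),
  request    = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 5)
)

# tally() collapses each group to a single row (session_id, n),
# so the original columns are no longer available after it.
counts <- lf3 %>%
  group_by(session_id) %>%
  tally(sort = TRUE) %>%
  filter(n > 2)

# filter(n() > 2) evaluates n() per group and keeps every row of the
# groups that qualify; ungroup() and as.data.frame() drop the grouping
# and the tibble class.
kept <- lf3 %>%
  group_by(session_id) %>%
  filter(n() > 2) %>%
  ungroup() %>%
  as.data.frame()

print(counts)
print(kept)
```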
+1

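OK, this works. I converted the result to a data frame and saved it under another data frame variable name. Thanks. Performance should be checked when using this method. – SumitArya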

0

We can use data.table:

library(data.table) 
setDT(lf3)[, if(.N >2) .SD, session_id] 
#  session_id userId   datetime referer request 
#1:   1  1 2016-03-03 18:22:16  22  1 
#2:   1  1 2016-03-03 18:22:17  2  2 
#3:   1  1 2016-03-03 18:22:20  7  3
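
One thing to keep in mind: setDT() converts lf3 to a data.table by reference, so the original object changes class. A possible variation, sketched below on a shortened copy of the data (datetime omitted for brevity), works on a copy made with as.data.table() instead. The names 'dt', 'idx', 'kept' and 'kept2' are illustrative only, and the '.I' version is just an alternative idiom, not something from the answer above.

```r
library(data.table)

# Shortened copy of the question's data (datetime column omitted for brevity).
lf3 <- data.frame(
  session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L, 6L, 6L, 7L),
  userId     = c(1, 1, 1, 2, 2, 4, 4, 5, 5, 5),
  referer    = c(22, 2, 7, 5, 23, 20, 7, 24, 18, 22),
  request    = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 5)
)

# as.data.table() makes a copy, so lf3 itself remains a plain data.frame.
dt <- as.data.table(lf3)

# Same idea as in the answer: return .SD only for groups with more than 2 rows.
kept <- dt[, if (.N > 2) .SD, by = session_id]

# Alternative idiom: collect the row numbers (.I) of qualifying groups,
# then subset the table once with those indices.
idx <- dt[, .I[.N > 2], by = session_id]$V1
kept2 <- dt[idx]

print(kept)
print(kept2)
```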