The quickest way to get the counts is to use table:
table(df$user)
Example code:
> df <- data.frame(user=c(rep("john",4),rep("jane",3)), event=c(rep("failed",3), "success", rep("failed",2), "success"))
> df
  user   event
1 john  failed
2 john  failed
3 john  failed
4 john success
5 jane  failed
6 jane  failed
7 jane success
> table(df$user)
jane john
   3    4
Edit: to address your most recent edit, which substantially changed the question:
> df <- data.frame(user=c(rep("john",4),rep("jane",3)), event=c(rep("failed",3), "success", rep("failed",2), "success"), randNum=c(4,6,1,2,9,3,5))
> library(dplyr)
> df <- df %>% group_by(user) %>% mutate(trial = 1:n())
> df[df$trial==1 | df$event=="success",]
Source: local data frame [4 x 4]
Groups: user [2]

    user   event randNum trial
  <fctr>  <fctr>   <dbl> <int>
1   john  failed       4     1
2   john success       2     4
3   jane  failed       9     1
4   jane success       5     3
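For readers who would rather not depend on dplyr, here is a minimal base R sketch of the same idea. It assumes the same toy data as above and that rows are already in chronological order within each user; the name `result` is only illustrative and is not part of the original answer.

```r
# Same toy data as in the answer. stringsAsFactors is set explicitly so the
# behaviour does not depend on the R version.
df <- data.frame(
  user    = c(rep("john", 4), rep("jane", 3)),
  event   = c(rep("failed", 3), "success", rep("failed", 2), "success"),
  randNum = c(4, 6, 1, 2, 9, 3, 5),
  stringsAsFactors = FALSE
)

# ave() applies seq_along() within each user, which yields a per-user
# attempt counter (1, 2, 3, ... restarting for every user).
df$trial <- ave(seq_len(nrow(df)), df$user, FUN = seq_along)

# Keep each user's first attempt and every successful attempt.
result <- df[df$trial == 1 | df$event == "success", ]
result
```

With this data, `result` keeps the same four rows as the dplyr version: the first and the successful attempt for each user.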
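...success. If every user eventually succeeds, you can simply count the rows for each user. If you use library(data.table) and read the CSV with fread (say, into dt), the syntax is dt[, .N, by = user]. –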
OK, but how would I loop over the table to check when a new user appears? – jim
I'm not sure you need to loop over the CSV file. Would it be acceptable to get a list of all users along with their number of failures? –
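As a rough illustration of the list of users with failure counts suggested in the comment above, the following base R sketch needs no explicit loop. It assumes the same toy data and that failures are marked with the literal string "failed"; the name `failures` is hypothetical.

```r
# Same toy data as in the answer.
df <- data.frame(
  user  = c(rep("john", 4), rep("jane", 3)),
  event = c(rep("failed", 3), "success", rep("failed", 2), "success"),
  stringsAsFactors = FALSE
)

# Summing a logical vector counts its TRUE values, so this gives the
# number of "failed" rows per user. Users with no failures get 0
# rather than being dropped.
failures <- tapply(df$event == "failed", df$user, sum)
failures
```

Using tapply on the logical comparison rather than table on a filtered subset is a deliberate choice here: it keeps users who never failed in the result.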