DDPLY grouping error with data.frame
I am running the ddply function and keep getting an error.
Structure of the data.frame:
str(visits.by.user)
'data.frame': 80317 obs. of 5 variables:
$ ClientID : Factor w/ 147792 levels "50912733","50098716",..: 1 3 4 5 6 7 8 10 11 12 ...
$ TotalVisits : int 64 231 18 21 416 290 3 13 1 7 ...
$ TotalDayVisits: int 8 141 0 4 240 155 0 0 0 0 ...
$ TotalNightVisits: int 56 90 18 17 176 135 3 13 1 7 ...
$ quintile : Factor w/ 5 levels "0-20","20-40",..: 5 5 4 4 5 5 2 4 1 3 ...
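Side note: I know how to create sample data for random numeric data. How would you build a representative sample that includes a factor with 5 levels?
ddply code: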
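For the side question, one possible approach is sketched below in R: draw the numeric columns at random and derive the five-level factor with cut(). The column names mirror the str() output above; the seed, the sample size n, the Poisson rates, and the object name sample.visits are arbitrary assumptions made for illustration only.

```r
# Hypothetical sample data; the seed, size, and distributions are assumptions.
set.seed(42)
n <- 100

day   <- rpois(n, lambda = 20)   # random day-visit counts
night <- rpois(n, lambda = 30)   # random night-visit counts

sample.visits <- data.frame(
  ClientID         = factor(sprintf("%08d", sample.int(99999999, n))),
  TotalVisits      = day + night,
  TotalDayVisits   = day,
  TotalNightVisits = night
)

# Rank first so ties cannot produce duplicate break points, then cut the
# ranks into five equal-width bins labelled like the original quintile factor.
ranks <- rank(sample.visits$TotalVisits, ties.method = "first")
sample.visits$quintile <- cut(
  ranks,
  breaks = seq(0, n, length.out = 6),
  labels = c("0-20", "20-40", "40-60", "60-80", "80-100")
)

str(sample.visits)
```

Cutting the ranks rather than the raw counts keeps the five groups the same size even when many clients share the same TotalVisits value.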
summary.users <- ddply(data = subset(visits.by.user, TotalVisits > 0),
.(quintile, TotalDayVisits, TotalNightVisits),
summarize,
NumClients = length(ClientID))
Error message:
Error in if (empty(.data)) return(.data) :
missing value where TRUE/FALSE needed
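I thought that perhaps ddply needs the variables I am grouping by to be factors, so I tried as.factor on the integer variables, but that did not work.
Can anyone see where I am going wrong?
EDIT: Added dput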
structure(list(ClientID = structure(c(1L, 2L, 3L, 4L, 5L, 6L), .Label = c("50912733", "60098716", "50087112", "94752212", "78217771", "12884545"), class = "factor"),TotalVisits = c(80L, 92L, 103L, 18L, 182L, 136L), TotalDayVisits = c(56L, 90L, 18L, 17L, 176L, 135L), TotalNightVisits = c(24L, 2L, 85L, 1L, 6L, 1L), quintile = structure(c(5L, 5L, 4L, 4L, 5L, 5L), .Label = c("0-20", "20-40", "40-60", "60-80", "80-100"), class = "factor")), .Names = c("ClientID", "TotalVisits", "TotalDayVisits", "TotalNightVisits", "quintile"), row.names = c(NA,6L), class = "data.frame")
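Can you update your question with the result of 'dput(head(visits.by.user))'? – Maiasaura 2012-08-01 21:25:49
You are trying to return the number of rows in each subset. To do that, your code should be 'NumClients = nrow'. That might solve your problem. – Andrie 2012-08-01 21:31:58
@Andrie No luck with that, but that is exactly what I am trying to get. – mikebmassey 2012-08-01 22:05:29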
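A possible explanation, offered as a sketch rather than a confirmed diagnosis: in plyr, the first argument of ddply is named .data (with a leading dot), not data. When the call uses data = ..., the data frame is probably absorbed by the ... argument and the .(quintile, ...) list is matched positionally as .data, which would be consistent with the error raised in empty(.data). The example below rebuilds the six rows from the dput output above and changes only the argument name; it assumes plyr is installed.

```r
library(plyr)

# Six rows rebuilt from the dput output in the question.
visits.by.user <- data.frame(
  ClientID = factor(c("50912733", "60098716", "50087112",
                      "94752212", "78217771", "12884545")),
  TotalVisits      = c(80L, 92L, 103L, 18L, 182L, 136L),
  TotalDayVisits   = c(56L, 90L, 18L, 17L, 176L, 135L),
  TotalNightVisits = c(24L, 2L, 85L, 1L, 6L, 1L),
  quintile = factor(c("80-100", "80-100", "60-80", "60-80", "80-100", "80-100"),
                    levels = c("0-20", "20-40", "40-60", "60-80", "80-100"))
)

# Same call as in the question, except the first argument is named .data.
summary.users <- ddply(.data = subset(visits.by.user, TotalVisits > 0),
                       .(quintile, TotalDayVisits, TotalNightVisits),
                       summarize,
                       NumClients = length(ClientID))

print(summary.users)
```

Because the call groups by both visit counts as well as quintile, each distinct combination forms its own group; grouping by .(quintile) alone would instead give one client count per quintile.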