ddply grouping error

I am running ddply on a data.frame and keep getting an error.

Structure:

str(visits.by.user) 
'data.frame': 80317 obs. of 5 variables: 
$ ClientID : Factor w/ 147792 levels "50912733","50098716",..: 1 3 4 5 6 7 8 10 11 12 ... 
$ TotalVisits  : int 64 231 18 21 416 290 3 13 1 7 ... 
$ TotalDayVisits: int 8 141 0 4 240 155 0 0 0 0 ... 
$ TotalNightVisits: int 56 90 18 17 176 135 3 13 1 7 ... 
$ quintile   : Factor w/ 5 levels "0-20","20-40",..: 5 5 4 4 5 5 2 4 1 3 ...

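Side note: I know how to create sample data for random numeric values, but how do you build a representative sample that includes a factor with 5 levels?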

ddply code:

summary.users <- ddply(data = subset(visits.by.user, TotalVisits > 0), 
          .(quintile, TotalDayVisits, TotalNightVisits), 
          summarize, 
          NumClients = length(ClientID))

Error message:

Error in if (empty(.data)) return(.data) : 
missing value where TRUE/FALSE needed

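I thought that perhaps ddply needs the variables I am grouping by to be factors, so I tried as.factor on the integer variables, but that did not work.

Can anyone see where I am going wrong?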

Edit: added dput output

structure(list(ClientID = structure(c(1L, 2L, 3L, 4L, 5L, 6L), .Label = c("50912733", "60098716", "50087112", "94752212", "78217771", "12884545"), class = "factor"),TotalVisits = c(80L, 92L, 103L, 18L, 182L, 136L), TotalDayVisits = c(56L, 90L, 18L, 17L, 176L, 135L), TotalNightVisits = c(24L, 2L, 85L, 1L, 6L, 1L), quintile = structure(c(5L, 5L, 4L, 4L, 5L, 5L), .Label = c("0-20", "20-40", "40-60", "60-80", "80-100"), class = "factor")), .Names = c("ClientID", "TotalVisits", "TotalDayVisits", "TotalNightVisits", "quintile"), row.names = c(NA,6L), class = "data.frame")


2012-08-01 mikebmassey

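Could you update your question with the output of `dput(head(visits.by.user))`? – Maiasaura 2012-08-01 21:25:49

You are trying to return the number of rows in each subset. To do that, your code should be `NumClients = nrow`. That may solve your problem. – Andrie 2012-08-01 21:31:58

@Andrie No luck with that, but that is exactly what I am trying to get. – mikebmassey 2012-08-01 22:05:29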

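You named your first argument data=, but the first argument of ddply is called .data. If I change that, your code runs fine.

Regarding my comment: this is a problem I thought I had run into before, but it seems that something like a droplevels call is implicit in the ddply machinery. I would love to hear a more in-depth explanation of how it works!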
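A minimal sketch of that fix, using the six rows from the question's dput output and assuming the `plyr` package is installed. Since `data` does not match the formal name `.data`, R passes it into `...`, and the remaining positional arguments shift over to fill `.data` and `.variables`; this is presumably what leads to the confusing error message. The explicit `.variables =` and `.fun =` names below are only there for illustration.

```r
library(plyr)

# The six rows posted in the question's dput output.
visits.by.user <- data.frame(
  ClientID = factor(c("50912733", "60098716", "50087112",
                      "94752212", "78217771", "12884545")),
  TotalVisits = c(80L, 92L, 103L, 18L, 182L, 136L),
  TotalDayVisits = c(56L, 90L, 18L, 17L, 176L, 135L),
  TotalNightVisits = c(24L, 2L, 85L, 1L, 6L, 1L),
  quintile = factor(c("80-100", "80-100", "60-80", "60-80", "80-100", "80-100"),
                    levels = c("0-20", "20-40", "40-60", "60-80", "80-100"))
)

# Name the first argument .data (or simply pass it positionally).
summary.users <- ddply(.data = subset(visits.by.user, TotalVisits > 0),
                       .variables = .(quintile, TotalDayVisits, TotalNightVisits),
                       .fun = summarize,
                       NumClients = length(ClientID))

print(summary.users)
```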

dat <- data.frame(x=1:20, z=factor(rep(letters[1:4], each=5))) 

ddply(dat, .(z), summarise, length(x))
  z ..1
1 a   5
2 b   5
3 c   5
4 d   5
ddply(subset(dat, z!='a'), .(z), summarise, length(x))
  z ..1
1 b   5
2 c   5
3 d   5

That behaves well. But looking at the factor levels surprised me a little:

ddply(subset(dat, z!='a'), .(z), summarise, paste(levels(z), collapse=' '))
  z     ..1
1 b a b c d
2 c a b c d
3 d a b c d
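The leftover level can be reproduced without ddply. The sketch below uses only base R and shows that `subset()` keeps the unused level `a` on the factor, while an explicit `droplevels()` call removes it. The names `sub` and `sub_clean` are just illustrative.

```r
dat <- data.frame(x = 1:20, z = factor(rep(letters[1:4], each = 5)))

# subset() filters rows but leaves the factor's level set untouched.
sub <- subset(dat, z != "a")
print(levels(sub$z))   # "a" is still listed as a level
print(table(sub$z))    # with no rows belonging to it

# droplevels() discards levels that no longer occur in the data.
sub_clean <- droplevels(sub)
print(levels(sub_clean$z))
```

So ddply omitting the empty group from its output and the factor still carrying the level `a` are two separate things.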


2012-08-01 22:46:46 Justin

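`ddply` has a `.drop` argument (default `TRUE`). It drops combinations that do not occur in the data. If you run `ddply(subset(dat, z != 'a'), .(z), summarise, length(x), .drop = FALSE)`, the first row will be `a, 0`. – mnel 2012-08-02 00:21:17

I thought I was being thorough by adding `data =`, like you are supposed to with `ggplot`. Thanks for the help. – mikebmassey 2012-08-02 14:07:55

@mikebmassey You were! Except the argument isn't `data`, it's `.data`. – Justin 2012-08-02 14:39:50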
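A small sketch of the `.drop` behaviour described in this comment, assuming the `plyr` package is installed. The column name `n` and the object names `dropped` and `kept` are illustrative choices, not part of the original thread.

```r
library(plyr)

dat <- data.frame(x = 1:20, z = factor(rep(letters[1:4], each = 5)))
sub <- subset(dat, z != "a")

# Default (.drop = TRUE): groups with no rows are left out of the result.
dropped <- ddply(sub, .(z), summarise, n = length(x))

# .drop = FALSE: the unused level "a" is kept as a group of its own.
kept <- ddply(sub, .(z), summarise, n = length(x), .drop = FALSE)

print(dropped)
print(kept)
```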

This works fine:

summary.users <- ddply(subset(visits.by.user, TotalVisits > 0), 
          .(quintile, TotalDayVisits, TotalNightVisits), 
          summarize, NumClients = length(ClientID)) 

> summary.users
  quintile TotalDayVisits TotalNightVisits NumClients
1    60-80             17                1          1
2    60-80             18               85          1
3   80-100             56               24          1
4   80-100             90                2          1
5   80-100            135                1          1
6   80-100            176                6          1


2012-08-02 00:34:20 Maiasaura
