2016-11-21 43 views
I want to aggregate my dataset by the median in R. Is it possible to aggregate or summarize a dataset in R using the median?

d <- aggregate(c(d$user_reported_percent, d$machine_percent), 
         by = list(d$day), FUN=median, simplify = TRUE, drop = TRUE) 

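But R keeps complaining, and I am not sure whether it even makes sense to aggregate with the median.

One of the errors R gives me: Error in aggregate.data.frame(as.data.frame(x), ...) : arguments must have same length

Then I tried using mutate to at least find the median: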

d <- d %>% group_by(day) %>% mutate(median=median(user_reported_percent)) 

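The error is: Error: invalid subscript type 'integer'

I would appreciate any help! Thanks a lot!

P.S. With mean, everything works perfectly fine.

My dataset looks like this: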

structure(list(esmFollValue = c(36.00852, 8.688648, 0.6372048, 
13.7394, 0.7599012, 16.43628, 7.569684, 0.4502016, 0.7630464, 
0.781386, 0.5116056, 0.858756, 18.06108, 0.5473332, 14.62944, 
14.62944, 14.07216, 0.5366868, 14.12892, 0.7354944), esmHappValue = c(100L, 
80L, 80L, 80L, 60L, 80L, 60L, 60L, 80L, 60L, 100L, 60L, 80L, 
60L, 60L, 60L, 60L, 60L, 80L, 60L), deviceId = structure(c(11L, 
11L, 11L, 6L, 3L, 15L, 3L, 3L, 15L, 3L, 15L, 15L, 15L, 15L, 3L, 
3L, 15L, 3L, 9L, 9L), .Label = c("1e6c1183-af64-4860-b2d6-533cab7afe6c", 
"34209e3d-1a82-4f75-95c8-846be8a1be03", "7066f4af-82f3-4369-8f45-70d1ea3d22f2", 
"7cf78328-60c5-4564-9dd0-309cb0b3d5ad", "95b11f22-91e8-46d0-88d9-4f197267aa29", 
"a0c89d2a-d22d-41d0-a070-b9887d911953", "cde8cc10-7212-4a41-ae9b-bbeb51dbe8ed", 
"d150bfa4-0b52-47a0-b450-1eb21aaada53", "d41db7bc-2b81-4111-9b32-a0aab55cb25a", 
"d7e8e8c7-5190-4f0b-aa49-72e520bc9aad", "dd1218a2-4e67-4cbf-bf4d-9e288865aa63", 
"f093abf9-22e1-47e6-ae5d-1238629d8542", "fae0dd29-2b89-4c1d-b5ad-7858abe122ac", 
"feeb0ab0-7d13-4a5c-b0df-58dd85c7f607", "ff883e61-c9a9-4e6b-8b6b-cab3e5535879" 
), class = "factor"), timestamp = c(1457272936.882, 1457337998.931, 
1457424251.996, 1457429767.632, 1457597635.755, 1457683537.604, 
1457861178.161, 1457964712.356, 1458029223.54, 1458046931.652, 
1458051135.219, 1458115293.069, 1458133652.503, 1458202019.302, 
1458203945.674, 1458203945.787, 1458306790.803, 1458308783.441, 
1458460903.755, 1458480932.088), group = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), .Label = c("groupA", "groupB", "groupC", "groupD"), class = "factor"), 
    cameraFeed = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Non-visible camera feed", 
    "Visible camera feed"), class = "factor"), timegroup = structure(c(2L, 
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 
    2L, 2L, 1L, 2L), .Label = c("Day", "Evening"), class = "factor"), 
    day = structure(c(4L, 2L, 6L, 6L, 5L, 1L, 4L, 2L, 6L, 6L, 
    6L, 7L, 7L, 5L, 5L, 5L, 1L, 1L, 4L, 4L), .Label = c("Friday", 
    "Monday", "Saturday", "Sunday", "Thursday", "Tuesday", "Wednesday" 
    ), class = "factor"), user_reported_percent = c(83.3333333333333, 
    66.6666666666667, 66.6666666666667, 66.6666666666667, 50, 
    66.6666666666667, 50, 50, 66.6666666666667, 50, 83.3333333333333, 
    50, 66.6666666666667, 50, 50, 50, 50, 50, 66.6666666666667, 
    50), machine_percent = c(30.0071, 7.24054, 0.531004, 11.4495, 
    0.633251, 13.6969, 6.30807, 0.375168, 0.635872, 0.651155, 
    0.426338, 0.71563, 15.0509, 0.456111, 12.1912, 12.1912, 11.7268, 
    0.447239, 11.7741, 0.612912)), .Names = c("esmFollValue", 
"esmHappValue", "deviceId", "timestamp", "group", "cameraFeed", 
"timegroup", "day", "user_reported_percent", "machine_percent" 
), row.names = c(NA, 20L), class = "data.frame") 

and I would like to have one value per day for each of the two percent columns.

Please show a reproducible example and the expected output. – akrun

Sorry for not being specific and descriptive. I have now added a snapshot of the dataset. –

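There is at least one error in the `aggregate` line. Most likely the first argument should be `d[, c("user_reported_percent", "machine_percent")]`. That does not guarantee it will work, but the error you received comes from your first argument being longer than the grouping variable (do you see why?). – nicola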
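The length mismatch described in the comment above can be reproduced on a small data frame. This is only a sketch: the toy values below are made up and stand in for the real dataset, and `stacked` and `res` are names introduced here for illustration.

```r
# Toy data standing in for the asker's data frame (values are made up).
d <- data.frame(
  day = c("Mon", "Mon", "Tue"),
  user_reported_percent = c(50, 70, 60),
  machine_percent = c(10, 30, 20)
)

# c() concatenates the two columns into ONE vector of length 2 * nrow(d) ...
stacked <- c(d$user_reported_percent, d$machine_percent)
length(stacked)  # 6
length(d$day)    # 3

# ... so aggregate() cannot line the values up with the grouping variable
# and stops with an error. try() captures the error instead of aborting.
res <- try(aggregate(stacked, by = list(d$day), FUN = median), silent = TRUE)
inherits(res, "try-error")  # TRUE
```

Selecting the columns with `d[, c(...)]` keeps them as a data frame with `nrow(d)` rows, which is why the lengths then agree.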

Answer

With @nicola's help, I used this:

aggregate(d[, c("user_reported_percent", "machine_percent")], by = list(d$day), FUN = median)

and everything works fine. Thanks a lot!
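As a possible alternative, the formula interface of `aggregate` can express the same per-day median. The sketch below uses a made-up toy data frame rather than the real dataset, and `by_day` is a name introduced here for illustration.

```r
# Toy data standing in for the real dataset (values are made up).
d <- data.frame(
  day = c("Mon", "Mon", "Mon", "Tue", "Tue"),
  user_reported_percent = c(50, 70, 90, 60, 80),
  machine_percent = c(10, 30, 20, 5, 15)
)

# cbind() on the left-hand side aggregates several columns at once;
# the right-hand side names the grouping variable.
by_day <- aggregate(
  cbind(user_reported_percent, machine_percent) ~ day,
  data = d,
  FUN = median
)

by_day  # one row per day, one median per percent column
```

With this form the grouping column keeps its name `day` instead of the default `Group.1` produced by `by = list(d$day)`. Note that the formula interface omits rows containing `NA` by default, which may or may not be what is wanted.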