Fill all rows with the mean of the corresponding group (ddply?)

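This may be a silly question about a simple ddply task, but oddly I could not find a solution. Say I have a data frame of respondents nested within countries, along with the number of jobs each respondent has held over his or her career: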

mydata <- structure(list(country = structure(c(11L, 6L, 7L, 12L, 12L, 3L, 
7L, 10L, 6L, 4L, 5L, 12L, 3L, 1L, 4L, 13L, 2L, 4L, 7L, 3L), contrasts = structure(c(1, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 1, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
0, 0, 0, -1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 
0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 
0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1), .Dim = c(13L, 
12L), .Dimnames = list(c("Austria", "Germany", "Sweden", "Netherlands", 
"Spain", "Italy", "France", "Denmark", "Greece", "Switzerland", 
"Belgium", "Czechia", "Poland"), c("AT", "DE", "SE", "NL", "ES", 
"IT", "FR", "DK", "GR", "CH", "BE", "CZ"))), .Label = c("Austria", 
"Germany", "Sweden", "Netherlands", "Spain", "Italy", "France", 
"Denmark", "Greece", "Switzerland", "Belgium", "Czechia", "Poland" 
), class = "factor"), njobs = c(2, 2, 3, 2, 1, 2, 4, 2, 1, 3, 
2, 3, 3, 2, 8, 3, 1, 2, 9, 3)), .Names = c("country", "njobs" 
), class = "data.frame", row.names = c(NA, -20L))

I want to add a third column containing the mean number of jobs held in each respondent's country. This is easy to do in two lines:

library(plyr)  # provides ddply() and .()
ctry.means <- ddply(mydata, .(country), summarize, avejobs = mean(njobs))
result <- merge(mydata, ctry.means, by = "country")

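However, this is such a simple and frequently used operation that I feel there must be a one-step way to do it with ddply. More generally, the question is how to combine group-level and case-level variables in a single summarize or mutate statement.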

Source

2014-02-05 Maxim.K

Just use 'transform/mutate' instead of 'summarize'. – Ramnath
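A minimal sketch of that suggestion, assuming the plyr package is installed. The `toy` data frame is a hypothetical stand-in for `mydata` with the same two columns; it is not part of the original question.

```r
library(plyr)

# Hypothetical toy data standing in for mydata (same two columns)
toy <- data.frame(
  country = factor(c("Austria", "Italy", "Italy", "Sweden", "Austria")),
  njobs   = c(2, 4, 1, 3, 6)
)

# mutate keeps every row of each group and adds the group mean;
# summarize would instead collapse each group to a single row
result <- ddply(toy, .(country), mutate, avejobs = mean(njobs))
print(result)
```

Note that ddply returns the rows grouped by country, so the row order can differ from the input.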

I knew I was being silly :) Thanks, @Ramnath –

Or with dplyr: 'mydata %.% group_by(country) %.% mutate(avejobs = mean(njobs))' – hadley
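The `%.%` operator in the comment above comes from early dplyr releases. As far as I know, current versions use the `%>%` pipe instead, so a present-day sketch of the same idea might look like the following. The `toy` data frame is a hypothetical stand-in for `mydata`.

```r
library(dplyr)

# Hypothetical toy data standing in for mydata (same two columns)
toy <- data.frame(
  country = factor(c("Austria", "Italy", "Italy", "Sweden", "Austria")),
  njobs   = c(2, 4, 1, 3, 6)
)

# Grouped mutate: mean(njobs) is computed per country and
# repeated on every row of that country
result <- toy %>%
  group_by(country) %>%
  mutate(avejobs = mean(njobs)) %>%
  ungroup()

print(result)
```

Unlike the ddply version, a grouped mutate leaves the rows in their original order.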

If you are happy with a simple base R solution,

mydata$new <- ave(mydata$njobs, mydata$country)

will do it.
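For illustration, here is a short sketch of how `ave` behaves on a hypothetical `toy` data frame (not the question's data). By default `ave` applies `mean` within each group; a different function can be supplied, but it must be passed by name as `FUN =` because any unnamed arguments are treated as grouping variables.

```r
# Hypothetical toy data standing in for mydata (same two columns)
toy <- data.frame(
  country = factor(c("Austria", "Italy", "Italy", "Sweden", "Austria")),
  njobs   = c(2, 4, 1, 3, 6)
)

# Default FUN is mean: each row receives its country's mean njobs
toy$avejobs <- ave(toy$njobs, toy$country)

# Any other summary works as well, as long as FUN is named explicitly
toy$maxjobs <- ave(toy$njobs, toy$country, FUN = max)

print(toy)
```

The result has the same length and order as the input vector, so it can be assigned directly to a new column without a merge.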

Source

2014-03-15 12:31:03 sparrow
