Collapsing a data.frame by group into a data.frame - problems with by() and aggregate()

Suppose I have the following data and function:

landlines <- data.frame(
       year=rep(c(1990,1995,2000,2005,2010),times=3), 
       country=rep(c("US", "Brazil", "Asia"), each=5), 
       pct = c(0.99, 0.99, 0.98, 0.05, 0.9, 
         0.4, 0.5, 0.55, 0.5, 0.45, 
         0.7, 0.85, 0.9, 0.85, 0.75) 
       ) 
someStats <- function(x) 
{ 
    dp <- as.matrix(x$pct)-mean(x$pct) 
    indp <- as.matrix(x$year)-mean(x$year) 
    f <- lm.fit(indp,dp)$coefficients 
    w <- sd(x$pct) 
    m <- min(x$pct) 
    results <- c(f,w,m) 
    names(results) <- c("coef","sdev", "minPct") 
    results 
}

I can successfully apply the function, which returns summary statistics, to a subset of the data like this:

> someStats(landlines[landlines$country=="US",]) 
     coef  sdev minPct 
-0.022400 0.410938 0.050000

Or look at it broken down by country like this:

> by(landlines, list(country=landlines$country), someStats) 
country: Asia 
     coef  sdev  minPct 
0.00200000 0.08215838 0.70000000 
--------------------------------------------------------------------------------------- 
country: Brazil 
     coef  sdev  minPct 
0.00200000 0.05700877 0.40000000 
--------------------------------------------------------------------------------------- 
country: US 
     coef  sdev minPct 
-0.022400 0.410938 0.050000

Trouble is, this isn't a data.frame object, which I need for further processing, and it won't cast as one:

> as.data.frame(by(landlines, list(country=landlines$country), someStats)) 
Error in as.data.frame.default(by(landlines, list(country = landlines$country), : 
    cannot coerce class '"by"' into a data.frame

"No problem!" I think, since the similar aggregate() function does return a data.frame:

> aggregate(landlines$pct, by=list(country=landlines$country), min) 
    country x 
1 Asia 0.70 
2 Brazil 0.40 
3  US 0.05

Trouble is, it doesn't work properly with arbitrary functions:

> aggregate(landlines, by=list(country=landlines$country), someStats) 
Error in x$pct : $ operator is invalid for atomic vectors

What I'd really like to get is a data.frame object with the following columns:

country
coef
sdev
minPct

How can I do this?

Source

2012-04-04 Brian B

Take a look at the plyr package, specifically ddply:

> library(plyr) 
> ddply(landlines, .(country), someStats) 
    country coef  sdev minPct 
1 Asia 0.0020 0.08215838 0.70 
2 Brazil 0.0020 0.05700877 0.40 
3  US -0.0224 0.41093795 0.05

Ideally your function would explicitly return a data.frame, but in this case the result can be easily and correctly coerced to one.
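As a rough sketch of what "explicitly return a data.frame" could look like: someStatsDF below is a hypothetical variant of someStats (the name is not from the question) that returns a one-row data.frame. The data is copied from the question so the snippet runs on its own, and the ddply call is guarded in case plyr is not installed.

```r
# Data copied from the question so the example is self-contained.
landlines <- data.frame(
  year    = rep(c(1990, 1995, 2000, 2005, 2010), times = 3),
  country = rep(c("US", "Brazil", "Asia"), each = 5),
  pct     = c(0.99, 0.99, 0.98, 0.05, 0.9,
              0.4, 0.5, 0.55, 0.5, 0.45,
              0.7, 0.85, 0.9, 0.85, 0.75)
)

# Hypothetical variant of someStats(): same statistics, but returned
# as a one-row data.frame instead of a named numeric vector.
someStatsDF <- function(x) {
  dp   <- as.matrix(x$pct) - mean(x$pct)    # centered response
  indp <- as.matrix(x$year) - mean(x$year)  # centered predictor
  data.frame(
    coef   = as.numeric(lm.fit(indp, dp)$coefficients),  # slope
    sdev   = sd(x$pct),
    minPct = min(x$pct)
  )
}

# ddply() binds the per-group rows together and adds the grouping column.
if (requireNamespace("plyr", quietly = TRUE)) {
  print(plyr::ddply(landlines, "country", someStatsDF))
}
```

With a function like this, each group already yields a row of the final table, so no coercion is left to the caller.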

Source

2012-04-04 14:58:53 Justin

This is great, thanks! – 2012-04-04 15:19:18

aggregate is designed for a different purpose. What you want is lapply(split()):

> lapply(split(landlines, list(country=landlines$country)), FUN=someStats) 
$Asia 
     coef  sdev  minPct 
0.00200000 0.08215838 0.70000000 

$Brazil 
     coef  sdev  minPct 
0.00200000 0.05700877 0.40000000 

$US 
    coef  sdev minPct 
-0.022400 0.410938 0.050000

In cases where the output will be predictably regular, it may be better to use sapply:

> sapply(split(landlines, list(country=landlines$country)), FUN=someStats) 
      Asia  Brazil  US 
coef 0.00200000 0.00200000 -0.022400 
sdev 0.08215838 0.05700877 0.410938 
minPct 0.70000000 0.40000000 0.050000
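If the regular shape of the output matters, one possible alternative (not part of the original answer) is vapply, which fails loudly when a group returns something other than the declared shape. A minimal sketch, with the data and function copied from the question and the FUN.VALUE template chosen here as an assumption:

```r
# Data and function copied from the question so the example is self-contained.
landlines <- data.frame(
  year    = rep(c(1990, 1995, 2000, 2005, 2010), times = 3),
  country = rep(c("US", "Brazil", "Asia"), each = 5),
  pct     = c(0.99, 0.99, 0.98, 0.05, 0.9,
              0.4, 0.5, 0.55, 0.5, 0.45,
              0.7, 0.85, 0.9, 0.85, 0.75)
)

someStats <- function(x) {
  dp   <- as.matrix(x$pct) - mean(x$pct)
  indp <- as.matrix(x$year) - mean(x$year)
  f <- lm.fit(indp, dp)$coefficients
  results <- c(f, sd(x$pct), min(x$pct))
  names(results) <- c("coef", "sdev", "minPct")
  results
}

# FUN.VALUE declares that every group must yield three doubles;
# a group returning anything else raises an error instead of
# silently producing a list.
tbl <- vapply(
  split(landlines, landlines$country),
  someStats,
  FUN.VALUE = c(coef = 0, sdev = 0, minPct = 0)
)
print(tbl)
```

The result is a numeric matrix with one column per country, the same layout as the sapply output above.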

Constructing the first column with values from the rownames.

Added demonstration:

> tbl <- sapply(split(landlines, list(country=landlines$country)), FUN=someStats) 
> ttbl <- as.data.frame(t(tbl)) 
> ttbl <- cbind(Country=rownames(ttbl), ttbl) 
> ttbl 
     Country coef  sdev minPct 
Asia  Asia 0.0020 0.08215838 0.70 
Brazil Brazil 0.0020 0.05700877 0.40 
US   US -0.0224 0.41093795 0.05

Source

2012-04-04 15:11:44

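Neither of these ultimately gives me the 'data.frame' I need for further post-processing in my actual application. Applying 'as.data.frame(t(sapply()))' gets close, but of course the country column is missing. – 2012-04-04 15:22:35

Added a method to do that. – 2012-04-04 16:17:40

by objects are actually lists, so you can use rbind in a do.call: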

> do.call("rbind", by(landlines, list(country=landlines$country), someStats)) 
      coef  sdev minPct 
Asia 0.0020 0.08215838 0.70 
Brazil 0.0020 0.05700877 0.40 
US  -0.0224 0.41093795 0.05
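Note that rbind on named numeric vectors produces a matrix, with the countries held in the row names rather than in a column. A minimal sketch of one way to finish the job, assuming the goal is the country/coef/sdev/minPct data.frame described in the question (the variable names m and out are illustrative):

```r
# Data and function copied from the question so the example is self-contained.
landlines <- data.frame(
  year    = rep(c(1990, 1995, 2000, 2005, 2010), times = 3),
  country = rep(c("US", "Brazil", "Asia"), each = 5),
  pct     = c(0.99, 0.99, 0.98, 0.05, 0.9,
              0.4, 0.5, 0.55, 0.5, 0.45,
              0.7, 0.85, 0.9, 0.85, 0.75)
)

someStats <- function(x) {
  dp   <- as.matrix(x$pct) - mean(x$pct)
  indp <- as.matrix(x$year) - mean(x$year)
  f <- lm.fit(indp, dp)$coefficients
  results <- c(f, sd(x$pct), min(x$pct))
  names(results) <- c("coef", "sdev", "minPct")
  results
}

# Stack the per-country vectors into a numeric matrix (one row per country).
m <- do.call("rbind", by(landlines, list(country = landlines$country), someStats))

# Move the row names into a proper column, then drop the row names.
out <- data.frame(country = rownames(m), m)
rownames(out) <- NULL
print(out)
```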

Source

2012-04-04 15:27:53 James
