2012-07-05 45 views

Equivalent of an aggregate statement using the cast function

The aggregate call below gives me mean sales by month and works fine:

library(chron) 
set.seed(42) 
dat <- data.frame(sales = rnorm(1000, mean = 1000, sd = 40), 
       dates = rep(as.Date(seq(from = 14610, to = 14859), 
           origin = "1970-01-01"),4)) 
aggregate(sales~months(as.chron(dates)), mean, data=dat) 

...and produces the following output:

months(as.chron(dates))  sales 
1      Jan 1000.0723 
2      Feb 999.1580 
3      Mar 995.3055 
4      Apr 1000.4912 
5      May 1003.9703 
6      Jun 997.1086 
7      Jul 996.5939 
8      Aug 998.5012 
9      Sep 1001.3709 

My understanding is that the following cast statement should produce the same output:

cast(dat, months(as.chron(dates)) ~ ., mean, value="sales") 

Instead, it returns the following error:

Error: Casting formula contains variables not found in molten data: months(as.chron(dates)) 

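I am probably missing something, but is it possible to use the chron months() call inside a cast statement? The following two statements accomplish the same thing with cast(), but I am trying to do it in one step and to better understand how cast works.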

dat$mont <- months(as.chron(dat$dates)) 
cast(dat, mont ~ ., mean, value="sales") 

Thanks in advance, --JT


You are correct: in 'reshape', the formula arguments can only be variable names, not functions of variables. See 'cast_parse_formula' and 'check_formula'. – Seth 2012-07-05 21:53:15
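The distinction in the comment above can be illustrated with base R alone. This is a minimal sketch, not the reshape internals: it assumes format(dates, "%m") as a stand-in for months(as.chron(dates)), and the column name mont is borrowed from the question. aggregate() evaluates a function call inside its formula, whereas an interface that accepts only variable names needs the derived column to exist before the call.

```r
# Base R only; chron and reshape are not needed for this sketch.
set.seed(42)
dat <- data.frame(
  sales = rnorm(1000, mean = 1000, sd = 40),
  dates = rep(as.Date(seq(from = 14610, to = 14859), origin = "1970-01-01"), 4)
)

# aggregate() evaluates the function call written inside the formula.
inline <- aggregate(sales ~ format(dates, "%m"), data = dat, FUN = mean)

# An interface restricted to column names needs the derived column first.
# transform() adds it in the same expression, so no separate assignment
# to dat$mont is required.
dat2 <- transform(dat, mont = format(dates, "%m"))
precomputed <- aggregate(sales ~ mont, data = dat2, FUN = mean)

# Both routes group on the same values, so the monthly means agree.
all.equal(inline$sales, precomputed$sales)
```

The same transform() call could presumably be passed as the data argument of cast() to keep everything in one statement, though that is untested here.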

Answers


This works with reshape2:

library(reshape2) 
dcast(dat, months(as.chron(dates)) ~ ., mean, value.var="sales") 
## months(as.chron(dates))  NA 
## 1      Jan 1004.5404 
## 2      Feb 1002.3146 
## 3      Mar 996.0883 
## 4      Apr 994.1707 
## 5      May 1000.4652 
## 6      Jun 1002.8020 
## 7      Jul 996.0357 
## 8      Aug 1001.6754 
## 9      Sep 997.6772 

Alternatively, you can use plyr:

library(plyr) 
ddply(dat, .(months = months(as.chron(dates))), summarize, sales = mean(sales)) 
## months  sales 
## 1 Jan 1004.5404 
## 2 Feb 1002.3146 
## 3 Mar 996.0883 
## 4 Apr 994.1707 
## 5 May 1000.4652 
## 6 Jun 1002.8020 
## 7 Jul 996.0357 
## 8 Aug 1001.6754 
## 9 Sep 997.6772 

Or data.table:

library(data.table) 
DT <- data.table(dat) 
DT[, month := months(as.chron(dates))][,list(sales = mean(sales)),by = month] 
## month  sales 
## 1: Jan 1004.5404 
## 2: Feb 1002.3146 
## 3: Mar 996.0883 
## 4: Apr 994.1707 
## 5: May 1000.4652 
## 6: Jun 1002.8020 
## 7: Jul 996.0357 
## 8: Aug 1001.6754 
## 9: Sep 997.6772 

From Matthew Dowle's comment:

:= is not needed, IIUC, because by accepts expressions directly:

DT[, list(sales=mean(sales)), by=months(as.chron(dates))] 
## months  sales 
## 1: Jan 1004.5404 
## 2: Feb 1002.3146 
## 3: Mar 996.0883 
## 4: Apr 994.1707 
## 5: May 1000.4652 
## 6: Jun 1002.8020 
## 7: Jul 996.0357 
## 8: Aug 1001.6754 
## 9: Sep 997.6772 
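As a small variation on passing an expression to by, the grouping column can be given an explicit name by wrapping the expression in a named list. The sketch below assumes format(dates, "%m") in place of months(as.chron(dates)) so that it depends only on data.table; the name month is just an illustrative choice.

```r
library(data.table)

set.seed(42)
DT <- data.table(
  sales = rnorm(1000, mean = 1000, sd = 40),
  dates = rep(as.Date(seq(from = 14610, to = 14859), origin = "1970-01-01"), 4)
)

# The name given inside list() becomes the name of the grouping column,
# so no := step and no later renaming are needed.
res <- DT[, list(sales = mean(sales)), by = list(month = format(dates, "%m"))]
res
```

Groups are returned in order of first appearance, which here is calendar order because the dates start in January.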

Thanks, everyone... it looks like I need to get to know reshape2 better. – JimmyT 2012-07-06 17:42:51