2014-04-01 64 views
0

Summarizing a data frame in R: I want the sum of a column for each city on each date in the following data frame:

> summary(dat1)
      Date                 City           Sales
 Min.   :2010-06-18   Min.   : 1.00   Min.   :  667.4
 1st Qu.:2011-02-18   1st Qu.:18.00   1st Qu.: 1138.6
 Median :2011-10-28   Median :37.00   Median : 1507.5
 Mean   :2011-10-29   Mean   :44.26   Mean   : 2065.4
 3rd Qu.:2012-07-06   3rd Qu.:74.00   3rd Qu.: 2347.1
 Max.   :2013-03-08   Max.   :99.00   Max.   :47206.6

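That is, I want to build the corresponding Date x City data frame, in which each observation shows the total sales for one city on one day.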

Answers

0

Use the aggregate() function:

aggregate(Sales~Date+City, data=dat1, sum) 

Is your code correct? I can't get the formula to work that way. Shouldn't it be Sales ~ Date + City? –


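There was a mistake: it should be + rather than cbind. cbind is for the left-hand side of the formula, when you want to aggregate more than one variable (Sales and others). The code is correct now. – Maciej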
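To illustrate the + versus cbind distinction from the comments above, here is a minimal sketch. The toy data frame and its Units column are assumptions made up for this example; they are not part of dat1 from the question.

```r
# Hypothetical toy data; 'Units' is an assumed second measure, not in the question
dat <- data.frame(
  Date  = c("2014-01-01", "2014-01-01", "2014-01-01", "2014-02-01"),
  City  = c("a", "a", "b", "a"),
  Sales = c(100, 50, 70, 30),
  Units = c(1, 2, 3, 4)
)

# '+' on the right-hand side lists the grouping variables
one <- aggregate(Sales ~ Date + City, data = dat, FUN = sum)

# cbind() on the left-hand side lists several variables to aggregate at once
two <- aggregate(cbind(Sales, Units) ~ Date + City, data = dat, FUN = sum)

print(one)
print(two)
```

Both calls return one row per Date/City combination that actually occurs in the data; the second simply carries an extra summed column.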

1

There are several ways to do this. To name a few:

  1. The aggregate() function:

    i) aggregate(Sales~Date+City, data=df, sum)

    ii) aggregate(df$Sales, list(df$Date,df$City), sum)

  2. The tapply() function:

    i) tapply(df$Sales, list(df$Date, df$City), sum)

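tapply() is especially useful if you have a large data set: aggregate() tends to bog down on very large data sets, while tapply() usually handles them more gracefully. In addition, tapply() and aggregate() produce output in different formats, so you may want to pick whichever best suits any further analysis.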
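The difference in output format can be seen in a small sketch. The toy data frame below is an assumption made up for illustration, not the poster's data.

```r
# Hypothetical toy data: city "b" has no sales on 2014-02-01
toy <- data.frame(
  Date  = c("2014-01-01", "2014-01-01", "2014-02-01", "2014-02-01"),
  City  = c("a", "b", "a", "a"),
  Sales = c(100, 93, 92, 8)
)

# aggregate() returns a "long" data frame: one row per Date/City pair present
long <- aggregate(Sales ~ Date + City, data = toy, FUN = sum)

# Non-formula interface: naming the list elements replaces the default
# Group.1/Group.2 column names; the summed column is called "x"
named <- aggregate(toy$Sales, list(Date = toy$Date, City = toy$City), sum)

# tapply() returns a "wide" matrix: dates as rows, cities as columns,
# with NA for combinations that never occur
wide <- tapply(toy$Sales, list(toy$Date, toy$City), sum)

print(long)
print(named)
print(wide)
```

The long form is usually easier to merge or plot, while the wide matrix is convenient for looking up a single date/city cell, e.g. wide["2014-02-01", "a"].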

These examples can be tested on the simulated data below:

df<-structure(list(Date = structure(c(4L, 2L, 4L, 2L, 3L, 4L, 3L, 
2L, 2L, 2L, 2L, 4L, 1L, 4L, 2L, 4L, 2L, 3L, 4L, 2L, 3L, 3L, 4L, 
3L, 4L, 2L, 2L, 2L, 3L, 1L, 1L, 4L, 2L, 4L, 1L, 2L, 1L, 2L, 3L, 
2L, 2L, 3L, 2L, 1L, 1L, 3L, 2L, 1L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 
1L, 3L, 2L, 3L, 1L, 3L, 3L, 2L, 2L, 4L, 2L, 1L, 3L, 3L, 1L, 4L, 
1L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("2014-01-01", "2014-02-01", 
"2014-03-01", "2014-04-01"), class = "factor"), City = structure(c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 1L, 2L, 
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 
17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 
18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L), .Label = c("a", 
"b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", 
"o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"), class = "factor"), 
    Sales = c(100, 100, 93, 92, 95, 115, 104, 106, 113, 94, 93, 
    98, 116, 85, 98, 97, 103, 110, 105, 104, 107, 86, 92, 94, 
    106, 115, 112, 92, 103, 100, 101, 97, 95, 110, 103, 92, 91, 
    98, 100, 93, 108, 87, 96, 101, 87, 111, 90, 94, 110, 95, 
    110, 101, 88, 99, 106, 117, 101, 120, 92, 86, 118, 104, 99, 
    89, 103, 102, 121, 99, 106, 99, 107, 105, 109, 110, 112, 
    94, 100, 112)), .Names = c("Date", "City", "Sales"), row.names = c(NA, 
    -78L), class = "data.frame") 

You need to change the code, because the variable names are incorrect. – Maciej


Thanks, I've fixed the capitalization of the variable names. –