2017-09-10

R - Using list() to aggregate a data frame

I have a data frame like this:

> head(DF, 10)
         DATE USER    CATEGORY  QTY
1  2017-09-04  A79    Footwear 2167
2  2017-08-31  A41 Accessories  342
3  2017-08-27  A34 Accessories  828
4  2017-08-22  A68 Accessories 1292
5  2017-08-23  A68 Accessories 1297
6  2017-08-23  A68    Footwear 1944
7  2017-08-25  A68 Accessories   60
8  2017-08-25  A68    Footwear    5
9  2017-08-25  A68     Apparel 2454
10 2017-08-29  A68 Accessories 2521

I want to get something like this:

> head(DF1, 10) 
     DATE USER        CATEGORIES QTY_SUM 
1 2017-09-04  A79 Footwear          2167 
2 2017-08-31  A41 Accessories         342 
3 2017-08-27  A34 Accessories         828 
4 2017-08-22  A68 Accessories         1292 
5 2017-08-23  A68 Accessories-1297, Footwear-1944    3241 
6 2017-08-25  A68 Accessories-60, Footwear-5, Apparel-2454  2519 
7 2017-08-29  A68 Accessories         2521 

I tried using aggregate, but it did not work well. I think the solution might look something like this:

library(data.table)
DF1 <- data.table(DF, key=c('DATE', 'USER'))
DF1 <- DF1[, list(CATEGORIES=paste0(CATEGORY, "-", QTY), QTY=sum(QTY)), by=c('DATE', 'USER')]
> head(DF1, 10) # getting this
     DATE USER   CATEGORY  QTY 
1 2017-09-04  A79 Footwear-2167  2167 
2 2017-08-31  A41 Accessories-342  342 
3 2017-08-27  A34 Accessories-828  828 
4 2017-08-22  A68 Accessories-1292 1292 
5 2017-08-23  A68 Accessories-1297 1297 
6 2017-08-23  A68 Footwear-1944  1944 
7 2017-08-25  A68 Accessories-60  60 
8 2017-08-25  A68 Footwear-5    5 
9 2017-08-25  A68 Apparel-2454  2454 
10 2017-08-29  A68 Accessories   2521 

What am I doing wrong? Please suggest a better way to do this if there is one.

Answer

Using dplyr, you can do this:

df <- read.table(text=" 
DATE USER CATEGORY  QTY 
1 2017-09-04  A79 Footwear  2167 
2 2017-08-31  A41 Accessories  342 
3 2017-08-27  A34 Accessories  828 
4 2017-08-22  A68 Accessories 1292 
5 2017-08-23  A68 Accessories 1297 
6 2017-08-23  A68 Footwear  1944 
7 2017-08-25  A68 Accessories  60 
8 2017-08-25  A68 Footwear   5 
9 2017-08-25  A68 Apparel  2454 
10 2017-08-29  A68 Accessories 2521") 

library(dplyr) 

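We first group_by DATE and USER (my guess at the grouping you want), then paste each CATEGORY in the group together with its QTY, collapsing the pieces into a single string. Finally, we ungroup the data.frame (a tibble here, but still a data.frame):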

df %>% 
    group_by(DATE, USER) %>% 
    summarise(CATEGORIES=paste(CATEGORY, QTY, sep="-", collapse=","), 
      QTY_SUM=sum(QTY)) %>% 
    ungroup() 

# A tibble: 7 x 4
        DATE   USER                             CATEGORIES QTY_SUM
      <fctr> <fctr>                                  <chr>   <int>
1 2017-08-22    A68                       Accessories-1292    1292
2 2017-08-23    A68         Accessories-1297,Footwear-1944    3241
3 2017-08-25    A68 Accessories-60,Footwear-5,Apparel-2454    2519
4 2017-08-27    A34                        Accessories-828     828
5 2017-08-29    A68                       Accessories-2521    2521
6 2017-08-31    A41                        Accessories-342     342
7 2017-09-04    A79                          Footwear-2167    2167

Is this what you wanted?
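
The summarise() step relies on the difference between the sep and collapse arguments of paste(), so a small base R sketch of that difference may help. The vectors category and qty are illustrative names standing in for the rows of one DATE/USER group (the two 2017-08-23 rows of user A68); they are not part of the code above.

```r
# Illustrative vectors for one group (assumed names, not from the answer).
category <- c("Accessories", "Footwear")
qty <- c(1297L, 1944L)

# sep joins the arguments element by element, so the result
# still contains one string per input row.
per_row <- paste(category, qty, sep = "-")
print(per_row)

# collapse additionally joins those strings into a single string,
# giving one value per group, which is what summarise() needs here.
per_group <- paste(category, qty, sep = "-", collapse = ",")
print(per_group)
```

Without collapse the result has as many elements as the group has rows, so the grouped output keeps one row per input row instead of one row per group.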

Thanks, this is perfect. I also found that I had left out 'collapse' in my 'paste0' call; with it added, my approach works as well. – Arani
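
For reference, the data.table variant mentioned in the comment could look roughly like the sketch below. This is a guess at what the commenter ended up with, not their actual code: the small DF is rebuilt here from five rows of the question's sample so that the snippet is self-contained, the grouping columns are assumed to be DATE and USER, and the ", " separator is taken from the desired output shown in the question.

```r
library(data.table)

# Sample rows copied from the question (user A68 only); rebuilt here
# so that the sketch runs on its own.
DF <- data.table(
  DATE     = c("2017-08-23", "2017-08-23", "2017-08-25", "2017-08-25", "2017-08-25"),
  USER     = rep("A68", 5),
  CATEGORY = c("Accessories", "Footwear", "Accessories", "Footwear", "Apparel"),
  QTY      = c(1297L, 1944L, 60L, 5L, 2454L)
)

# collapse = ", " turns the per-row strings of each group into one string,
# so every DATE/USER pair yields exactly one output row.
DF1 <- DF[, list(CATEGORIES = paste0(CATEGORY, "-", QTY, collapse = ", "),
                 QTY_SUM    = sum(QTY)),
          by = c("DATE", "USER")]
print(DF1)
```

Because paste0() has no sep argument, the "-" is passed as an ordinary string between CATEGORY and QTY, and collapse is the only thing that was missing from the original attempt.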