Find the maximum value per grouping variable and convert it into a new variable

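I have the following dataset. For each customer_ID, I want to identify the product with the highest amount and turn it into a new column. I also want to keep only one record per ID.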

Code to generate the dataset:

x <- data.frame(customer_id=c(1,1,1,2,2,2), product=c("a","b","c","a","b","c"), amount=c(50,125,100,75,110,150))

The actual dataset looks like this:

customer_id product amount
          1       a     50
          1       b    125
          1       c    100
          2       a     75
          2       b    110
          2       c    150

The desired output should look like this:

customer_ID product_b product_c
          1       125         0
          2         0       150

Source

2017-04-16 stuski

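We can do this with the tidyverse. After grouping by 'customer_id', slice the row with the maximum 'amount', paste a prefix ('product_') onto the 'product' column (if needed), and spread to wide format.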

library(dplyr) 
library(tidyr) 
x %>% 
    group_by(customer_id) %>% 
    slice(which.max(amount)) %>% 
    mutate(product = paste0("product_", product)) %>% 
    spread(product, amount, fill = 0) 
# customer_id product_b product_c 
#*  <dbl>  <dbl>  <dbl> 
#1   1  125   0 
#2   2   0  150
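In newer tidyr releases spread() is superseded by pivot_wider(). As a rough sketch of the same idea, assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later are installed (slice_max() and pivot_wider() are not used in the answer above, and the name result is only illustrative), it might look like this:

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
x <- data.frame(customer_id = c(1, 1, 1, 2, 2, 2),
                product = c("a", "b", "c", "a", "b", "c"),
                amount = c(50, 125, 100, 75, 110, 150))

result <- x %>%
  group_by(customer_id) %>%
  # Keep exactly one row per customer: the one with the largest amount.
  # with_ties = FALSE keeps only the first row if several share the maximum.
  slice_max(amount, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  # One column per remaining product, prefixed with "product_";
  # customers without a value for that product get 0.
  pivot_wider(names_from = product,
              values_from = amount,
              names_prefix = "product_",
              values_fill = list(amount = 0))

print(result)
```

Here names_prefix takes the place of the separate mutate() and paste0() step.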

Another option is to arrange the dataset by 'customer_id' and by descending 'amount', get the distinct rows based on 'customer_id', and spread to wide format.

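# Sort so the largest amount comes first within each customer, keep that first row,
# then spread with product as the key (using customer_id as the key would not give one row per customer)
arrange(x, customer_id, desc(amount)) %>% 
     distinct(customer_id, .keep_all = TRUE) %>% 
     spread(product, amount, fill = 0)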

Source

2017-04-16 19:16:57 akrun

Using the reshape2 package:

library(reshape2) 

x1 <- x[!!with(x, ave(amount, customer_id, FUN = function(i) i == max(i))),] 

dcast(x1, customer_id ~ product, value.var = 'amount', fill = 0) 
# customer_id b c 
#1   1 125 0 
#2   2 0 150
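Note that the ave() filter keeps every row that ties for the maximum, so a customer with two equally large amounts would end up with two non-zero columns. If no extra packages are available and only one product per customer should survive, a base R sketch along these lines may work (the names x_sorted, top, and wide are illustrative, and the tie-breaking rule of keeping the first row after sorting is an assumption):

```r
# Same example data as in the question
x <- data.frame(customer_id = c(1, 1, 1, 2, 2, 2),
                product = c("a", "b", "c", "a", "b", "c"),
                amount = c(50, 125, 100, 75, 110, 150))

# Sort by customer, largest amount first, then keep the first row per customer
x_sorted <- x[order(x$customer_id, -x$amount), ]
top <- x_sorted[!duplicated(x_sorted$customer_id), ]

# Prefix the product names so they become readable column names
top$product <- paste0("product_", top$product)

# Cross-tabulate amount by customer and product; empty cells become 0
wide <- as.data.frame.matrix(xtabs(amount ~ customer_id + product, data = top))

# Move the customer ids from the row names into a regular column
wide <- cbind(customer_id = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL

print(wide)
```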

Source

2017-04-16 19:56:42 Sotos
