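Calculating the proportion of a variable when grouping by multiple variables with dplyr
I have a tibble with the columns group, account and duration, where each row represents one event. I would like to create a nice summary table that includes group, account, total duration, a calculated price and, finally, each group's proportion of the overall total.
Reproducible sample: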
library(tidyverse)
library(lubridate)
tidy_data <- structure(list(group = c("Group 1", "Group 2", "Group 3", "Group 1", "Group 2", "Group 3", "Group 4", "Group 4", "Group 2"), account = c("Account 1", "Account 2","Account 3", "Account 1", "Account 2", "Account 3", "Account 4", "Account 4", "Account 2"), duration = structure(c(146.15, 181.416666666667, 96.9, 52.2833333333333, 99.4333333333333, 334.116666666667, 16.6333333333333, 11.5666666666667, 79.5666666666667), units = "mins", class = "difftime")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -9L), .Names = c("group","account", "duration"))
hourPrice = 25
Summary 1 - calculates the proportions correctly, but does not include account
tidy_data %>%
  group_by(group) %>%
  summarise(total = sum(duration) %>% time_length(unit = "hour") %>% round(digits = 2),
            price = (total*hourPrice) %>% round(digits = 0)) %>%
  mutate(prop = (price/sum(price) * 100) %>% round(digits = 0))
# A tibble: 4 × 4
  group   total price  prop
  <chr>   <dbl> <dbl> <dbl>
1 Group 1  3.31    83    20
2 Group 2  6.01   150    35
3 Group 3  7.18   180    42
4 Group 4  0.47    12     3
Summary 2 - includes account, but does not calculate the proportions correctly
tidy_data %>%
  group_by(group, account) %>%
  summarise(total = sum(duration) %>% time_length(unit = "hour") %>% round(digits = 2),
            price = (total*hourPrice) %>% round(digits = 0)) %>%
  mutate(prop = (price/sum(price) * 100) %>% round(digits = 0))
#Source: local data frame [4 x 5]
#Groups: group [4]
  group   account   total price  prop
  <chr>   <chr>     <dbl> <dbl> <dbl>
1 Group 1 Account 1  3.31    83   100
2 Group 2 Account 2  6.01   150   100
3 Group 3 Account 3  7.18   180   100
4 Group 4 Account 4  0.47    12   100
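I realize the problem is that, with two grouping variables in the second case, the result of summarise is still grouped by group, so the proportion is computed within each group rather than across the whole table. I considered producing summary 1 and then joining the accounts back onto the table, but it seems to me there must be a better solution.
Edit: The output I want: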
  group   account   total price  prop
  <chr>   <chr>     <dbl> <dbl> <dbl>
1 Group 1 Account 1  3.31    83    20
2 Group 2 Account 2  6.01   150    35
3 Group 3 Account 3  7.18   180    42
4 Group 4 Account 4  0.47    12     3
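For illustration only, here is a minimal sketch of one way this output might be reached. It is not taken from any answer in this thread. It assumes the cause is the one described above: after summarise, the result is still grouped by group, so sum(price) is taken within each group. Calling ungroup() before mutate makes the sum cover the whole table. The sample data is rebuilt with tibble() so the block runs on its own, and the result name summary_tbl is made up for this example.

```r
library(dplyr)
library(lubridate)

hourPrice <- 25

# Same sample data as above, rebuilt so this block is self-contained.
# In this sample every group has exactly one account ("Group n" -> "Account n").
tidy_data <- tibble(
  group = c("Group 1", "Group 2", "Group 3", "Group 1", "Group 2",
            "Group 3", "Group 4", "Group 4", "Group 2"),
  account = sub("Group", "Account", group),
  duration = as.difftime(
    c(146.15, 181.416666666667, 96.9, 52.2833333333333, 99.4333333333333,
      334.116666666667, 16.6333333333333, 11.5666666666667, 79.5666666666667),
    units = "mins")
)

summary_tbl <- tidy_data %>%
  group_by(group, account) %>%
  summarise(total = sum(duration) %>% time_length(unit = "hour") %>% round(digits = 2),
            price = (total * hourPrice) %>% round(digits = 0)) %>%
  ungroup() %>%  # drop the grouping by `group` that summarise leaves behind
  mutate(prop = (price / sum(price) * 100) %>% round(digits = 0))

print(summary_tbl)
```

Because each group maps to a single account here, grouping by both columns yields the same four rows as summary 1. If a group had several accounts, prop would be each group/account pair's share of the overall total instead.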
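That did the trick! :-) I didn't know the slice command, and at first it wasn't intuitive to me that it would select the first row of each group, but I like this solution. – emiltb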
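The answer this comment replies to is not reproduced here, so the following is only a sketch of the behaviour the comment describes, not the answerer's code. On a grouped tibble, slice(1) keeps the first row of every group. The data and the names events, minutes, total_minutes and first_rows are hypothetical.

```r
library(dplyr)

# Hypothetical miniature data set: two events per group.
events <- tibble(
  group   = c("Group 1", "Group 2", "Group 1", "Group 2"),
  account = c("Account 1", "Account 2", "Account 1", "Account 2"),
  minutes = c(146, 181, 52, 99)
)

first_rows <- events %>%
  group_by(group) %>%
  mutate(total_minutes = sum(minutes)) %>%  # group total repeated on every row
  slice(1) %>%                               # keep only the first row of each group
  ungroup()

print(first_rows)
```

Since mutate keeps all columns, account survives the aggregation, and slice(1) then collapses each group to a single row.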