Even if this is a duplicate, I didn't see the approach below among the existing answers, so... Starting with the original data:
df <- data.frame(A = c("reg","val1","val2","reg","val1","val2","val3","reg","val1","reg","val3","reg","val2","val4"),
B = c(12345, 1, 0, 45678, 0, 0, 1, 97654, 1, 567834, 1, 567845, 0, 1))
I use tidyverse verbs plus one trick: using cumsum to add a label (the dummy column) to each "reg" group:
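# install.packages("tidyverse")  # run once if the package is not installed
library(tidyverse)
df1 <- df %>%
  mutate(dummy = cumsum(A == "reg")) %>%           # group id increases at every "reg" row
  group_by(dummy) %>%
  nest() %>%
  mutate(data = map(data, ~spread(.x, A, B))) %>%  # widen each group separately
  unnest(data) %>%                                 # name the column; a bare unnest() is deprecated
  ungroup() %>%                                    # otherwise select() keeps the grouping column
  select(-dummy)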
This results in:
reg val1 val2 val3 val4
1 12345 1 0 NA NA
2 45678 0 0 1 NA
3 97654 1 NA NA NA
4 567834 NA NA 1 NA
5 567845 NA 0 NA 1
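To see why the cumsum trick groups the rows correctly, here is a minimal sketch in base R that computes only the label. It reuses the A column from df above; the variable name dummy matches the column created in the pipeline.

```r
# Same values as column A of df
A <- c("reg", "val1", "val2", "reg", "val1", "val2", "val3",
       "reg", "val1", "reg", "val3", "reg", "val2", "val4")

# A == "reg" is TRUE on the first row of each record;
# cumsum() turns those TRUEs into a running record id
dummy <- cumsum(A == "reg")
print(dummy)
# dummy is: 1 1 1 2 2 2 2 3 3 4 4 5 5 5
```

Every row between one "reg" and the next shares the same id, which is what lets each record be spread into a single row.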
I'd rather keep the NAs, but if you don't, replace them with 0:
df1[is.na(df1)] <- 0
reg val1 val2 val3 val4
1 12345 1 0 0 0
2 45678 0 0 1 0
3 97654 1 0 0 0
4 567834 0 0 1 0
5 567845 0 0 0 1
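As a possible alternative, not part of the original approach: in tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), which makes the nest/map/unnest step unnecessary. The sketch below assumes a recent tidyr and dplyr are installed; df2 is a name introduced here only for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  A = c("reg", "val1", "val2", "reg", "val1", "val2", "val3",
        "reg", "val1", "reg", "val3", "reg", "val2", "val4"),
  B = c(12345, 1, 0, 45678, 0, 0, 1, 97654, 1, 567834, 1, 567845, 0, 1)
)

df2 <- df %>%
  mutate(dummy = cumsum(A == "reg")) %>%            # same record id as before
  pivot_wider(names_from = A, values_from = B) %>%  # one row per dummy, one column per A value
  select(-dummy)

print(df2)
```

Missing combinations come out as NA. If zeros are preferred, recent tidyr versions accept values_fill = 0 in pivot_wider() instead of patching the result afterwards.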
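You have to convert the data frame from long format to wide format. Several options using tidyr as well as data.table can be found here: https://stackoverflow.com/questions/30592094/r-spreading-multiple-columns-with-tidyr – Niko
Possible duplicate of [How can I spread repeated measures of multiple variables into wide format?](https://stackoverflow.com/questions/29775461/how-can-i-spread-repeated-measures-of-multiple-variables-into-wide-format) – Niko
Maybe [this](https://stackoverflow.com/a/44796994/2204410) can serve as inspiration. – Jaap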