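Gather duplicate column sets into single columns

The problem of gathering multiple sets of columns has already been addressed here: Gather multiple sets of columns, but in my case the column names are not unique.

I have the following data: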

input <- data.frame(
    id = 1:2, 
    question = c("a", "b"), 
    points = 0, 
    max_points = c(3, 5), 
    question = c("c", "d"), 
    points = c(0, 20), 
    max_points = c(5, 20), 
    check.names = FALSE,
    stringsAsFactors = FALSE
) 
input 
#>   id question points max_points question points max_points
#> 1  1        a      0          3        c      0          5
#> 2  2        b      0          5        d     20         20

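The first column is an id, followed by many repeated columns (the original data set has 133 columns):

question identifier
points
maximum points

I would like to end up with this structure: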

expected <- data.frame(
    id = c(1, 2, 1, 2), 
    question = letters[1:4], 
    points = c(0, 0, 0, 20), 
    max_points = c(3, 5, 5, 20), 
    stringsAsFactors = FALSE
) 
expected 
#>   id question points max_points
#> 1  1        a      0          3
#> 2  2        b      0          5
#> 3  1        c      0          5
#> 4  2        d     20         20

I have already tried a few things:

tidyr::gather(input, key, val, -id)
reshape2::melt(input, id.vars = "id")

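Neither produces the desired output. Moreover, with more columns than shown here, gather no longer works, because there are too many duplicate columns.

As a workaround I tried this: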

library(tidyr)
library(dplyr)

# add numbers to make col headers "unique"
names(input) <- c("id", paste0(1:(length(names(input)) - 1), names(input)[-1])) 

# gather, remove number, spread 
input %>% 
    gather(key, val, -id) %>% 
    mutate(key = stringr::str_replace_all(key, "[:digit:]", "")) %>% 
    spread(key, val)

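It gives an error: Duplicate identifiers for rows (3, 9), (4, 10), (1, 7), (2, 8)

This problem has already been discussed here: Unexpected behavior with tidyr, but I don't understand why or how I should add another identifier. Most likely this is not the main problem anyway, because I should probably approach the whole thing differently.

How can I solve my problem, preferably with tidyr or base R? I don't know how to use data.table, but if there is a simple solution, I will settle for that too.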

2016-06-29 Thomas K

Are all of your question, max_points, and points columns actually named the same thing? –

Maybe 'rbind(input[, c(1, 2:4)], input[, c(1, 5:7)])'? – zx8754

@MikeyMike Yes. –

Try this:

do.call(rbind,
    lapply(seq(2, ncol(input), 3), function(i) {
        input[, c(1, i:(i + 2))]
    })
)

#   id question points max_points
# 1  1        a      0          3
# 2  2        b      0          5
# 3  1        c      0          5
# 4  2        d     20         20
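
If the set size should not be hard-coded as 3, the same idea can be sketched as below. This is only a sketch, assuming that column 1 is the id and that every set repeats the same names in the same order; set_names, set_size, starts, and stacked are names made up for illustration.

```r
# Same toy data as in the question
input <- data.frame(
    id = 1:2,
    question = c("a", "b"),
    points = 0,
    max_points = c(3, 5),
    question = c("c", "d"),
    points = c(0, 20),
    max_points = c(5, 20),
    check.names = FALSE,
    stringsAsFactors = FALSE
)

# Assumption: everything after column 1 is a repeating block of the same names
set_names <- unique(names(input)[-1])
set_size  <- length(set_names)

# First column index of each repeated block
starts <- seq(2, ncol(input), by = set_size)

# Take the id plus one block at a time and stack the pieces
stacked <- do.call(rbind, lapply(starts, function(i) {
    input[, c(1, i:(i + set_size - 1))]
}))
rownames(stacked) <- NULL
stacked
```

Deriving the block size from the names keeps the call unchanged when the real data has more than two sets, as long as the assumption about the layout holds.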

2016-06-29 12:52:52 zx8754

How you want the ID column handled might need clarification, but maybe something like this?

runme <- function(word , dat){ 
    grep(paste0("^" , word , "$") , names(dat)) 
} 

l <- mapply(runme , unique(names(input)) , list(input)) 
l2 <- as.data.frame(l) 

output <- data.frame() 
for (i in 1:nrow(l2)) output <- rbind(output , input[, as.numeric(l2[i,]) ])

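I'm not sure how robust it is when columns are repeated a different number of times, but it works for your test data and should work as long as every column is repeated the same number of times.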
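
Whether that condition holds can be checked up front. A minimal sketch, assuming the first column is the id and should be left out of the count; counts and equal_repeats are illustrative names only:

```r
# Same toy data as in the question
input <- data.frame(
    id = 1:2,
    question = c("a", "b"),
    points = 0,
    max_points = c(3, 5),
    question = c("c", "d"),
    points = c(0, 20),
    max_points = c(5, 20),
    check.names = FALSE,
    stringsAsFactors = FALSE
)

# How often does each non-id column name occur?
counts <- table(names(input)[-1])

# TRUE only if every name is repeated the same number of times
equal_repeats <- length(unique(as.vector(counts))) == 1
equal_repeats
```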

2016-06-29 12:53:03 CroGo

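Another way to accomplish the same goal without using lapply:

First, we grab all of the question, max_points, and points columns; then we melt each group individually and cbind the results together.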

library(reshape2) 

questions <- input[,c(1,c(1:length(names(input)))[names(input)=="question"])] 
points <- input[,c(1,c(1:length(names(input)))[names(input)=="points"])] 
max_points <- input[,c(1,c(1:length(names(input)))[names(input)=="max_points"])] 

questions_m <- melt(questions,id.vars=c("id"),value.name = "questions")[,c(1,3)] 
points_m <- melt(points,id.vars=c("id"),value.name = "points")[,3,drop=FALSE] 
max_points_m <- melt(max_points,id.vars=c("id"),value.name = "max_points")[,3, drop=FALSE] 

res <- cbind(questions_m,points_m, max_points_m) 
res 
  id questions points max_points
1  1         a      0          3
2  2         b      0          5
3  1         c      0          5
4  2         d     20         20

2016-06-29 13:14:33

The idiomatic way to do this in data.table is pretty simple:

library(data.table) 
setDT(input) 

res = melt(
    input, 
    id = "id", 
    meas = patterns("question", "^points$", "max_points"), 
    value.name = c("question", "points", "max_points") 
) 


   id variable question points max_points
1:  1        1        a      0          3
2:  2        1        b      0          5
3:  1        2        c      0          5
4:  2        2        d     20         20

You get an extra column called "variable", but you can get rid of it afterwards with res[, variable := NULL] if desired.
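
Since the question asked for tidyr, a roughly comparable reshape can be sketched there as well. This assumes tidyr 1.0.0 or later, whose pivot_longer is newer than this thread, and it first gives every column a unique name by appending a set number. The names wide, set_size, n_sets, and long, the "_<number>" suffix, and the "set" column are all choices made for this illustration, not part of the original data.

```r
library(tidyr)  # assumption: tidyr >= 1.0.0 for pivot_longer()

# Same toy data as in the question
input <- data.frame(
    id = 1:2,
    question = c("a", "b"),
    points = 0,
    max_points = c(3, 5),
    question = c("c", "d"),
    points = c(0, 20),
    max_points = c(5, 20),
    check.names = FALSE,
    stringsAsFactors = FALSE
)

# Make the names unique by appending the set number: question_1, points_1, ...
wide     <- input
set_size <- length(unique(names(wide)[-1]))
n_sets   <- (ncol(wide) - 1) / set_size
names(wide) <- c("id", paste0(names(wide)[-1], "_", rep(seq_len(n_sets), each = set_size)))

# ".value" keeps the part before the suffix as the output column name,
# while the suffix itself goes into the "set" column
long <- pivot_longer(
    wide,
    cols = -id,
    names_to = c(".value", "set"),
    names_pattern = "(.*)_(\\d+)"
)

# Order by set, then id, to mirror the layout asked for in the question
long <- long[order(long$set, long$id), ]
long
```

The "set" column plays the role of the extra identifier: it tells the reshape which question, points, and max_points values belong together, and it can be dropped afterwards.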

2016-06-29 14:04:02 Frank
