Reshaping a long dataframe to a wide dataframe using the merge function, grouped by two factors

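I want to reshape a long dataframe into a wide dataframe using two grouping variables (resp & company) and three numerical response variables (Quality, Amount, Sense). I tried to do this with the dcast function, but it does not seem to let me group by two variables. Can anyone help me?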

#Current long dataframe: two grouping variables (resp & company), three numerical response variables (Quality, Amount, Sense) 
resp <- c(1325851107,1325851108,1325851109,1325851107,1325851108,1325851109,1325851107,1325851108,1325851109,1325851107,1325851108,1325851109) 
company <- c("Dark.nl","Dark.nl","Dark.nl","Dark.nl","Dark.nl","Dark.nl","Manual.nl","Manual.nl","Manual.nl","Dark.nl","Dark.nl","Dark.nl") 
question <- c("Quality","Quality","Quality","Amount","Amount","Amount","Quality","Quality","Quality","Sense","Sense","Sense") 
score <- c(4,1,2,6,8,10,5,5,7,4,6,7) 
#answer (free-text reply per row) was not defined in the post; reconstructed here from the output printed in the answers 
answer <- c("Didn't like","Was ok","Sure","Maybe","Fine","Not bad","Fine","No, thank you","Why not","Nice","Ok","Yuk") 
current <- data.frame(resp,company,question,score,answer); current 

#Desired wide dataframe 
resp2 <- c(1325851107,1325851107,1325851108,1325851108,1325851109,1325851109) 
company2 <- c("Dark.nl","Manual.nl","Dark.nl","Manual.nl","Dark.nl","Manual.nl") 
Quality <- c(4,5,1,5,2,7) 
Amount <- c(6,NA,8,NA,10,NA) 
Sense <- c(4,NA,6,NA,7,NA) 
desired <- data.frame(resp2,company2,Quality,Amount,Sense); desired 

#Using dcast function to reshape 
library("reshape2") 
dcast(current, resp + company ~ question, value.var="score")
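
For comparison, here is a minimal sketch of the same long-to-wide step in base R using reshape(), with no extra packages. This is not the method from the post or the answers, just one way to check what the wide result should look like. It rebuilds the data with the score column only so it runs on its own; the names long and wide are illustrative. As far as I know, the formula resp + company ~ question in the dcast call above is the usual way of naming two id variables, so that call is worth re-checking as well.

```r
# Rebuild the long data from the question (score only) so the block is self-contained
resp <- rep(c(1325851107, 1325851108, 1325851109), times = 4)
company <- rep(c("Dark.nl", "Dark.nl", "Manual.nl", "Dark.nl"), each = 3)
question <- rep(c("Quality", "Amount", "Quality", "Sense"), each = 3)
score <- c(4, 1, 2, 6, 8, 10, 5, 5, 7, 4, 6, 7)
long <- data.frame(resp, company, question, score)

# One row per resp/company pair, one score column per question
wide <- reshape(long, idvar = c("resp", "company"), timevar = "question",
                direction = "wide")

# Drop the "score." prefix that reshape() puts on the new columns
names(wide) <- sub("^score\\.", "", names(wide))

# Order rows like the desired data frame and reset the row names
wide <- wide[order(wide$resp, wide$company), ]
rownames(wide) <- NULL
wide
```

Pairs that have no row for a question (here Manual.nl for Amount and Sense) come out as NA, which matches the desired data frame.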

The merge approach provided by Parfait works. Here is the script that did the trick (thanks Parfait ;)).

cols2keep <- c("resp", "company", "score") 
df <- merge(current[current$question=='Quality', cols2keep], #merge two dataframes 
     current[current$question=='Amount', cols2keep], 
     by=c("resp", "company"), all=TRUE) 

df <- merge(df,current[current$question=='Sense',  c("resp","company","score")], #merge third response variable into new dataframe 
     by=c("resp", "company"), all=TRUE) 
colnames(df) <- c("resp","company","quality","amount","sense")

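This solution works, but my real dataset has 53 response variables, so this approach is very time-consuming. I tried Parfait's iterative approach, but I get the following error.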

dfList <- lapply(unique(current$question), function(i){ 
temp <- setNames(current[current$question==i, c("resp", "company", "score")], 
       c("resp", "company", paste0(i))) 
}) 

finaldf <- Reduce(function(...) merge(..., y=c("resp", "company"), all=T), dfList) 
Error in fix.by(by.x, x) : 
'by' must specify one or more columns as numbers, names or logical

I am relatively new to coding in R and cannot work out what I did wrong. I am happy with the current solution, but I am open to a more efficient one.
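
A likely cause of the error above, offered as a sketch rather than a confirmed diagnosis: the Reduce call passes y=c("resp", "company") where by= was probably intended. With y= taken by the character vector, the second data frame appears to fall through positionally to merge()'s by argument, and a data frame is not a valid by value, which is what the fix.by message complains about. The block below rebuilds the data (score only) so it runs on its own; bad is an illustrative name.

```r
# Long data from the question (score only) so the block is self-contained
resp <- rep(c(1325851107, 1325851108, 1325851109), times = 4)
company <- rep(c("Dark.nl", "Dark.nl", "Manual.nl", "Dark.nl"), each = 3)
question <- rep(c("Quality", "Amount", "Quality", "Sense"), each = 3)
score <- c(4, 1, 2, 6, 8, 10, 5, 5, 7, 4, 6, 7)
current <- data.frame(resp, company, question, score)

# One small data frame per question, with score renamed after the question
dfList <- lapply(unique(current$question), function(i) {
  setNames(current[current$question == i, c("resp", "company", "score")],
           c("resp", "company", as.character(i)))
})

# With y = ..., the call fails; capture the message instead of stopping
bad <- tryCatch(
  Reduce(function(...) merge(..., y = c("resp", "company"), all = TRUE), dfList),
  error = function(e) conditionMessage(e)
)
bad

# With by = ..., both key columns are used and the merge goes through
finaldf <- Reduce(function(...) merge(..., by = c("resp", "company"), all = TRUE),
                  dfList)
finaldf
```

Because the loop runs over unique(current$question), the same call should cover all 53 response variables without writing one merge per variable.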

2016-10-03 SHW

Consider a merge on filtered subsets:

cols2keep <- c("resp", "company", "score", "answer") 

df <- merge(current[current$question=='Quality', cols2keep], 
      current[current$question=='Amount', cols2keep], 
      by=c("resp", "company"), all=TRUE) 

colnames(df) <- c("resp", "company", "quality", "quality_a", "amount", "amount_a")  
df 

#         resp   company quality     quality_a amount amount_a
# 1 1325851107   Dark.nl       4   Didn't like      6    Maybe
# 2 1325851107 Manual.nl       5          Fine     NA     <NA>
# 3 1325851108   Dark.nl       1        Was ok      8     Fine
# 4 1325851108 Manual.nl       5 No, thank you     NA     <NA>
# 5 1325851109   Dark.nl       2          Sure     10  Not bad
# 6 1325851109 Manual.nl       7       Why not     NA     <NA>

For additional question groups, such as Sense, continue merging the filtered subsets:

df <- merge(df, 
      current[current$question=='Sense',c("resp", "company", "score", "answer")], 
      by=c("resp", "company"), all=TRUE) 

colnames(df) <- c("resp", "company", "quality", "quality_a", "amount", "amount_a", 
        "sense", "sense_a") 
df 
#         resp   company quality     quality_a amount amount_a sense sense_a
# 1 1325851107   Dark.nl       4   Didn't like      6    Maybe     4    Nice
# 2 1325851107 Manual.nl       5          Fine     NA     <NA>    NA    <NA>
# 3 1325851108   Dark.nl       1        Was ok      8     Fine     6      Ok
# 4 1325851108 Manual.nl       5 No, thank you     NA     <NA>    NA    <NA>
# 5 1325851109   Dark.nl       2          Sure     10  Not bad     7     Yuk
# 6 1325851109 Manual.nl       7       Why not     NA     <NA>    NA    <NA>

Also, for an iterative merge across the levels of question, consider the following:

dfList <- lapply(unique(current$question), function(i){ 
    temp <- setNames(current[current$question==i, c("resp", "company", "score", "answer")], 
       c("resp", "company", paste0(i), paste0(i, "_a"))) 
}) 

finaldf <- Reduce(function(...) merge(..., by=c("resp", "company"), all=TRUE), dfList) 
finaldf 
#         resp   company Quality     Quality_a Amount Amount_a Sense Sense_a
# 1 1325851107   Dark.nl       4   Didn't like      6    Maybe     4    Nice
# 2 1325851107 Manual.nl       5          Fine     NA     <NA>    NA    <NA>
# 3 1325851108   Dark.nl       1        Was ok      8     Fine     6      Ok
# 4 1325851108 Manual.nl       5 No, thank you     NA     <NA>    NA    <NA>
# 5 1325851109   Dark.nl       2          Sure     10  Not bad     7     Yuk
# 6 1325851109 Manual.nl       7       Why not     NA     <NA>    NA    <NA>

2016-10-03 20:17:09 Parfait

Thank you very much, Parfait. The script is easy to use and produces the dataframe I had in mind. – SHW

Great to hear! Glad to help. Please accept the answer to confirm the solution. Happy coding! – Parfait

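Now I am running into trouble when one of my grouping variables (company) has more than two levels (see the additional code I added to the original post: #Grouping variable with more than two levels, including "Senses"). I get this error: Error in fix.by(by.x, x) : 'by' must specify one or more columns as numbers, names or logical. Any idea what is going wrong here? – SHW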

An option using tidyr, the successor to reshape2:

library(tidyverse) 

current %>% group_by(resp, company) %>% 
    # join answer and score into a single column to be spread to wide form 
    unite(answer_score, answer, score) %>% 
    spread(question, answer_score) %>% 
    # separate joined columns 
    separate(Amount, c('amount', 'amount_a'), sep = '_', convert = TRUE) %>% 
    separate(Quality, into = c('quality', 'quality_a'), sep = '_', convert = TRUE) 

## Source: local data frame [6 x 6]
## Groups: resp, company [6]
##
##         resp   company  amount amount_a       quality quality_a
## *      <dbl>    <fctr>   <chr>    <int>         <chr>     <int>
## 1 1325851107   Dark.nl   Maybe        6   Didn't like         4
## 2 1325851107 Manual.nl    <NA>       NA          Fine         5
## 3 1325851108   Dark.nl    Fine        8        Was ok         1
## 4 1325851108 Manual.nl    <NA>       NA No, thank you         5
## 5 1325851109   Dark.nl Not bad       10          Sure         2
## 6 1325851109 Manual.nl    <NA>       NA       Why not         7
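
In tidyr 1.0.0 and later, spread() has been superseded by pivot_wider(), which can widen several value columns in one call, so the unite/separate round trip may not be needed. The following is a minimal sketch under that version assumption, not the code from this answer. It rebuilds the data, including the free-text answers that appear in the printed outputs on this page, so that it runs on its own; wide is an illustrative name.

```r
library(tidyr)

# Long data from the question, with the free-text answer column
current <- data.frame(
  resp = rep(c(1325851107, 1325851108, 1325851109), times = 4),
  company = rep(c("Dark.nl", "Dark.nl", "Manual.nl", "Dark.nl"), each = 3),
  question = rep(c("Quality", "Amount", "Quality", "Sense"), each = 3),
  score = c(4, 1, 2, 6, 8, 10, 5, 5, 7, 4, 6, 7),
  answer = c("Didn't like", "Was ok", "Sure", "Maybe", "Fine", "Not bad",
             "Fine", "No, thank you", "Why not", "Nice", "Ok", "Yuk")
)

# One row per resp/company pair; score and answer each get one column per question
wide <- pivot_wider(current,
                    id_cols = c(resp, company),
                    names_from = question,
                    values_from = c(score, answer))
wide
```

The new columns are named from the value column and the question (for example score_Quality and answer_Quality), and pairs with no row for a question are filled with NA.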

Instead of using unite you could use nest, but spreading list columns currently produces NULLs instead of NAs, which takes a little extra wrangling:

current %>% group_by(resp, company, question) %>% 
    nest() %>% 
    spread(question, data) %>% 
    # insert NAs with purrr::`%||%` so Amount will spread nicely 
    mutate(Amount = map(Amount, ~.x %||% data_frame(score = NA, answer = NA))) %>% 
    unnest(.sep = '_') 

## # A tibble: 6 × 6
##         resp   company Amount_score Amount_answer Quality_score Quality_answer
##        <dbl>    <fctr>        <dbl>        <fctr>         <dbl>         <fctr>
## 1 1325851107   Dark.nl            6         Maybe             4    Didn't like
## 2 1325851107 Manual.nl           NA            NA             5           Fine
## 3 1325851108   Dark.nl            8          Fine             1         Was ok
## 4 1325851108 Manual.nl           NA            NA             5  No, thank you
## 5 1325851109   Dark.nl           10       Not bad             2           Sure
## 6 1325851109 Manual.nl           NA            NA             7        Why not

2016-10-03 19:53:15 alistaire

Thanks for your answer, alistaire, the first option did the trick! – SHW

Just asking: I would like to know why you use this %>% symbol. The script works, but I am not sure why :) – SHW

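'%>%' is the _pipe_ from the magrittr package, which is now used by a lot of packages (including tidyr and dplyr), particularly those associated with the _tidyverse_, where the pipe is a major convention. The basic idea is to make code easier to read by avoiding nested function calls, lots of intermediate variables, or overwriting the same variable, so that the code reads in the order it is executed. [Here is a better explanation.](http://r4ds.had.co.nz/pipes.html) – alistaire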
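
To make the pipe comment concrete, here is a small sketch comparing a nested call with the piped form of the same computation. The vector scores and the names nested and piped are made up for illustration; only %>% itself comes from magrittr.

```r
library(magrittr)

scores <- c(4, 1, 2, 6, 8, 10)

# Nested calls read inside-out: sqrt first, then mean, then round
nested <- round(mean(sqrt(scores)), 2)

# The pipe passes the left-hand result as the first argument of the next call,
# so the steps read top to bottom in the order they run
piped <- scores %>%
  sqrt() %>%
  mean() %>%
  round(2)

identical(nested, piped)
```

Both forms compute the same value; the piped version only changes how the steps are written.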
