2016-10-03 48 views
-2

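Reshape a long data frame to a wide data frame using merge, grouped by two factors

I want to reshape a long data frame into a wide data frame, keyed by two grouping variables (resp and company) and with three numeric response variables (Quality, Amount, Sense) as columns. I tried to do this with the dcast function, but it did not let me group by two variables. Can anyone help?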

#Current long dataframe: two grouping variables (resp & company), three numerical response variables (Quality, Amount, Sense)
resp <- c(1325851107,1325851108,1325851109,1325851107,1325851108,1325851109,1325851107,1325851108,1325851109,1325851107,1325851108,1325851109) 
company <- c("Dark.nl","Dark.nl","Dark.nl","Dark.nl","Dark.nl","Dark.nl","Manual.nl","Manual.nl","Manual.nl","Dark.nl","Dark.nl","Dark.nl") 
question <- c("Quality","Quality","Quality","Amount","Amount","Amount","Quality","Quality","Quality","Sense","Sense","Sense") 
score <- c(4,1,2,6,8,10,5,5,7,4,6,7) 
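#answer was not defined in the post; these values are reconstructed from the output shown in the answers below
answer <- c("Didn't like","Was ok","Sure","Maybe","Fine","Not bad","Fine","No, thank you","Why not","Nice","Ok","Yuk")
current <- data.frame(resp,company,question,score,answer); current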

#Desired wide dataframe 
resp2 <- c(1325851107,1325851107,1325851108,1325851108,1325851109,1325851109) 
company2 <- c("Dark.nl","Manual.nl","Dark.nl","Manual.nl","Dark.nl","Manual.nl") 
Quality <- c(4,5,1,5,2,7) 
Amount <- c(6,NA,8,NA,10,NA) 
Sense <- c(4,NA,6,NA,7,NA) 
desired <- data.frame(resp2,company2,Quality,Amount,Sense); desired 

#Using dcast function to reshape 
library("reshape2") 
dcast(current, resp + company ~ question, value.var="score") 

The merge approach suggested by Parfait works. Here is the script that did the trick (thanks Parfait ;)).

cols2keep <- c("resp", "company", "score") 
df <- merge(current[current$question=='Quality', cols2keep], #merge two dataframes 
     current[current$question=='Amount', cols2keep], 
     by=c("resp", "company"), all=TRUE) 

df <- merge(df, current[current$question=='Sense', cols2keep], #merge third response variable into new dataframe
     by=c("resp", "company"), all=TRUE) 
colnames(df) <- c("resp","company","quality","amount","sense") 

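This solution works, but my real dataset has 53 response variables, so this approach is very time-consuming. I tried Parfait's iterative approach, but I get the following error.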

dfList <- lapply(unique(current$question), function(i){ 
temp <- setNames(current[current$question==i, c("resp", "company", "score")], 
       c("resp", "company", paste0(i))) 
}) 

finaldf <- Reduce(function(...) merge(..., y=c("resp", "company"), all=T), dfList) 
Error in fix.by(by.x, x) : 
'by' must specify one or more columns as numbers, names or logical 

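I am relatively new to R and cannot work out what I did wrong. I am happy with the current solution, but if there is a more efficient one, I am open to it.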

Answers

1

Consider merging filtered subsets:

cols2keep <- c("resp", "company", "score", "answer") 

df <- merge(current[current$question=='Quality', cols2keep], 
      current[current$question=='Amount', cols2keep], 
      by=c("resp", "company"), all=TRUE) 

colnames(df) <- c("resp", "company", "quality", "quality_a", "amount", "amount_a")  
df 

#   resp company quality  quality_a amount amount_a 
# 1 1325851107 Dark.nl  4 Didn't like  6 Maybe 
# 2 1325851107 Manual.nl  5   Fine  NA  <NA> 
# 3 1325851108 Dark.nl  1  Was ok  8  Fine 
# 4 1325851108 Manual.nl  5 No, thank you  NA  <NA> 
# 5 1325851109 Dark.nl  2   Sure  10 Not bad 
# 6 1325851109 Manual.nl  7  Why not  NA  <NA> 

For additional groups, such as Sense, keep merging the filtered subsets:

df <- merge(df, 
      current[current$question=='Sense',c("resp", "company", "score", "answer")], 
      by=c("resp", "company"), all=TRUE) 

colnames(df) <- c("resp", "company", "quality", "quality_a", "amount", "amount_a", 
        "sense", "sense_a") 
df 
#   resp company quality  quality_a amount amount_a sense sense_a 
# 1 1325851107 Dark.nl  4 Didn't like  6 Maybe  4 Nice 
# 2 1325851107 Manual.nl  5   Fine  NA  <NA> NA <NA> 
# 3 1325851108 Dark.nl  1  Was ok  8  Fine  6  Ok 
# 4 1325851108 Manual.nl  5 No, thank you  NA  <NA> NA <NA> 
# 5 1325851109 Dark.nl  2   Sure  10 Not bad  7  Yuk 
# 6 1325851109 Manual.nl  7  Why not  NA  <NA> NA <NA> 

Alternatively, to merge iteratively across all levels of question, consider the following:

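dfList <- lapply(unique(current$question), function(i){ 
    # subset one question level and name its score/answer columns after that level 
    setNames(current[current$question==i, c("resp", "company", "score", "answer")], 
             c("resp", "company", paste0(i), paste0(i, "_a"))) 
}) 

# merge all subsets on the two grouping columns (the argument is by=, not y=) 
finaldf <- Reduce(function(...) merge(..., by=c("resp", "company"), all=TRUE), dfList) 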
finaldf 
#   resp company Quality  Quality_a Amount Amount_a Sense Sense_a 
# 1 1325851107 Dark.nl  4 Didn't like  6 Maybe  4 Nice 
# 2 1325851107 Manual.nl  5   Fine  NA  <NA> NA <NA> 
# 3 1325851108 Dark.nl  1  Was ok  8  Fine  6  Ok 
# 4 1325851108 Manual.nl  5 No, thank you  NA  <NA> NA <NA> 
# 5 1325851109 Dark.nl  2   Sure  10 Not bad  7  Yuk 
# 6 1325851109 Manual.nl  7  Why not  NA  <NA> NA <NA> 
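
The `fix.by` error quoted in the question appears to be reproducible when the key columns are passed to `merge()` as `y=` instead of `by=`: the named argument fills the `y` slot, so the second data frame falls through to `by`. Below is a minimal sketch of the iterative merge with `by=`, using the question's data and only the `score` column. The helper name `keys` is introduced here for illustration, and `stringsAsFactors = FALSE` is an assumption made to keep `question` as character.

```r
# Data from the question, rebuilt compactly (score column only)
current <- data.frame(
  resp     = rep(c(1325851107, 1325851108, 1325851109), times = 4),
  company  = rep(c("Dark.nl", "Dark.nl", "Manual.nl", "Dark.nl"), each = 3),
  question = rep(c("Quality", "Amount", "Quality", "Sense"), each = 3),
  score    = c(4, 1, 2, 6, 8, 10, 5, 5, 7, 4, 6, 7),
  stringsAsFactors = FALSE
)

keys <- c("resp", "company")  # hypothetical helper: the two grouping columns

# One data frame per question level, with the score column renamed after the level
dfList <- lapply(unique(current$question), function(q) {
  setNames(current[current$question == q, c(keys, "score")], c(keys, q))
})

# Fold the list with merge(); the key columns go in `by`
finaldf <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE), dfList)
finaldf
```

Because the loop runs over `unique(current$question)`, the same code should scale to the 53 response variables mentioned in the question without listing them by hand.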

Thank you very much, Parfait. This script is easy to use and produces the data frame I had in mind. – SHW

Great to hear! Glad to help. Please accept the answer to confirm the solution. Happy coding! – Parfait

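I now run into trouble when one of my grouping variables (company) has more than two levels (see the additional code I added to the original post: #Grouping variable with more than two levels, including "Senses"). I get this error: Error in fix.by(by.x, x) : 'by' must specify one or more columns as numbers, names or logical. Any idea what is going wrong here? – SHW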

0

An option using tidyr, the successor to reshape2:

library(tidyverse) 

current %>% group_by(resp, company) %>% 
    # join answer and score into a single column to be spread to wide form 
    unite(answer_score, answer, score) %>% 
    spread(question, answer_score) %>% 
    # separate joined columns 
    separate(Amount, c('amount', 'amount_a'), sep = '_', convert = TRUE) %>% 
    separate(Quality, into = c('quality', 'quality_a'), sep = '_', convert = TRUE) 

## Source: local data frame [6 x 6] 
## Groups: resp, company [6] 
## 
##   resp company amount amount_a  quality quality_a 
## *  <dbl> <fctr> <chr> <int>   <chr>  <int> 
## 1 1325851107 Dark.nl Maybe  6 Didn't like   4 
## 2 1325851107 Manual.nl <NA>  NA   Fine   5 
## 3 1325851108 Dark.nl Fine  8  Was ok   1 
## 4 1325851108 Manual.nl <NA>  NA No, thank you   5 
## 5 1325851109 Dark.nl Not bad  10   Sure   2 
## 6 1325851109 Manual.nl <NA>  NA  Why not   7 
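
In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`, which can widen several value columns in one call, so the unite/separate round trip is not needed. The following is a sketch, assuming tidyr >= 1.0.0 is installed. It uses the question's data, with the `answer` values taken from the output shown in the first answer; the result name `wide` is made up for the example.

```r
library(tidyr)

# Question data, including the answer column as it appears in the answer outputs
current <- data.frame(
  resp     = rep(c(1325851107, 1325851108, 1325851109), times = 4),
  company  = rep(c("Dark.nl", "Dark.nl", "Manual.nl", "Dark.nl"), each = 3),
  question = rep(c("Quality", "Amount", "Quality", "Sense"), each = 3),
  score    = c(4, 1, 2, 6, 8, 10, 5, 5, 7, 4, 6, 7),
  answer   = c("Didn't like", "Was ok", "Sure", "Maybe", "Fine", "Not bad",
               "Fine", "No, thank you", "Why not", "Nice", "Ok", "Yuk"),
  stringsAsFactors = FALSE
)

# One row per resp/company pair; one score_* and one answer_* column per question
wide <- pivot_wider(
  current,
  id_cols     = c(resp, company),
  names_from  = question,
  values_from = c(score, answer)
)
wide
```

Combinations that have no row in the long data, such as Amount for Manual.nl, are filled with `NA`, and the column types of `score` and `answer` are kept as they were.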

Instead of unite you could use nest, but spreading list columns currently produces NULLs instead of NAs, which takes a little extra wrangling:

current %>% group_by(resp, company, question) %>% 
    nest() %>% 
    spread(question, data) %>% 
    # insert NAs with purrr::`%||%` so Amount will spread nicely 
    mutate(Amount = map(Amount, ~.x %||% data_frame(score = NA, answer = NA))) %>% 
    unnest(.sep = '_') 

## # A tibble: 6 × 6 
##   resp company Amount_score Amount_answer Quality_score Quality_answer 
##  <dbl> <fctr>  <dbl>  <fctr>   <dbl>   <fctr> 
## 1 1325851107 Dark.nl   6   Maybe    4 Didn't like 
## 2 1325851107 Manual.nl   NA   NA    5   Fine 
## 3 1325851108 Dark.nl   8   Fine    1   Was ok 
## 4 1325851108 Manual.nl   NA   NA    5 No, thank you 
## 5 1325851109 Dark.nl   10  Not bad    2   Sure 
## 6 1325851109 Manual.nl   NA   NA    7  Why not 

Thanks for your answer, alistaire. The first option did the trick! – SHW

Just asking: why do you use this %>% symbol? The script works, but I am not sure why :) – SHW

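`%>%` is the _pipe_ from the magrittr package. It is now used by many packages (including tidyr and dplyr), especially those associated with the _tidyverse_, where piping is a core convention. The basic idea is to make code easier to read: it avoids nested function calls, piles of intermediate variables, and repeatedly overwriting the same variable, and it lets the code read in the order it executes. [Here is a better explanation.](http://r4ds.had.co.nz/pipes.html) – alistaire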
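
A small illustration of that idea, assuming the magrittr package is installed (the vector `x` and the names `nested` and `piped` are made up for the example): the nested call and the piped call compute the same value, but the piped version reads left to right in the order the steps run.

```r
library(magrittr)

x <- c(4, 1, 2, 6, 8, 10)

# Nested form: read from the inside out
nested <- round(sqrt(sum(x)), 1)

# Piped form: each result is passed as the first argument of the next call
piped <- x %>% sum() %>% sqrt() %>% round(1)

identical(nested, piped)
```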
