2013-10-23

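Reshape an R data frame where each unique key is on a new row

I have customer reviews in an R data frame. Where a review had several reason codes, the auditor copied the entire review and put each reason code on a new row. Here is what I have: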

Item    Category     Reason             Review
Vacuum  Performance  Bad Suction        I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Vacuum  Design       Cord is too short  I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Vacuum  Color        Wrong Color        I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Boat    Size         too big            The boat was way too big, and was slow.
Boat    Performance  slow               The boat was way too big, and was slow.
Tube    Inflation    low inflation      The tube was not inflated enough

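I want to group the rows by the shared columns (Item and Review) and spread the multiple categories and reasons into numbered Category and Reason columns. Assume that I don't know in advance how many unique reasons and categories each item has, since what I'm showing you is dummy data.

So, I want something like this: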

Item    Category.1   Category.2   Category.3  Reason.1       Reason.2           Reason.3     Review
Vacuum  Performance  Design       Color       Bad Suction    Cord is too short  Wrong Color  I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Boat    Size         Performance  NA          too big        slow               NA           The boat was way too big, and was slow.
Tube    Inflation    NA           NA          low inflation  NA                 NA           The tube was not inflated enough

I tried the following code, to no avail:

reshape(data, direction = "wide", 
     idvar = c("Item", "Review"), 
     timevar = c("Category", "Reason")) 

Here's the data:

dput(Data) 
structure(list(Item = c("Vacuum", "Vacuum", "Vacuum", "Boat", 
"Boat", "Tube"), Category = c("Performance", "Design", 
"Color", "Size", "Performance", "Inflation" 
), Reason = c("Bad Suction", "Cord is too short", "Wrong Color", 
"too big", "slow", "low inflation"), Review = c("I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.", 
"I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.", 
"I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.", 
"The boat was way too big, and was slow.", "The boat was way too big, and was slow.", 
"The tube was not inflated enough")), .Names = c("Item", "Category", 
"Reason", "Review"), class = "data.frame", row.names = c(NA, 
-6L)) 

Can you post the result of `dput(data)` so that we can reproduce your dummy data and try it ourselves?

Sure. Done. – Bryan

Answers

You just need to create a "time" variable from the "Item" column, numbering the rows within each item:

Data$UniqueReview <- ave(Data$Item, Data$Item, FUN = seq_along) 
out <- reshape(Data, direction = "wide", idvar="Item", timevar="UniqueReview") 
names(out) 
# [1] "Item"  "Category.1" "Reason.1" "Review.1" "Category.2" "Reason.2" 
# [7] "Review.2" "Category.3" "Reason.3" "Review.3" 
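
As a quick illustration of what that first line does (a minimal sketch on a made-up vector, not the question's data): `ave()` with `FUN = seq_along` numbers the rows within each group, and that running counter is what `reshape()` then uses as its `timevar`. The names `item`, `counter` and `counter_int` are hypothetical.

```r
# Hypothetical toy vector, for illustration only
item <- c("A", "A", "A", "B", "B", "C")

# Number the rows within each group.
# The result is character here because 'item' itself is character.
counter <- ave(item, item, FUN = seq_along)
print(counter)

# If an integer counter is preferred, run seq_along over row indices instead
counter_int <- ave(seq_along(item), item, FUN = seq_along)
print(counter_int)
```

Either form should work as a `timevar`, since `reshape()` only uses the values to build the column suffixes.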

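Here are just the "Category" and "Reason" columns of the resulting "wide" dataset (restricted to those so that the output fits on the screen).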

out[, grep("Item|Category|Reason", names(out))] 
#     Item  Category.1      Reason.1  Category.2          Reason.2 Category.3    Reason.3
# 1 Vacuum Performance   Bad Suction      Design Cord is too short      Color Wrong Color
# 4   Boat        Size       too big Performance              slow       <NA>        <NA>
# 6   Tube   Inflation low inflation        <NA>              <NA>       <NA>        <NA>

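Also, `library(reshape)` does not refer to the built-in `reshape` function that you are trying to use. Rather, it loads the "reshape" package, which is an older version of the "reshape2" package.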
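
To make it explicit that base R's function is the one meant, it can be called with its namespace. The sketch below uses a small made-up data frame (`toy`, `id`, `k` and `value` are hypothetical names) and should run without loading any add-on package, because `reshape()` lives in the "stats" package that ships with R.

```r
# Hypothetical long-format data, for illustration only
toy <- data.frame(id = c("a", "a", "b"),
                  k = c(1, 2, 1),
                  value = c("x", "y", "z"),
                  stringsAsFactors = FALSE)

# Call base R's reshape() explicitly through the 'stats' namespace
wide <- stats::reshape(toy, direction = "wide", idvar = "id", timevar = "k")
print(wide)
```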


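Rereading your question and your comment: since you can assume that the "Review" column can be treated as an ID column in its own right, just change the `reshape` command accordingly: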

reshape(Data, direction = "wide", idvar=c("Item", "Review"), timevar="UniqueReview") 
#     Item
# 1 Vacuum
# 4   Boat
# 6   Tube
#                      Review 
# 1 I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color. 
# 4              The boat was way too big, and was slow. 
# 6               The tube was not inflated enough 
#    Category.1      Reason.1  Category.2          Reason.2 Category.3    Reason.3
# 1 Performance   Bad Suction      Design Cord is too short      Color Wrong Color
# 4        Size       too big Performance              slow       <NA>        <NA>
# 6   Inflation low inflation        <NA>              <NA>       <NA>        <NA>
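
One caveat, stated as an assumption rather than something from the question: if the same item could appear with more than one distinct review, a counter computed within "Item" alone would no longer match the `idvar` pair. A minimal sketch of one way to handle that, on made-up data (`toy` and all of its values are hypothetical), is to number the rows within each Item/Review combination instead:

```r
# Hypothetical data: one item with two distinct reviews
toy <- data.frame(
  Item     = c("Boat", "Boat", "Boat"),
  Category = c("Size", "Performance", "Price"),
  Reason   = c("too big", "slow", "too expensive"),
  Review   = c("Too big and slow.", "Too big and slow.", "Costs too much."),
  stringsAsFactors = FALSE
)

# Counter within each Item/Review pair rather than within Item alone
toy$UniqueReview <- ave(seq_len(nrow(toy)), toy$Item, toy$Review,
                        FUN = seq_along)

# One output row per Item/Review pair, with a single Review column
wide <- reshape(toy, direction = "wide",
                idvar = c("Item", "Review"), timevar = "UniqueReview")
print(wide)
```

With the question's data, where each item has exactly one review, this should give the same result as the command above.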

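It looks like your solution produces multiple review columns (`Review.1, Review.2, Review.3`). Is there a way to end up with only one review column? Otherwise, I'll just collapse those columns. – Bryan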

Excellent! I will accept your answer. – Bryan