2016-04-21 81 views
2

Gather multiple columns with tidyr

I have shopping-cart data that looks like the sample data frame below:

sample_df<-data.frame(
    clientid=1:10, 
    ProductA=c("chair","table","plate","plate","table","chair","table","plate","chair","chair"), 
    QuantityA=c(1,2,1,1,1,1,2,3,1,2), 
    ProductB=c("table","doll","shoes","","door","","computer","computer","","plate"), 
    QuantityB=c(3,1,2,"",2,"",1,1,"",1) 
) 
#sample data frame 
   clientid ProductA QuantityA ProductB QuantityB
1         1    chair         1    table         3
2         2    table         2     doll         1
3         3    plate         1    shoes         2
4         4    plate         1
...
10       10    chair         2    plate         1

I would like to convert it into a different format, which would look like this:

#ideal data frame 
   clientid ProductNumber Product Quantity
1         1             A   chair        1
2         1             B   table        3
3         2             A   table        2
4         2             B    doll        1
...
11        6             A   chair        1
...
17       10             A   chair        2
18       10             B   plate        1

I tried:

library(dplyr)  # select(), filter() and %>% come from dplyr
library(tidyr)
sample_df_gather <- sample_df %>%
    select(clientid, ProductA, ProductB) %>%
    gather(ProductNumber, value, -clientid) %>%
    filter(!is.na(value))

#this gives me 
  clientid ProductNumber value
1        1      ProductA chair
2        2      ProductB table
3        3      ProductA plate
4        4      ProductB plate
...

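However, I don't know how to add the quantity to this data frame. Also, the actual data frame has more columns, such as Title and Price, which I would also like to carry over into the ideal data frame. Is there a way to convert the data into the ideal format?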

+0

For QuantityB, you really don't want to use "" ... try NA. – Frank
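To illustrate the point about "" versus NA, here is a minimal sketch on plain vectors. The names q_bad, q_good and q_fixed are made up for the example. In a data frame built on R before 4.0 the mixed column would become a factor rather than a character vector, which is an even easier trap to fall into.

```r
# Mixing numbers with "" silently coerces the whole vector to character
q_bad <- c(3, 1, 2, "", 2, "", 1, 1, "", 1)
class(q_bad)   # "character"

# With NA the vector stays numeric
q_good <- c(3, 1, 2, NA, 2, NA, 1, 1, NA, 1)
class(q_good)  # "numeric"

# Repairing an existing column: turn "" into NA first, then convert
q_fixed <- as.numeric(ifelse(q_bad == "", NA, q_bad))
```

The same repair should work on a data frame column, e.g. applied to sample_df$QuantityB, provided the column is character and not a factor.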

+1

'reshape(sample_df, dir = "long", vary = list(c(2,4), c(3,5)))' gives me 20 rows, or is that wrong? – rawr
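For anyone who wants to see the base R route spelled out, here is a rough sketch of the reshape() call with named arguments. It assumes the blanks have already been replaced by NA, as suggested above. The output column names and the object name long are my choice for illustration, not something from the thread.

```r
# Sample data with NA instead of "" (assumption: blanks were cleaned first)
sample_df <- data.frame(
  clientid  = 1:10,
  ProductA  = c("chair", "table", "plate", "plate", "table",
                "chair", "table", "plate", "chair", "chair"),
  QuantityA = c(1, 2, 1, 1, 1, 1, 2, 3, 1, 2),
  ProductB  = c("table", "doll", "shoes", NA, "door",
                NA, "computer", "computer", NA, "plate"),
  QuantityB = c(3, 1, 2, NA, 2, NA, 1, 1, NA, 1),
  stringsAsFactors = FALSE
)

# One list element per output column; each element lists the wide columns
long <- reshape(
  sample_df,
  direction = "long",
  varying   = list(c("ProductA", "ProductB"), c("QuantityA", "QuantityB")),
  v.names   = c("Product", "Quantity"),
  timevar   = "ProductNumber",
  times     = c("A", "B"),
  idvar     = "clientid"
)

# reshape() keeps all 20 rows; drop the empty B slots and sort by client
long <- long[!is.na(long$Product), ]
long <- long[order(long$clientid, long$ProductNumber), ]
rownames(long) <- NULL
```

The 20 rows are expected: reshape() emits one row per client per product slot, so the clients without a second product have to be filtered out afterwards.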

+1

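Thanks @Frank! The reshape function suggested here solved my problem. @aosmith, yes, I had checked that before asking this question, but I still couldn't find a way to convert my data into the ideal data frame. –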

Answers

6

With data.table:

library(data.table) 
res = melt(setDT(sample_df), 
    measure.vars = patterns("^Product", "^Quantity"), 
    variable.name = "ProductNumber") 
res[, ProductNumber := factor(ProductNumber, labels = c("A","B"))] 

This gives:

    clientid ProductNumber   value1 value2
 1:        1             A    chair      1
 2:        2             A    table      2
 3:        3             A    plate      1
 4:        4             A    plate      1
 5:        5             A    table      1
 6:        6             A    chair      1
 7:        7             A    table      2
 8:        8             A    plate      3
 9:        9             A    chair      1
10:       10             A    chair      2
11:        1             B    table      3
12:        2             B     doll      1
13:        3             B    shoes      2
14:        4             B       NA     NA
15:        5             B     door      2
16:        6             B       NA     NA
17:        7             B computer      1
18:        8             B computer      1
19:        9             B       NA     NA
20:       10             B    plate      1
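If the goal is to get closer to the layout asked for in the question (named columns, no empty rows, sorted by client), melt() also takes value.name and na.rm. The following is only a sketch of that variation. It builds its own copy of the data under the made-up name dt, with NA for the missing entries.

```r
library(data.table)

# Illustrative copy of the data, with NA for the missing second products
dt <- data.table(
  clientid  = 1:10,
  ProductA  = c("chair", "table", "plate", "plate", "table",
                "chair", "table", "plate", "chair", "chair"),
  QuantityA = c(1, 2, 1, 1, 1, 1, 2, 3, 1, 2),
  ProductB  = c("table", "doll", "shoes", NA, "door",
                NA, "computer", "computer", NA, "plate"),
  QuantityB = c(3, 1, 2, NA, 2, NA, 1, 1, NA, 1)
)

res <- melt(
  dt,
  id.vars       = "clientid",
  measure.vars  = patterns("^Product", "^Quantity"),
  variable.name = "ProductNumber",
  value.name    = c("Product", "Quantity"),  # names instead of value1/value2
  na.rm         = TRUE                       # drop the all-NA B rows
)

# The variable column comes back as 1/2; relabel it and sort by client
res[, ProductNumber := factor(ProductNumber, labels = c("A", "B"))]
setorder(res, clientid, ProductNumber)
```

Further column groups, such as the Title and Price columns mentioned in the question, would presumably be handled by adding one more pattern and one more value name per group.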

Data (since the OP's original data is borked):

sample_df <- structure(list(clientid = 1:10, ProductA = structure(c(1L, 3L, 
2L, 2L, 3L, 1L, 3L, 2L, 1L, 1L), .Label = c("chair", "plate", 
"table"), class = "factor"), QuantityA = c(1L, 2L, 1L, 1L, 1L, 
1L, 2L, 3L, 1L, 2L), ProductB = structure(c(6L, 2L, 5L, NA, 3L, 
NA, 1L, 1L, NA, 4L), .Label = c("computer", "doll", "door", "plate", 
"shoes", "table"), class = "factor"), QuantityB = c(3L, 1L, 2L, 
NA, 2L, NA, 1L, 1L, NA, 1L)), .Names = c("clientid", "ProductA", 
"QuantityA", "ProductB", "QuantityB"), row.names = c(NA, -10L 
), class = "data.frame") 
+0

Sounds like the OP is only interested in tidyr, but this may be of interest to others. – Frank
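For readers who do want to stay within tidyr: newer releases (1.0.0 and later, which postdate this question) provide pivot_longer(), which can split the column names and gather several value columns in one step. This is a sketch of that swapped-in approach, not the gather() pipeline from the question. The object name ideal is made up, and the data uses NA for the blanks.

```r
library(tidyr)

# Sample data with NA instead of "" for the missing second products
sample_df <- data.frame(
  clientid  = 1:10,
  ProductA  = c("chair", "table", "plate", "plate", "table",
                "chair", "table", "plate", "chair", "chair"),
  QuantityA = c(1, 2, 1, 1, 1, 1, 2, 3, 1, 2),
  ProductB  = c("table", "doll", "shoes", NA, "door",
                NA, "computer", "computer", NA, "plate"),
  QuantityB = c(3, 1, 2, NA, 2, NA, 1, 1, NA, 1),
  stringsAsFactors = FALSE
)

ideal <- pivot_longer(
  sample_df,
  cols           = -clientid,
  # ".value" keeps the first captured group as the output column name,
  # the second group (A or B) goes into ProductNumber
  names_to       = c(".value", "ProductNumber"),
  names_pattern  = "^(Product|Quantity)(.+)$",
  values_drop_na = TRUE   # drop rows where every value column is NA
)
```

If the real data has more column groups with the same suffix scheme (say TitleA/TitleB and PriceA/PriceB, names assumed here), extending the first group of the pattern to "(Product|Quantity|Title|Price)" should carry them along as well.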