2016-06-30

Rearranging data in R for market basket analysis

I have data in CSV format.

The data is in the following format, with the receipt number in one column and the product in the adjacent column:

Receipt_no Product 
A1 Apple 
A1 Banana 
A1 Orange 
A2 Pineapple 
A2 Jackfruit 
A3 Cola 
A3 Tea 

I would like to rearrange it as:

A1 , Apple, Banana, Orange 
A2 , Pineapple, Jackfruit 
A3 , Cola, Tea 

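That is, one line per receipt, with the receipt number and its product names separated by commas. Since the data set is large, I would like to do this rearrangement in R.

Please help.

Thanks.

Regards, Nithish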

Could you reply to say whether any of the answers below work for you? – mtoto

Sotos's reply works well – Nithish

Answers

In base R:

aggregate(Product ~ Receipt_no, df, paste, collapse = ',') 

Using dplyr:

library(dplyr)

df %>% 
    group_by(Receipt_no) %>% 
    summarise(new = paste(Product, collapse = ',')) 

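I have 750,000 rows of data. I used the base R function he suggested. Can we estimate in advance how long it will take to run? – Nithish

I am not sure you can do that. There is a function called 'Sys.time', but the script has to be run for it to measure anything. However, you will get the fastest speed with 'data.table'. You can easily translate my code above to 'data.table' – Sotos

Thanks, it worked! – Nithish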
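The 'data.table' translation mentioned in the comments is not spelled out in the thread. A minimal sketch of what it might look like follows, assuming the data.table package is installed and using a small sample data frame that mirrors the question. The timing step uses base R's 'system.time' rather than 'Sys.time' (a substitution made here, not something from the thread); the names `dt`, `res` and `timing` are illustrative only.

```r
library(data.table)

# Sample data mirroring the question (assumption: plain character columns)
df <- data.frame(
  Receipt_no = c("A1", "A1", "A1", "A2", "A2", "A3", "A3"),
  Product = c("Apple", "Banana", "Orange", "Pineapple", "Jackfruit", "Cola", "Tea"),
  stringsAsFactors = FALSE
)

# Convert to a data.table and collapse the products within each receipt
dt <- as.data.table(df)
res <- dt[, .(new = paste(Product, collapse = ",")), by = Receipt_no]
print(res)

# Measure how long the grouping takes; like Sys.time, this has to run the code
timing <- system.time(
  dt[, .(new = paste(Product, collapse = ",")), by = Receipt_no]
)
print(timing["elapsed"])
```

Timing the same call on a subset of the 750,000 rows may give a rough feel for the full run, but run time need not scale linearly with row count, so treat any extrapolation as approximate.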

Using base R:

u <- as.vector(unique(df$Receipt_no)) 
as.list(sapply(u, function(x) paste0(x, ", ", paste0(subset(df$Product, df$Receipt_no==x), collapse = ", ")))) 

# $A1 
# [1] "A1, Apple, Banana, Orange" 

# $A2 
# [1] "A2, Pineapple, Jackfruit" 

# $A3 
# [1] "A3, Cola, Tea" 
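Since the question asks for one comma-separated line per receipt, the same result can also be built with 'split' and written out with 'writeLines'. This is a sketch of an alternative, not the answerer's method; the sample data frame, the names `baskets`, `basket_lines` and `out_file`, and the temporary output path are all assumptions made for illustration.

```r
# Sample data mirroring the question (assumption: plain character columns)
df <- data.frame(
  Receipt_no = c("A1", "A1", "A1", "A2", "A2", "A3", "A3"),
  Product = c("Apple", "Banana", "Orange", "Pineapple", "Jackfruit", "Cola", "Tea"),
  stringsAsFactors = FALSE
)

# Group products by receipt; split() returns a named list, one element per receipt
baskets <- split(df$Product, df$Receipt_no)

# Build one "receipt, product, product, ..." string per receipt
basket_lines <- paste(
  names(baskets),
  sapply(baskets, paste, collapse = ", "),
  sep = ", "
)
print(basket_lines)

# Write one line per receipt; tempfile() is a placeholder for a real output path
out_file <- tempfile(fileext = ".csv")
writeLines(basket_lines, out_file)
```

Note that 'split' orders the groups by sorted receipt number rather than by first appearance, which makes no difference for the sample data here but may reorder a real file.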

Data:

df <- structure(list(Receipt_no = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 
3L), .Label = c("A1", "A2", "A3"), class = "factor"), Product = structure(c(1L, 
2L, 5L, 6L, 4L, 3L, 7L), .Label = c("Apple", "Banana", "Cola", 
"Jackfruit", "Orange", "Pineapple", "Tea"), class = "factor")), .Names = c("Receipt_no", 
"Product"), class = "data.frame", row.names = c(NA, -7L))