2016-07-23 58 views
1

Data: replace customer names with numbers and count orders

DB <- structure(list(orderItemID = 1:10, CustomerName = structure(c(1L, 
1L, 2L, 3L, 3L, 4L, 4L, 4L, 5L, 6L), .Label = c("Alex", "Bert", 
"Corel", "Dennis", "Edgar", "Fred"), class = "factor"), OrderID = structure(c(5L, 
6L, 1L, 2L, 2L, 8L, 7L, 7L, 4L, 3L), .Label = c("14", "17", "33", 
"56", "58", "62", "89", "9"), class = "factor"), ArticleDescription = structure(c(10L, 
5L, 1L, 7L, 8L, 3L, 4L, 2L, 9L, 6L), .Label = c("Adidas Jacket", 
"Adidas Shoes", "Aesics Shoes", "Boss Jeans", "Lee T-Shirt", 
"Nike Airs", "Nike Shoes", "Puma Backpack", "Puma Socks", "Wrangler Jeans" 
), class = "factor")), .Names = c("orderItemID", "CustomerName", 
"OrderID", "ArticleDescription"), row.names = c(NA, -10L), class = "data.frame") 

Expected result:

output <- structure(list(orderItemID = 1:10, Name = structure(c(1L, 1L, 
2L, 3L, 3L, 4L, 4L, 4L, 5L, 6L), .Label = c("1", "2", "3", "4", 
"5", "6"), class = "factor"), NumberOfOrders = structure(c(1L, 
2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("1", "2"), class = "factor"), 
    ArticleDescription = structure(c(10L, 5L, 1L, 7L, 8L, 3L, 
    4L, 2L, 9L, 6L), .Label = c("Adidas Jacket", "Adidas Shoes", 
    "Aesics Shoes", "Boss Jeans", "Lee T-Shirt", "Nike Airs", 
    "Nike Shoes", "Puma Backpack", "Puma Socks", "Wrangler Jeans" 
    ), class = "factor")), .Names = c("orderItemID", "Name", 
"NumberOfOrders", "ArticleDescription"), row.names = c(NA, -10L 
), class = "data.frame") 

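Good morning!

This time I need to replace CustomerName with numbers starting at 1: the same name should always get the same number, and the next name should get the next higher number. In addition, OrderID should be replaced by the running number of the order placed by that particular customer. When different articles share the same OrderID, they count as one order. For example, Alex placed 2 orders (the first contained "Wrangler Jeans", the second "Lee T-Shirt"); Dennis also placed 2 orders (the first contained "Aesics Shoes", the second "Boss Jeans" and "Adidas Shoes"). Finally, I would like to keep using dplyr, and ArticleDescription should remain unchanged.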

+0

Please fix your sample data. It throws errors. – Sotos

+0

It should be fine now, I hope :/ – Jarvis

Answers

0
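library(dplyr)

DB %>%
    # Name: consecutive integer per customer name
    # No.of.Orders: 1 where OrderID differs from the previous row (or on the first row), else 0
    mutate(Name = dense_rank(CustomerName),
           No.of.Orders = ifelse(is.na(OrderID != lag(OrderID)), TRUE, OrderID != lag(OrderID)) * 1) %>%
    group_by(CustomerName) %>%
    # running count of order changes within each customer
    mutate(No.of.Orders = cumsum(No.of.Orders))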
+0

It works, but the last line (taken from the other solution) does not. I just want to remove CustomerName and OrderID: do you have a solution? – Jarvis

+0

Just select the columns you want. Add %>% select(orderid, Name, No.of.orders) at the end. –

+0

1. Adding that works almost perfectly, but it still shows me the customer name... why? 2. How do I save the result? – Jarvis
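On the two points above, a hedged sketch: select() on a grouped data frame keeps the grouping variables (dplyr reports "Adding missing grouping variables"), which would explain why CustomerName is still shown. Calling ungroup() first lets it be dropped, and saving is just assigning the pipeline to a name. The data frame below is a compact rebuild of the question's sample with character columns instead of factors, and `result` is a name chosen only for illustration.

```r
library(dplyr)

# Compact rebuild of the question's sample data (assumption: strings instead of factors)
DB <- data.frame(
  orderItemID = 1:10,
  CustomerName = c("Alex", "Alex", "Bert", "Corel", "Corel",
                   "Dennis", "Dennis", "Dennis", "Edgar", "Fred"),
  OrderID = c("58", "62", "14", "17", "17", "9", "89", "89", "56", "33"),
  ArticleDescription = c("Wrangler Jeans", "Lee T-Shirt", "Adidas Jacket",
                         "Nike Shoes", "Puma Backpack", "Aesics Shoes",
                         "Boss Jeans", "Adidas Shoes", "Puma Socks", "Nike Airs"),
  stringsAsFactors = FALSE
)

# Assigning the pipeline to a name is what "saves" the result
result <- DB %>%
  mutate(Name = dense_rank(CustomerName),
         No.of.Orders = ifelse(is.na(OrderID != lag(OrderID)), TRUE,
                               OrderID != lag(OrderID)) * 1) %>%
  group_by(CustomerName) %>%
  mutate(No.of.Orders = cumsum(No.of.Orders)) %>%
  ungroup() %>%                       # remove the grouping so CustomerName can be dropped
  select(-CustomerName, -OrderID)

result
# To write it to disk (hypothetical file name): write.csv(result, "orders.csv", row.names = FALSE)
```

Without the ungroup() step, select() silently adds CustomerName back because the data is still grouped by it.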

1

One way,

library(dplyr) 
DB %>% 
    mutate(Name = as.integer(as.factor(CustomerName))) %>% 
    group_by(Name) %>% 
    mutate(No.of.Orders = data.table::rleid(OrderID)) %>% 
    select(-c(CustomerName, OrderID)) 

#Source: local data frame [10 x 4]
#Groups: Name [6]

#   orderItemID ArticleDescription  Name No.of.Orders
#         (int)             (fctr) (int)        (int)
#1            1     Wrangler Jeans     1            1
#2            2        Lee T-Shirt     1            2
#3            3      Adidas Jacket     2            1
#4            4         Nike Shoes     3            1
#5            5      Puma Backpack     3            1
#6            6       Aesics Shoes     4            1
#7            7         Boss Jeans     4            2
#8            8       Adidas Shoes     4            2
#9            9         Puma Socks     5            1
#10          10          Nike Airs     6            1
+1

'Name = as.integer(as.factor(CustomerName))' might be simpler than requiring a data.table call. –
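If the aim is to avoid the data.table dependency altogether, one possible dplyr-only sketch numbers the orders with match() against the unique order IDs inside each customer group. This is a swapped-in technique, not what the answer above uses: unlike rleid(), it gives a repeated OrderID the same number even when its rows are not adjacent, which seems to fit the rule that equal order IDs are one order. The data frame is a reduced rebuild of the question's sample, and `out` is an illustrative name.

```r
library(dplyr)

# Reduced rebuild of the question's sample (assumption: strings instead of factors)
DB <- data.frame(
  orderItemID = 1:10,
  CustomerName = c("Alex", "Alex", "Bert", "Corel", "Corel",
                   "Dennis", "Dennis", "Dennis", "Edgar", "Fred"),
  OrderID = c("58", "62", "14", "17", "17", "9", "89", "89", "56", "33"),
  stringsAsFactors = FALSE
)

out <- DB %>%
  mutate(Name = as.integer(as.factor(CustomerName))) %>%
  group_by(Name) %>%
  # position of each OrderID among the customer's distinct order IDs,
  # in order of first appearance
  mutate(No.of.Orders = match(OrderID, unique(OrderID))) %>%
  ungroup() %>%
  select(-c(CustomerName, OrderID))

out
```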

0

You can easily get the names with

number_of_orders <- table(DB$CustomerName) 
name <- rep(1:length(unique(DB$CustomerName)), 
     number_of_orders) 

But I think Alex's suggestion is better.
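One caveat worth sketching: the rep() approach only lines up with the rows when the data is already sorted by customer, because table() counts per name in sorted order. A base R alternative that does not depend on row order is match() against the unique names. The vectors below are made up for illustration.

```r
# Customer names as in the question's sample (already sorted by customer)
customers <- c("Alex", "Alex", "Bert", "Corel", "Corel",
               "Dennis", "Dennis", "Dennis", "Edgar", "Fred")

# Number each name by the order of its first appearance
name <- match(customers, unique(customers))
name

# Hypothetical unsorted input: match() still gives the same name the same number
unsorted <- c("Bert", "Alex", "Bert")
by_match <- match(unsorted, unique(unsorted))

# The rep() approach assigns numbers by position, so the two "Bert" rows
# end up with different numbers here
by_rep <- rep(seq_along(unique(unsorted)), as.vector(table(unsorted)))
```

Note that match() numbers names by first appearance, whereas as.factor() or dense_rank() number them alphabetically; for the sample data both give the same result.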