2016-11-08 13 views
5

Widen a data frame to get monthly sums of revenue for all unique values of categorical columns in R

I have a data frame like this:

sub = c("X001","X002", "X001","X003","X002","X001","X001","X003","X002","X003","X003","X002") 
month = c("201506", "201507", "201506","201507","201507","201508", "201508","201507","201508","201508", "201508", "201508") 
tech = c("mobile", "tablet", "PC","mobile","mobile","tablet", "PC","tablet","PC","PC", "mobile", "tablet") 
brand = c("apple", "samsung", "dell","apple","samsung","apple", "samsung","dell","samsung","dell", "dell", "dell") 

revenue = c(20, 15, 10,25,20,20, 17,9,14,12, 9, 11) 

df = data.frame(sub, month, brand, tech, revenue) 

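I want to use sub and month as the key and get one row per month for each subscriber, showing the sum of revenue for each unique value of tech and brand for that subscriber in that month. This example is simple, with few columns; because my real data set is huge, I decided to try doing it with data.table.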

I have managed to do this for one categorical column, either tech or brand, using this:

df1 <- dcast(df, sub + month ~ tech, fun=sum, value.var = "revenue") 

But I want to do this for two or more categorical columns. So far I have tried this:

df2 <- dcast(df, sub + month ~ tech+brand, fun=sum, value.var = "revenue") 

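It just concatenates the two categorical columns and sums over their unique combinations, which is not what I want. I want a separate column for each unique value of every categorical column.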

I am new to R and would really appreciate any help.


What is the expected output? – Haboryme

Answers

5

(I'm assuming df is a data.table rather than a data.frame as in your example.)

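One possible solution is to first melt the data while keeping sub, month and revenue as keys. That way, brand and tech are converted into a single variable column, with a value for each existing combination of keys. We can then easily dcast it back, because we are operating against a single column, just like in your first example: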

dcast(melt(df, c(1:2, 5)), sub + month ~ value, sum, value.var = "revenue") 
#     sub  month PC apple dell mobile samsung tablet
# 1: X001 201506 10    20   10     20       0      0
# 2: X001 201508 17    20    0      0      17     20
# 3: X002 201507  0     0    0     20      35     15
# 4: X002 201508 14     0   11      0      14     11
# 5: X003 201507  0    25    9     25       0      9
# 6: X003 201508 12     0   21      9       0      0
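
To see why the cast works against a single column, it can help to look at the melted intermediate on its own. The following is only a minimal sketch, assuming the data.table package is installed; the names dt and long are illustrative and do not come from the original answer, and the columns are selected by name rather than by position.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
# (dt is an illustrative name, not from the original post)
dt <- data.table(
  sub     = c("X001","X002","X001","X003","X002","X001",
              "X001","X003","X002","X003","X003","X002"),
  month   = c("201506","201507","201506","201507","201507","201508",
              "201508","201507","201508","201508","201508","201508"),
  brand   = c("apple","samsung","dell","apple","samsung","apple",
              "samsung","dell","samsung","dell","dell","dell"),
  tech    = c("mobile","tablet","PC","mobile","mobile","tablet",
              "PC","tablet","PC","PC","mobile","tablet"),
  revenue = c(20, 15, 10, 25, 20, 20, 17, 9, 14, 12, 9, 11)
)

# Each input row becomes two rows: one carrying its brand and one carrying
# its tech, both in the same value column, with revenue repeated on each
long <- melt(dt,
             id.vars      = c("sub", "month", "revenue"),
             measure.vars = c("brand", "tech"))

print(long)
```

Because brand and tech values now share one value column, a formula of sub + month ~ value produces one output column per distinct brand or tech.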

As per OP's comment, you can also easily add a prefix by adding variable to the formula. This way, the columns will also be ordered properly.

dcast(melt(df, c(1:2, 5)), sub + month ~ variable + value, sum, value.var = "revenue") 
#     sub  month brand_apple brand_dell brand_samsung tech_PC tech_mobile tech_tablet
# 1: X001 201506          20         10             0      10          20           0
# 2: X001 201508          20          0            17      17           0          20
# 3: X002 201507           0          0            35       0          20          15
# 4: X002 201508           0         11            14      14           0          11
# 5: X003 201507          25          9             0       0          25           9
# 6: X003 201508           0         21             0      12           9           0
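
For a wider data set, referring to columns by name may be easier to maintain than positional indices such as c(1:2, 5). The sketch below is one way to write the same melt and dcast steps with explicit names, assuming data.table is installed; dt, long, wide and the final consistency check are illustrative additions, not part of the original answer.

```r
library(data.table)

# Sample data from the question (dt is an illustrative name)
dt <- data.table(
  sub     = c("X001","X002","X001","X003","X002","X001",
              "X001","X003","X002","X003","X003","X002"),
  month   = c("201506","201507","201506","201507","201507","201508",
              "201508","201507","201508","201508","201508","201508"),
  brand   = c("apple","samsung","dell","apple","samsung","apple",
              "samsung","dell","samsung","dell","dell","dell"),
  tech    = c("mobile","tablet","PC","mobile","mobile","tablet",
              "PC","tablet","PC","PC","mobile","tablet"),
  revenue = c(20, 15, 10, 25, 20, 20, 17, 9, 14, 12, 9, 11)
)

# Melt by name: every column listed in measure.vars is widened later,
# so further categorical columns can simply be appended to this vector
long <- melt(dt,
             id.vars      = c("sub", "month", "revenue"),
             measure.vars = c("brand", "tech"))

# variable + value gives prefixed column names such as brand_apple
wide <- dcast(long, sub + month ~ variable + value,
              fun.aggregate = sum, value.var = "revenue")

# Consistency check: within each row, the brand_* columns and the tech_*
# columns should each add up to the same total revenue
brand_total <- rowSums(as.matrix(wide[, grep("^brand_", names(wide)), with = FALSE]))
tech_total  <- rowSums(as.matrix(wide[, grep("^tech_",  names(wide)), with = FALSE]))
print(all(brand_total == tech_total))
```

The check relies on every input row having exactly one brand and one tech, which holds for the sample data; it would not hold if either column contained missing values.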

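Thanks David, the code works well. However, the resulting columns are not sorted, and I may now want to add the prefixes Tech_ and Brand_ to their respective widened columns, so I think I have something to work on now. –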


No problem, just add 'variable' to the formula, see my edit. –
