2012-07-13 65 views

I have a data-reshaping problem I could use some help with: turning a list of values into indicator variables.

ID    X1           X2          X3          X4        X5
6001  Certificate  Associate   Bachelor's  Master's  Doctoral
5001  Certificate  Associate   Bachelor's
3311  Certificate  Associate   Bachelor's
1981  Certificate  Associate   Bachelor's  Master's
4001  Associate    Bachelor's  Master's
2003  Associate    Bachelor's  Master's    Doctoral
2017  Certificate  Associate
1001  Associate    Bachelor's  Master's
5002  Bachelor's

I need to turn these into dummy variables like this:

ID    Certificate  Associates  Bachelor  Master  Doctoral
6001  1            1           1         1       1
5001  1            1           1         0       0
2017  1            1           0         0       0

Any suggestions?

Answers

Try the reshape2 package. I'm assuming your dataset is called df.

library(reshape2)
# First, melt the data into long format, keeping ID as the identifier
m.df = melt(df, id.vars="ID")
# Then `cast` it back to wide format, counting how often each value occurs per ID
dcast(m.df, ID ~ value, length)
# (Var.2 below is the count of empty cells in each row)
#     ID Var.2 Associate Bachelor's Certificate Doctoral Master's
# 1 1001     2         1          1           0        0        1
# 2 1981     1         1          1           1        0        1
# 3 2003     1         1          1           0        1        1
# 4 2017     3         1          0           1        0        0
# 5 3311     2         1          1           1        0        0
# 6 4001     2         1          1           0        0        1
# 7 5001     2         1          1           1        0        0
# 8 5002     4         0          1           0        0        0
# 9 6001     0         1          1           1        1        1

I haven't tested it, but if you make the values an ordered factor, you can probably control the order of the output columns.
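As a rough illustration of the factor idea, here is a minimal sketch. It swaps in base R's table() for dcast, since the dcast behaviour is untested above, and the three-row data frame and the names degree_levels, value_cols and long are assumptions made up for the example. With table(), the columns follow the factor's level order, and values that are not among the levels (the empty cells) become NA and are left out.

```r
# Hypothetical reconstruction of three rows of the question's data;
# blank cells are assumed to be empty strings.
df <- data.frame(
  ID = c(6001, 5001, 2017),
  X1 = c("Certificate", "Certificate", "Certificate"),
  X2 = c("Associate", "Associate", "Associate"),
  X3 = c("Bachelor's", "Bachelor's", ""),
  X4 = c("Master's", "", ""),
  X5 = c("Doctoral", "", ""),
  stringsAsFactors = FALSE
)

# The desired left-to-right order of the output columns.
degree_levels <- c("Certificate", "Associate", "Bachelor's", "Master's", "Doctoral")

# Stack the X columns into long format: one row per (ID, value) pair.
value_cols <- c("X1", "X2", "X3", "X4", "X5")
long <- data.frame(
  ID = rep(df$ID, times = length(value_cols)),
  value = unlist(df[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Empty strings are not among the levels, so they become NA,
# and table() drops NA values by default.
long$value <- factor(long$value, levels = degree_levels)

# Rows are IDs, columns are degrees in the order given by degree_levels.
counts <- table(long$ID, long$value)
print(counts)
```

Whether dcast honours the level order in the same way is still worth checking on the real data.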

Works beautifully! May I ask why length is passed as the argument to cast? – user1495088 2012-07-13 19:48:45

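'length' is the default: it just counts how many times each combination occurs. It is listed under 'cast'. To see how it works, replace 'Associate' with "Bachelor's" for ID 6001 (df[1, 3] = "Bachelor's"). When you melt and reshape, that row will read '0 2 1 1 1'. With your data as it is, this shouldn't be a problem, but it may also help you spot any data-entry errors! – A5C1D2H2I1M1N2O1R2T1 2012-07-13 19:58:51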
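The counting described in this comment can be checked with a small sketch. It uses base R's table() as a stand-in for what length does inside dcast, and the vectors row_values and all_values are hypothetical reconstructions of the row for ID 6001 after the suggested replacement. The last step, capping counts at 1, is an extra assumption about what you would want if strict 0/1 indicators are needed despite duplicates.

```r
# Row for ID 6001 after replacing "Associate" with "Bachelor's"
# (hypothetical reconstruction of the example in the comment).
row_values <- c("Certificate", "Bachelor's", "Bachelor's", "Master's", "Doctoral")

# All values seen in the data, in alphabetical order.
all_values <- c("Associate", "Bachelor's", "Certificate", "Doctoral", "Master's")

# Count how often each value occurs in this row; unused levels get 0.
counts <- table(factor(row_values, levels = all_values))
print(counts)

# Cap the counts at 1 to get strict 0/1 indicators.
indicators <- as.integer(counts > 0)
names(indicators) <- names(counts)
print(indicators)
```

A count above 1 flags a repeated value, which is the kind of data-entry error the comment mentions.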

Thanks for your help – user1495088 2012-07-14 21:51:54