2017-08-27 70 views
-2

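I want to use one-hot encoding in R to represent factor variables as 0/1 columns in a data.frame. How can I one-hot encode only the factor variables that have 3 or more levels?

Among the factor variables, I want to apply one-hot encoding only to those with three or more levels.

Here is my R code.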

german<-read.csv("http://freakonometrics.free.fr/german_credit.csv", header=TRUE) 
F=c(1,2,4,5,7,8,9,10,11,12,13,15,16,17,18,19,20,21) 
for(i in F) german[,i]=as.factor(german[,i]) 
str(german) 
'data.frame': 1000 obs. of 21 variables: 
$ Creditability     : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ... 
$ Account.Balance     : Factor w/ 4 levels "1","2","3","4": 1 1 2 1 1 1 1 1 4 2 ... 
$ Duration.of.Credit..month.  : int 18 9 12 12 12 10 8 6 18 24 ... 
$ Payment.Status.of.Previous.Credit: Factor w/ 5 levels "0","1","2","3",..: 5 5 3 5 5 5 5 5 5 3 ... 
$ Purpose       : Factor w/ 10 levels "0","1","2","3",..: 3 1 9 1 1 1 1 1 4 4 ... 
$ Credit.Amount     : int 1049 2799 841 2122 2171 2241 3398 1361 1098 3758 ... 
$ Value.Savings.Stocks    : Factor w/ 5 levels "1","2","3","4",..: 1 1 2 1 1 1 1 1 1 3 ... 
$ Length.of.current.employment  : Factor w/ 5 levels "1","2","3","4",..: 2 3 4 3 3 2 4 2 1 1 ... 
$ Instalment.per.cent    : Factor w/ 4 levels "1","2","3","4": 4 2 2 3 4 1 1 2 4 1 ... 
$ Sex...Marital.Status    : Factor w/ 4 levels "1","2","3","4": 2 3 2 3 3 3 3 3 2 2 ... 
$ Guarantors      : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ... 
$ Duration.in.Current.address  : Factor w/ 4 levels "1","2","3","4": 4 2 4 2 4 3 4 4 4 4 ... 
$ Most.valuable.available.asset : Factor w/ 4 levels "1","2","3","4": 2 1 1 1 2 1 1 1 3 4 ... 
$ Age..years.      : int 21 36 23 39 38 48 39 40 65 23 ... 
$ Concurrent.Credits    : Factor w/ 3 levels "1","2","3": 3 3 3 3 1 3 3 3 3 3 ... 
$ Type.of.apartment    : Factor w/ 3 levels "1","2","3": 1 1 1 1 2 1 2 2 2 1 ... 
$ No.of.Credits.at.this.Bank  : Factor w/ 4 levels "1","2","3","4": 1 2 1 2 2 2 2 1 2 1 ... 
$ Occupation      : Factor w/ 4 levels "1","2","3","4": 3 3 2 2 2 2 2 2 1 1 ... 
$ No.of.dependents     : Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 1 ... 
$ Telephone      : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ... 
$ Foreign.Worker     : Factor w/ 2 levels "1","2": 1 1 1 2 2 2 2 2 1 1 ... 

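Here, I want to one-hot encode the factor variables that have 3 or more levels.

For example, the Guarantors variable has 3 levels: 1, 2, 3. So I want variables Guarantors1, Guarantors2, and Guarantors3 that take only the values 0 and 1, stored in a data.frame.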
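The target layout for a single variable can be sketched as follows. This is a minimal illustration only: the `Guarantors` vector below is made-up toy data standing in for the real column, and `onehot` is a hypothetical name.

```r
# Toy stand-in for the Guarantors column (values are illustrative only)
Guarantors <- factor(c(1, 1, 2, 3, 1))

# With the intercept removed, a single factor expands to one 0/1 column per level
onehot <- as.data.frame(model.matrix(~ Guarantors - 1))

names(onehot)  # "Guarantors1" "Guarantors2" "Guarantors3"
```

Each row of `onehot` contains exactly one 1, in the column that matches that row's level.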

+1

Please show us what you have already tried yourself. We are not a code-writing service. – emilliman5

+0

Possible duplicate of [Create new dummy variable columns from categorical variable](https://stackoverflow.com/questions/3384506/create-new-dummy-variable-columns-from-categorical-variable) – Aramis7d

Answers

1

A dplyr & purrr approach

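library(dplyr) 
library(purrr) 

german <- read.csv("http://freakonometrics.free.fr/german_credit.csv", header = TRUE) 

# Column positions of the categorical variables 
cols <- c(1,2,4,5,7,8,9,10,11,12,13,15,16,17,18,19,20,21) 

map_df(german[, cols], as.factor) %>% 
     # keep only the factors with three or more levels, as the question asks 
     select_if(function(x) nlevels(x) >= 3) %>% 
     # note: with `~ . - 1` only the first factor gets a column for every level; 
     # the remaining factors get k - 1 columns under the default contrasts 
     model.matrix(~ . - 1, data = .) %>% 
     as.data.frame() 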

I would suggest reading the help page for model.matrix (?model.matrix), or other questions on SO on this topic.
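If every selected factor should get one column per level (Guarantors1, Guarantors2, Guarantors3, and so on), one option is to pass non-reduced contrasts to `model.matrix` through its `contrasts.arg` argument. The sketch below is a base R variation, not the pipeline from the answer above. It assumes a small made-up data frame `df` in place of the German credit data so that it runs without a network connection, and the names `is_wide`, `wide`, `full_contrasts`, `onehot`, and `result` are hypothetical.

```r
# Toy data standing in for german_credit.csv (values are made up for illustration)
df <- data.frame(
  Creditability = factor(c(1, 0, 1, 1, 0, 1)),  # 2 levels: left as-is
  Guarantors    = factor(c(1, 1, 2, 3, 1, 2)),  # 3 levels: encoded
  Occupation    = factor(c(3, 2, 4, 1, 2, 3)),  # 4 levels: encoded
  Age           = c(21, 36, 23, 39, 38, 48)     # numeric: left as-is
)

# Flag the factor columns that have three or more levels
is_wide <- vapply(df, function(x) is.factor(x) && nlevels(x) >= 3, logical(1))
wide <- names(df)[is_wide]

# Build a full indicator (identity) contrast matrix for each selected factor,
# so that each one expands to one column per level rather than k - 1
full_contrasts <- lapply(df[wide], contrasts, contrasts = FALSE)

onehot <- model.matrix(~ . - 1, data = df[wide], contrasts.arg = full_contrasts)

# Recombine the 0/1 columns with the columns that were not encoded
result <- cbind(df[!is_wide], as.data.frame(onehot))
str(result)
```

For the real data, `df` would be replaced by `german` after the factor conversion shown in the question. Keeping every level makes the indicator columns of each factor sum to 1, which matters if the result is later fed to a model with an intercept.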
