2012-06-08

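Recoding data in a character field with missing categories

Note: the title may be misleading. If you understand my question and can think of something more descriptive, please change it.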

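I have an odd situation where the answers to a survey are all character strings rather than numbers. R, it seems, really doesn't like this. Suppose I asked the following question: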

Q. In what area do you work? 
East 
West 
Central 
North 
South 
None of the above 

But the respondents came only from East, West, and Central.

dat <- rep(c("East", "West", "Central"),100) 

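Now, for presentation purposes, it is important that I include North, South, and None of the above, even though nobody chose them. However, getting these into the factor is proving difficult.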

Let's try:

fac1 <- factor(dat, labels=c("East","West","Central","North","South","None of the above")) 

Error in factor(dat, labels = c("East", "West", "Central", "North", "South", : 
    invalid labels; length 6 should be 1 or 3 

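Basically, what I want to do is convert this data to a factor that includes the categories nobody selected, so that when I type something like summary(fac1), it shows 0 responses in those categories.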

There must be an easier way to do this!

Answers

3

You're almost there. You need to use the levels argument:

fac1 <- factor(dat, levels=c("East","West","Central","North","South","None of the above")) 
str(fac1) 
Factor w/ 6 levels "East","West",..: 1 2 3 1 2 3 1 2 3 1 ... 

The difference between levels and labels is this:

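  • levels defines the set of factor levels for your data, including any that do not occur in it
  • labels lets you rename those factor levels in the same call.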

For example:

fac2 <- factor(
    dat, 
    levels=c("East","West","Central","North","South","None of the above"), 
    labels=c("E", "W", "C", "N", "S", "Other") 
) 
str(fac2) 
Factor w/ 6 levels "E","W","C","N",..: 1 2 3 1 2 3 1 2 3 1 ... 
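
To see why the original call fails, it may help to look at what factor() does when levels is omitted: it takes the sorted unique values of the data as the levels, so labels has to match that count (3 here) and is applied by position. A minimal sketch, assuming the same dat as in the question; the names default_fac and mislabeled are made up for illustration:

```r
dat <- rep(c("East", "West", "Central"), 100)

# Without `levels`, factor() uses the sorted unique values of the data.
default_fac <- factor(dat)
levels(default_fac)

# `labels` are matched to levels by position, so three labels in a
# different order silently relabel the data instead of raising an error.
mislabeled <- factor(dat, labels = c("East", "West", "Central"))
table(dat, mislabeled)
```

The cross-tabulation makes the mismatch visible, which is one reason to pass levels explicitly whenever labels is used.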

I have to say I'm impressed by the speed, cheers :)

2

I'm not an expert, but is this any help?

fac1 <- factor(dat, levels = 
       c("East","West","Central","North","South","None of the above")) 
summary(fac1)
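
As a quick check that the unused categories really are reported with a count of 0, the counts can be inspected directly. This is only a sketch built on the dat from the question; all_areas and counts are names introduced here for illustration:

```r
dat <- rep(c("East", "West", "Central"), 100)
all_areas <- c("East", "West", "Central", "North", "South", "None of the above")
fac1 <- factor(dat, levels = all_areas)

# Levels with no observations are kept and counted as 0.
counts <- table(fac1)
counts[["North"]]   # 0
counts[["East"]]    # 100

# Values not listed in `levels` would become NA, so it is worth checking.
sum(is.na(fac1))    # 0
```

summary(fac1) reports the same counts, in the order given by levels.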