2012-09-10 51 views
1

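Grouping a variable with many levels

Suppose I have a factor variable with many levels, and I am trying to split those levels into a few groups.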

> levels(dat$years_continuously_insured_order2) 
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" 
[19] "19" "20" 

> levels(dat$age_of_oldest_driver) 
[1] "-16" "1" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33" 
[22] "34" "35" "36" "37" "38" "39" "40"

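I have a script that runs through these variables and groups them into categories. However, each time the script runs, the number of levels can be (and usually is) different. So if my original grouping code looks like the code below, it stops being useful when the script runs again an hour later and the levels have changed. By then I might have 25 levels with different values, but I still need to group them into the same specific categories.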

dat$years_continuously_insured2 <- NA 
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[1]] <- NA 
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[2:3]] <- "1 or less" 
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[4]] <- "2" 
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[5:7]] <- "3 +" 
dat$years_continuously_insured2 <- factor(dat$years_continuously_insured2) 

How can I find a better way to group a variable into segments? Is there a better approach in R?

Thanks!

Answers

2

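You can convert the factor levels of the continuously-insured variable to numbers, then cut() them into your categories and re-factor(). The first step is described in the R-FAQ (doing it correctly is a two-step process):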

dat$years_cont <- factor(cut( as.numeric(as.character( 
            dat$years_continuously_insured_order2)), 
           breaks=c(0,2,3, Inf), right=FALSE ), 
          labels=c("1 or less", "2", "3 +") 
          ) 
#----------------- 
> str(dat) 
'data.frame': 100 obs. of 2 variables: 
$ years_continuously_insured_order2: Factor w/ 20 levels "1","10","11",..: 4 15 19 5 8 4 16 12 12 18 ... 
$ years_cont      : Factor w/ 3 levels "1 or less","2",..: 3 3 3 3 3 3 3 2 2 3 ... 
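
As a small illustration of why the conversion takes two steps, the sketch below uses a made-up factor `f` (hypothetical data, not the asker's). Calling as.numeric() directly on a factor returns the internal level codes, so the labels have to pass through as.character() first. The sketch also hands `labels` straight to cut(), which is a variation on the code above rather than the same call; it may be worth considering because cut() then keeps all three categories as levels even if one of them happens to be empty on a given run.

```r
# Hypothetical example data: a factor whose labels are numbers.
f <- factor(c("1", "10", "2", "3", "10"))

# Levels are sorted as strings: "1" "10" "2" "3".
# Direct conversion returns the internal level codes, not the values.
codes  <- as.numeric(f)                # 1 2 3 4 2
values <- as.numeric(as.character(f))  # 1 10 2 3 10

# Intervals are [0,2), [2,3), [3,Inf) because right = FALSE.
# Passing labels to cut() keeps every category as a level.
grouped <- cut(values,
               breaks = c(0, 2, 3, Inf),
               labels = c("1 or less", "2", "3 +"),
               right  = FALSE)
print(table(grouped))
```

Since the breaks are fixed numbers rather than level positions, the grouping does not depend on how many levels the factor has on a particular run.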
0

If your original column is numeric, treat it as a number rather than as a factor. A simpler way to do what you are doing is:

bin.value = function(x) { 
    ifelse(x <= 1, "1 or less", ifelse(x == 2, "2", "3+")) 
} 

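# If the column is stored as a factor, go through as.character() first:
# as.integer() on a factor returns the internal level codes, not the values.
dat$years_continuously_insured2 = as.factor(bin.value(as.integer(as.character(dat$years_continuously_insured))))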
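
If more categories are needed later, one way to keep the code flat is a lookup with findInterval(). This is a different technique from the ifelse() version above, sketched with hypothetical names (`bin_value2`, `cutpoints`, `labels`), and it assumes the values are whole numbers, where it gives the same groups.

```r
# Hypothetical helper: bins numbers using a vector of lower bounds.
# Assumes whole-number input, so it matches the ifelse() version.
bin_value2 <- function(x,
                       cutpoints = c(2, 3),
                       labels = c("1 or less", "2", "3+")) {
  # findInterval() gives 0 for x < 2, 1 for 2 <= x < 3, 2 for x >= 3.
  labels[findInterval(x, cutpoints) + 1]
}

# Made-up values for illustration.
years  <- c(0, 1, 2, 3, 7)
binned <- factor(bin_value2(years), levels = c("1 or less", "2", "3+"))
print(binned)
```

Adding a category then means extending `cutpoints` and `labels` by one entry each, with no further nesting.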
+0

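Hey, whatever floats your boat. After reconsidering, I changed my solution to use `ifelse`. Mostly I am wary of `ifelse` because it gets more and more nested and intimidating (especially for beginners) if you have to add extra conditions.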

+0

I deleted my comment. I think the current version will lead newcomers down the path of R-ighteousness.