2016-08-13 85 views
2

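How to group factor levels in R

I have a factor column of football position abbreviations with about 17 unique values across 220 observations. I would like to have only three factor levels that cover those 17 unique values.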

levels(nfldraft$Pos) <- list(Linemen = c("C","OG","OT","TE","DT","DE"), Small_Backs = c("CB","WR","FS"), Big_Backs = c("FB","ILB","OLB","P","QB","RB","SS","WR")) 

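is what I tried. Printing nfldraft$Pos to the console shows the 3 factor levels, but every value is either Linemen or Small_Backs and all the others are NA. Where did I go wrong? Thanks.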

+0

Please show a reproducible example and the expected output. – akrun

+3

Factor levels can only be a vector of levels, not a list. – alistaire

+0

WR is in two categories. –
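The point about WR can be checked mechanically. The following is a minimal sketch, not the asker's data: `pos` is a hypothetical toy factor standing in for nfldraft$Pos, and WR is assumed to belong only in Small_Backs. It flags any abbreviation listed in more than one group, then applies the named-list form of `levels<-`, which the `levels` help page describes for factors.

```r
# Toy stand-in for nfldraft$Pos (hypothetical data, not the asker's).
pos <- factor(c("C", "WR", "QB", "DE", "CB", "RB", "WR"))

# Grouping from the question, with "WR" kept only in Small_Backs (an assumption).
groups <- list(
  Linemen     = c("C", "OG", "OT", "TE", "DT", "DE"),
  Small_Backs = c("CB", "WR", "FS"),
  Big_Backs   = c("FB", "ILB", "OLB", "P", "QB", "RB", "SS")
)

# Any abbreviation that appears in more than one group would show up here.
all_abbr <- unlist(groups, use.names = FALSE)
dupes <- unique(all_abbr[duplicated(all_abbr)])
stopifnot(length(dupes) == 0)

# Named-list form of `levels<-`: each name becomes a new level that
# replaces the old levels listed under it.
levels(pos) <- groups
table(pos)
```

With this form, an existing level that is not listed under any name ends up as NA, so it may be worth comparing `levels(nfldraft$Pos)` against the abbreviations in the list (stray whitespace or a different spelling would be enough to cause a mismatch).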

Answer

2

I made an example character vector with all of the abbreviations:

my_example <- c("C","OG","OT","TE","DT","DE","CB","WR","FS", 
       "FB","ILB","OLB","P","QB","RB","SS","WR") 
class(my_example) 
[1] "character"

Then I replaced the abbreviations with the desired levels (you could also use gsub or any of a number of other methods here):

my_example[my_example %in% c("C","OG","OT","TE","DT","DE")] <- "Linemen"
my_example[my_example %in% c("CB","WR","FS")]               <- "Small Backs"
# "WR" was already recoded to "Small Backs" on the previous line,
# so listing it again here has no effect.
my_example[my_example %in% c("FB","ILB","OLB","P",
                             "QB","RB","SS","WR")]          <- "Big Backs"
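As one of the other methods mentioned, a named lookup vector can do the same recoding in a single indexing step. This is only a sketch: `abbr`, `lookup`, and `grouped` are hypothetical names, and WR is assumed to belong to "Small Backs" only, which matches what the step-by-step replacement produces.

```r
abbr <- c("C", "OG", "OT", "TE", "DT", "DE", "CB", "WR", "FS",
          "FB", "ILB", "OLB", "P", "QB", "RB", "SS")

# Named lookup vector: names are abbreviations, values are group labels.
# "WR" is placed in "Small Backs" only (an assumption).
lookup <- c(
  setNames(rep("Linemen", 6), c("C", "OG", "OT", "TE", "DT", "DE")),
  setNames(rep("Small Backs", 3), c("CB", "WR", "FS")),
  setNames(rep("Big Backs", 7), c("FB", "ILB", "OLB", "P", "QB", "RB", "SS"))
)

# Index the lookup by abbreviation; unname() drops the abbreviation names.
grouped <- unname(lookup[abbr])
table(grouped)
```

An abbreviation that is missing from `lookup` comes back as NA, which makes unmapped positions easy to spot.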

Then I made it a factor:

my_example <- as.factor(my_example) 
head(my_example) 
[1] Linemen Linemen Linemen Linemen Linemen Linemen 
Levels: Big Backs Linemen Small Backs 
tail(my_example) 
[1] Big Backs Big Backs Big Backs Big Backs Big Backs Small Backs 
Levels: Big Backs Linemen Small Backs 
class(my_example) 
[1] "factor"
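Note that as.factor() sorts the levels alphabetically, which is why "Big Backs" comes first in the output. If a particular level order matters, for example in plots or model output, a sketch along these lines sets it explicitly (`grouped` and `ordered_groups` are hypothetical names):

```r
grouped <- c("Linemen", "Small Backs", "Big Backs", "Linemen")

# as.factor() sorts the levels alphabetically.
levels(as.factor(grouped))

# factor() with an explicit `levels` argument fixes the order instead.
ordered_groups <- factor(grouped,
                         levels = c("Linemen", "Small Backs", "Big Backs"))
levels(ordered_groups)
```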

+1

Some explanation of what all your code does would be nice. Clearly the OP doesn't understand factors. –

+0

Since the target is probably a factor in a data frame, it might be safer to assign the result to a different name. –

+0

I did this: '> nfldraft$Pos[nfldraft$Pos %in% c("C","OG","OT","TE","DT","DE")] <- "Linemen" Warning message: In `[<-.factor`(`*tmp*`, nfldraft$Pos %in% c("C", "OG", "OT", "TE", : invalid factor level, NA generated' –
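That warning appears because nfldraft$Pos is still a factor and "Linemen" is not one of its existing levels; assigning a value that is not a level produces NA. The answer's code works because it starts from a character vector. A minimal sketch of applying the same idea inside a data frame, assuming a hypothetical `draft` data frame in place of nfldraft and a new, hypothetical `PosGroup` column so that the original Pos is left untouched:

```r
# Hypothetical stand-in for the nfldraft data frame.
draft <- data.frame(Pos = factor(c("C", "QB", "WR", "DE", "RB")))

# Work on a character copy in a new column, leaving Pos as it was.
draft$PosGroup <- as.character(draft$Pos)
draft$PosGroup[draft$PosGroup %in% c("C", "OG", "OT", "TE", "DT", "DE")] <- "Linemen"
draft$PosGroup[draft$PosGroup %in% c("CB", "WR", "FS")] <- "Small Backs"
draft$PosGroup[draft$PosGroup %in% c("FB", "ILB", "OLB", "P", "QB", "RB", "SS")] <- "Big Backs"

# Convert to a factor only after every value has been recoded.
draft$PosGroup <- factor(draft$PosGroup)
table(draft$PosGroup, useNA = "ifany")
```

Here WR is assumed to belong to "Small Backs" only; the `useNA = "ifany"` argument makes any abbreviation that was not covered by the three groups visible in the table.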