2017-04-07 33 views
0

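Dropping factor levels that contain a single observation

I would like to know whether there is a simple function (similar to drop.levels) that removes from a factor the levels that contain only a single observation. I provide a reproducible example below. So far I have only managed to store the names of the factors that contain a level with a single observation, but writing all the code to drop each specific level would be painful. Is there a shortcut for this?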

db0 <- data.frame(let = c(sample(letters[1:5], 99, replace = TRUE), "z"),
                  let2 = sample(letters[6:11], 100, replace = TRUE),
                  stringsAsFactors = TRUE)  # needed in R >= 4.0 to get factors

# Check which factors have levels with only one observation
facLevels <- lapply(db0, table)
facNames <- list()
for (i in seq_along(facLevels)) {
    facNames[i] <- ifelse(min(facLevels[[i]]) == 1, names(facLevels[i]), NA)
}
facNames <- as.character(facNames[!is.na(facNames)])

Basically, what I want to do is drop the level z of let. Thanks.

+2

What exactly do you mean by "drop the level z"? Do you want to remove that row from your data? Or do you want to set the value to NA instead of z? – MrFlick

+0

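Yes, setting that row to NA would be a solution, since I can then remove it easily. Keep in mind that I have many factors with many levels, and I don't know which levels contain a single observation, which is why I want this approach rather than doing it by hand. –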

Answers

0

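The for loop here sets every factor level that has only one observation to NA, and then removes that level from the column entirely by rebuilding the factor.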

db0 <- data.frame(let = c(sample(letters[1:5], 99, replace = TRUE), "z"),
                  let2 = sample(letters[6:11], 100, replace = TRUE),
                  stringsAsFactors = TRUE)  # needed in R >= 4.0 to get factors

# Check which factors have levels with only one observation
facLevels <- lapply(db0, table)
# for each factor, list the levels that have exactly one observation
to_change <- lapply(facLevels, function(x) names(x)[x == 1])

for (i in seq_len(ncol(db0))) {
    if (length(to_change[[i]]) > 0) {
        # set the single-observation levels to NA
        db0[db0[[i]] %in% to_change[[i]], i] <- NA
        # rebuild the factor so the now-unused levels are dropped;
        # remove the line below if this is not what you wanted to do
        db0[[i]] <- factor(db0[[i]])
    }
}

> tail(db0) 
    let let2 
95  b i 
96  a g 
97  c k 
98  d j 
99  d f 
100 <NA> j 

> levels(db0[,i])  # i is 2 after the loop, so this shows let2
[1] "f" "g" "h" "i" "j" "k" 
+0

Thanks, that is exactly what I was looking for. –
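
One detail worth checking with this approach: setting values to NA does not by itself remove a level from a factor, and as.factor() returns an existing factor unchanged, so the level only disappears if the factor is rebuilt with factor() or passed through droplevels(). The snippet below is a small sketch on a made-up vector (x and rare are illustrative names, not from the thread) that shows the difference.

```r
# Toy factor in which "z" occurs exactly once (made-up data)
x <- factor(c("a", "a", "b", "b", "z"))

# Find the levels with a single observation and blank them out
counts <- table(x)
rare <- names(counts)[counts == 1]
x[x %in% rare] <- NA

levels(as.factor(x))   # "a" "b" "z"  (unused level is still there)
levels(droplevels(x))  # "a" "b"
levels(factor(x))      # "a" "b"
```

If the column was read in as character rather than factor, as.factor() builds a fresh factor and the distinction does not arise.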

0

And if you don't like writing loops:

# create a sample dataset
db0 <- data.frame(let1 = c(sample(letters[1:5], 99, replace = TRUE), "z"),
                  let2 = sample(letters[6:11], 100, replace = TRUE),
                  stringsAsFactors = TRUE)  # needed in R >= 4.0 to get factors

# count how many times each level occurs
facLevel <- lapply(db0, table)

# keep the levels that do not occur exactly once
# (lapply always returns a list, whereas sapply may simplify to a matrix)
test <- lapply(facLevel, function(x) x[x != 1])

# keep only the rows whose value in every column is one of the kept levels
db1 <- db0[rowSums(mapply(function(x, y) x %in% names(y), db0, test)) == ncol(db0), ]

# optionally drop the levels that no longer occur in db1
db1 <- droplevels(db1)
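
Since the asker mentions having many factors, the same idea can be wrapped in a reusable helper. What follows is only a sketch, not code from the thread: drop_rare_levels and its min_n argument are hypothetical names, and the data frame is a small made-up example so the result is deterministic.

```r
# Hypothetical helper (not from the thread): in every factor column,
# set levels seen fewer than `min_n` times to NA and drop them.
drop_rare_levels <- function(df, min_n = 2) {
  df[] <- lapply(df, function(col) {
    if (!is.factor(col)) return(col)   # leave non-factor columns untouched
    counts <- table(col)
    rare <- names(counts)[counts < min_n]
    col[col %in% rare] <- NA
    droplevels(col)
  })
  df
}

# Made-up data: "z" occurs once in `let`, nothing is rare in `let2`
db <- data.frame(
  let  = factor(c("a", "a", "b", "b", "z")),
  let2 = factor(c("f", "f", "g", "g", "g"))
)

clean <- drop_rare_levels(db)
levels(clean$let)                        # "a" "b"
complete <- clean[complete.cases(clean), ]  # rows without the rare level
nrow(complete)                           # 4
```

Keeping the NA step separate from the row removal leaves the choice between blanking the value and dropping the whole row to the caller, which is the distinction raised in the comments on the question.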