Dropping variables with fewer than two factor levels

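The variables in my data frame contain character observations (I'm not sure that is the right way to put it; essentially, when I pull up the structure with str(), the columns are listed as "chr").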

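I'd like to convert everything to factors first and then check the factor levels. Once the columns are factors, I only want to keep working with the variables in the data frame that have two or more levels.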

Here is my idea. I know for loops are frowned upon in R, but I'm very new to it and using one makes sense to me.

x = as.character(c("Not Sampled", "Not Sampled", "Y", "N")) 
y = as.character(c("Not Sampled", "Not Sampled", "Not Sampled", "Not Sampled")) 
z = as.character(c("Y", "N", "Not Sampled", "Y")) 
df = data.frame(x, y, z) 

for i in df: 
    df$Response = as.factor(df[,i]) #create new variable in dataframe 
    df$Response = [email protected][sapply .... #where I think I can separate out the variables I want and the variables I don't want 

    m1 = lm(response ~ 1) #next part where I want only the selected variables

I know the solution is probably much more involved, but this is my attempt as a beginner.


2016-03-15 userfriendly

library(dplyr) 

df <- df %>% lapply(factor) %>% data.frame() 
df[ , sapply(df, n_distinct) >= 2]


2016-03-15 19:57:20

Wow, that's an awesome tip, thank you! – userfriendly

You don't need dplyr for this lapply approach. (If you do want to use dplyr, you could use 'mutate_each'.) –
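To illustrate the comment above, here is a minimal base R sketch of the same idea without dplyr. It assumes the sample data from the question; the names `n_lev` and `kept` are made up for illustration.

```r
# Sample data from the question, kept as plain character columns
df <- data.frame(
  x = c("Not Sampled", "Not Sampled", "Y", "N"),
  y = c("Not Sampled", "Not Sampled", "Not Sampled", "Not Sampled"),
  z = c("Y", "N", "Not Sampled", "Y"),
  stringsAsFactors = FALSE
)

# Convert every column to a factor; `df[] <-` keeps the data.frame structure
df[] <- lapply(df, factor)

# Count the levels of each column and keep those with two or more
n_lev <- vapply(df, nlevels, integer(1))
kept <- df[n_lev >= 2]

names(kept)
# "x" "z"
```

Indexing with a single logical vector (`df[n_lev >= 2]`) selects columns and always returns a data frame.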

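The default data.frame method converts strings to factors (this was the default before R 4.0.0; on later versions, pass stringsAsFactors = TRUE), so the extra conversion is not necessary in this case. lapply is preferable to sapply here, because sapply will try to simplify the return value to a matrix if the lengths are all the same.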
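A small sketch of the simplification behaviour mentioned above, assuming the sample data from the question (the variable names `lv_list`, `lv_mixed` and `lv_same` are only for illustration):

```r
x <- c("Not Sampled", "Not Sampled", "Y", "N")
y <- rep("Not Sampled", 4)
z <- c("Y", "N", "Not Sampled", "Y")
df <- data.frame(x, y, z, stringsAsFactors = TRUE)

# lapply always returns a list, one element per column
lv_list <- lapply(df, levels)

# sapply returns a list here only because the level counts differ (3, 1, 3)
lv_mixed <- sapply(df, levels)

# With equal level counts, sapply silently simplifies to a character matrix
lv_same <- sapply(df[c("x", "z")], levels)

is.list(lv_list)    # TRUE
is.list(lv_mixed)   # TRUE
is.matrix(lv_same)  # TRUE
```

Because the return type of sapply depends on the data, code that expects a list can break when every column happens to have the same number of levels.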

df = data.frame(x, y, z, stringsAsFactors = TRUE) 

## Columns are factors; use sapply(df, typeof) to see the underlying representation 
sapply(df, class) 
#        x        y        z 
# "factor" "factor" "factor" 

## These are the columns with >= 2 levels 
lengths(lapply(df, levels)) >= 2 
#     x     y     z 
#  TRUE FALSE  TRUE 

## Extract only those columns 
df[lengths(lapply(df, levels)) >= 2]
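One caveat with counting levels, shown as a hedged sketch with made-up data: a factor keeps its declared levels after subsetting, so `levels` can report more levels than actually occur in the data. If that matters, `droplevels` removes the unused ones first.

```r
answers <- factor(c("Y", "N", "Y", "Y"))

# Subsetting keeps the original level set, even if "N" no longer occurs
only_yes <- answers[answers == "Y"]

declared <- nlevels(only_yes)              # 2: "N" is still a declared level
observed <- nlevels(droplevels(only_yes))  # 1: only "Y" is actually present
```

Counting distinct observed values, as the dplyr answer does with n_distinct, sidesteps this because it ignores unused levels.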


2016-03-15 20:02:37 jenesaisquoi

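This looks like it will help me. I tried copying and pasting it to test it, but I'm not sure whether "lengths" is a different function or a misspelling of the base function "length". I'm 99% sure it's the latter, but I wanted to clarify for posterity's sake. – userfriendly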
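For what it's worth, `lengths` (plural) is a separate base R function, available since R 3.2.0, rather than a typo. A minimal sketch of the difference, using a made-up list `lv`:

```r
lv <- list(x = c("N", "Not Sampled", "Y"), y = "Not Sampled")

# length() counts the elements of the list itself
n_elements <- length(lv)    # 2

# lengths() returns the length of each element, keeping the names
per_element <- lengths(lv)  # x = 3, y = 1
```

On an older R without `lengths`, `sapply(lv, length)` gives the same per-element counts.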

df[, sapply(df, function(x) length(levels(x)) >= 2)]
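One thing to watch with the `df[, ...]` form: if only a single column qualifies, R drops the result to a bare vector. A small sketch with made-up data, where `drop = FALSE` keeps the data frame:

```r
df <- data.frame(
  a = c("Y", "N", "Y"),
  b = c("Not Sampled", "Not Sampled", "Not Sampled"),
  stringsAsFactors = TRUE
)

# TRUE for columns with two or more levels (only `a` here)
keep <- vapply(df, function(col) nlevels(col) >= 2, logical(1))

as_vector <- df[, keep]                # single match: a bare factor
as_frame  <- df[, keep, drop = FALSE]  # still a one-column data frame
```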


2016-03-15 20:07:25 TheRimalaya
