2016-01-22

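Sorting a data frame based on the same factor value across columns

I have a data frame in which several columns share the same factor levels, and I want to sort/subset the data by how many times a given value occurs in each row.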

df <- data.frame(a = factor(c("yes", "yes", "no", "maybe"), 
levels = c("yes", "no", "maybe")), b = factor(c("maybe", "yes", "yes", "no"), 
levels = c("yes", "no", "maybe")), c = factor(c("maybe", "yes", "yes", "no"), 
levels = c("yes", "no", "maybe")), d = c(1,2,3,4)) 

df 
      a     b     c d
1   yes maybe maybe 1
2   yes   yes   yes 2
3    no   yes   yes 3
4 maybe    no    no 4

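I want to sort/subset the data by the number of times "yes" occurs across all columns of each row. That is, keep every row in which "yes" appears 2 or more times (df2), and then (less important) sort on that count so that the rows with the most "yes" values are at the top. It does not matter whether the original row numbers are kept.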

df2 
    a   b   c d
2 yes yes yes 2
3  no yes yes 3

df 
      a     b     c d
2   yes   yes   yes 2
3    no   yes   yes 3
1   yes maybe maybe 1
4 maybe    no    no 4

I thought about using the order() function:

df[order(df$a,df$b,df$c), ] 

But this does not return what I want. I think I need to use lapply(), but I am not sure which function to apply.

Answer

4

We can use rowSums for this.

df <- data.frame(a = factor(c("yes", "yes", "no", "maybe"), 
levels = c("yes", "no", "maybe")), b = factor(c("maybe", "yes", "yes", "no"), 
levels = c("yes", "no", "maybe")), c = factor(c("maybe", "yes", "yes", "no"), 
levels = c("yes", "no", "maybe")), d = c(1,2,3,4)) 

df2 <- df[rowSums(df == "yes") >= 2, ] 

df2 
#     a   b   c d
# 2 yes yes yes 2
# 3  no yes yes 3
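
To see why this works, it may help to look at the intermediate values. The sketch below rebuilds the example data and splits the one-liner into steps; the names is_yes, yes_count, factor_cols and yes_count_factors are only illustrative and do not appear in the original code. The last step, which restricts the comparison to the factor columns, is an assumption about what is wanted rather than something the question requires.

```r
# Rebuild the example data so the snippet runs on its own.
df <- data.frame(
  a = factor(c("yes", "yes", "no", "maybe"), levels = c("yes", "no", "maybe")),
  b = factor(c("maybe", "yes", "yes", "no"), levels = c("yes", "no", "maybe")),
  c = factor(c("maybe", "yes", "yes", "no"), levels = c("yes", "no", "maybe")),
  d = c(1, 2, 3, 4)
)

# Comparing the whole data frame with "yes" gives a logical matrix,
# one TRUE/FALSE cell per value (the numeric column d is never "yes").
is_yes <- df == "yes"

# TRUE counts as 1 when summed, so each row sum is the number of
# "yes" values in that row.
yes_count <- rowSums(is_yes)
print(yes_count)

# Optional (assumption): compare only the factor columns, so that other
# columns cannot contribute to the count by accident.
factor_cols <- sapply(df, is.factor)
yes_count_factors <- rowSums(df[factor_cols] == "yes")
```

For this data both counts are the same, since column d holds numbers only.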

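This takes care of the filtering. However, if we also want to sort by the number of "yes" values, we can first store the count as a column in the data, then filter and sort, and finally remove the column.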

df$Count <- rowSums(df == "yes") 
df <- df[df$Count >= 2, ] 
df <- df[order(df$Count, decreasing = TRUE), ] 
df <- subset(df, select = -c(Count)) 
df 
#     a   b   c d
# 2 yes yes yes 2
# 3  no yes yes 3
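
The question also shows a version of df with all four rows kept and only reordered. A possible variation, sketched here under the assumption that the columns of interest are a, b and c, keeps the counts in a separate vector instead of a temporary Count column; the names yes_count, ord and df_sorted are illustrative.

```r
# Rebuild the example data so the snippet runs on its own.
df <- data.frame(
  a = factor(c("yes", "yes", "no", "maybe"), levels = c("yes", "no", "maybe")),
  b = factor(c("maybe", "yes", "yes", "no"), levels = c("yes", "no", "maybe")),
  c = factor(c("maybe", "yes", "yes", "no"), levels = c("yes", "no", "maybe")),
  d = c(1, 2, 3, 4)
)

# Count "yes" per row in the factor columns (column names taken from the example).
yes_count <- rowSums(df[c("a", "b", "c")] == "yes")

# Row positions ordered by descending count; df itself gets no helper column.
ord <- order(yes_count, decreasing = TRUE)
df_sorted <- df[ord, ]

# Keep only the rows with at least two "yes" values, still in sorted order.
df2 <- df_sorted[yes_count[ord] >= 2, ]

print(df_sorted)
print(df2)
```

Because the counts live outside the data frame, there is nothing to drop afterwards, and the same vector serves both the sort and the filter.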