2015-09-23

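Removing duplicate rows from a data frame based on a specific condition using R

I am working on a project where I need to sort data according to how people voted. I cannot find a function that removes duplicate rows based on certain conditions being met.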

I am looking for a function that removes rows based on one column having repeated values and another column meeting a specific condition.

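For example, in the table below I want to remove voters who voted in three different elections. Paul needs to be removed from this data frame.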

df <- data.frame(Name=c("Paul","Paul","Mary","Bill","Jane","Paul","Mary","John", 
"Bill","John"),ElectionDay=c("November 2010","November 2014", 
"November 2010","November 2010","November 2014","November 2006", 
"November 2014","November 2010","November 2014","November 2014")) 

df 
# Name ElectionDay 
# 1 Paul November 2010 
# 2 Paul November 2014 
# 3 Mary November 2010 
# 4 Bill November 2010 
# 5 Jane November 2014 
# 6 Paul November 2006 
# 7 Mary November 2014 
# 8 John November 2010 
# 9 Bill November 2014 
# 10 John November 2014 

Here is an example of the result I am looking for:

Name ElectionDay 
1 Mary November 2010 
2 Bill November 2010 
3 Jane November 2014 
4 Mary November 2014 
5 John November 2010 
6 Bill November 2014 
7 John November 2014 

Answers


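We can use data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'Name', and count the distinct 'ElectionDay' values in each group (uniqueN(ElectionDay)). If that count is less than 3, we return the subset of the data.table for that group (.SD).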

library(data.table)#v1.9.6+ 
setDT(df)[, if(uniqueN(ElectionDay) < 3) .SD, by = Name] 

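A similar base R option uses ave. We get the length of the unique elements of 'ElectionDay' grouped by 'Name' and check whether it is less than 3 to get a logical index. The index can then be used to subset the rows of the dataset.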

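# ave() returns character values here, so convert the counts to numeric before comparing 
df[as.numeric(with(df, ave(as.character(ElectionDay), Name, 
       FUN=function(x) length(unique(x))))) < 3,] 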
# Name ElectionDay 
#3 Mary November 2010 
#4 Bill November 2010 
#5 Jane November 2014 
#7 Mary November 2014 
#8 John November 2010 
#9 Bill November 2014 
#10 John November 2014 
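
As a variation on the base R approach above, the per-name counts can be computed once with tapply, which keeps them numeric and easy to inspect. This is only a minimal sketch; the names n_elections, keep_names and result are illustrative and not from the original answer.

```r
# Rebuild the example data from the question (strings kept as character)
df <- data.frame(
  Name = c("Paul", "Paul", "Mary", "Bill", "Jane", "Paul", "Mary", "John",
           "Bill", "John"),
  ElectionDay = c("November 2010", "November 2014", "November 2010",
                  "November 2010", "November 2014", "November 2006",
                  "November 2014", "November 2010", "November 2014",
                  "November 2014"),
  stringsAsFactors = FALSE
)

# Number of distinct elections per voter, as a named integer vector
n_elections <- tapply(df$ElectionDay, df$Name,
                      function(x) length(unique(x)))

# Voters who took part in fewer than 3 distinct elections
keep_names <- names(n_elections)[n_elections < 3]

# Keep only the rows belonging to those voters
result <- df[df$Name %in% keep_names, ]
```

Printing n_elections before filtering is a quick way to check which voters will be dropped.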

The names that occur in more than 2 rows are computed as

names(which(table(df$Name) > 2)) 
#[1] "Paul" 

So what you need is

df[!(df$Name %in% names(which(table(df$Name) > 2))), ] 
# Name ElectionDay 
#3 Mary November 2010 
#4 Bill November 2010 
#5 Jane November 2014 
#7 Mary November 2014 
#8 John November 2010 
#9 Bill November 2014 
#10 John November 2014 
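
Note that table(df$Name) counts rows, not distinct elections. The two agree on the example data, but they would differ if the same vote were recorded twice. The sketch below uses a hypothetical data frame, votes, with such a duplicate row, and drops repeated (Name, ElectionDay) pairs with unique() before counting. All names here are illustrative assumptions.

```r
# Hypothetical data: Mary's November 2010 vote is recorded twice
votes <- data.frame(
  Name = c("Paul", "Paul", "Paul", "Mary", "Mary", "Mary"),
  ElectionDay = c("November 2006", "November 2010", "November 2014",
                  "November 2010", "November 2010", "November 2014"),
  stringsAsFactors = FALSE
)

# Counting rows flags both Mary and Paul
by_rows <- names(which(table(votes$Name) > 2))

# Counting distinct elections: drop duplicate (Name, ElectionDay) pairs first
distinct_votes <- unique(votes[, c("Name", "ElectionDay")])
by_elections <- names(which(table(distinct_votes$Name) > 2))

# Remove only the voters with more than 2 distinct elections
kept <- votes[!(votes$Name %in% by_elections), ]
```

Whether the duplicate row should also be dropped from the final result depends on the data. The sketch leaves it in place.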

Or 'df[df$Name %in% names(which(table(df$Name) < 3)), ]' – Saksham


Alternatively, you can use dplyr: count the number of elections each person voted in, then remove the rows where that count is 3:

library(dplyr) 
df %>% 
    group_by(Name) %>% 
    mutate(NumberElections = length(unique(ElectionDay))) %>% 
    ungroup() %>% 
    filter(NumberElections != 3) 

You could use 'df %>% group_by(Name) %>% filter(n_distinct(ElectionDay) < 3)' – akrun