2016-07-28 602 views
1

Removing duplicate rows (based on 2 columns) in R

I have a data set in R that looks like this:

x1 x2 x3 
1: A Away 2 
2: A Home 2 
3: B Away 2 
4: B Away 1 
5: B Home 2 
6: B Home 1 
7: C Away 1 
8: C Home 1 

I want to remove the duplicate rows based on the values in columns x1 and x2. I have tried the following:

df[!duplicated(df[,c('x1', 'x2')]),] 

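This should remove rows 4 and 6. Unfortunately it does not work: it returns exactly the same data, with the duplicates still present in the data set. What do I have to use to remove rows 4 and 6?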

+1

Related, but different: http://stackoverflow.com/q/11792527/ – Frank

Answers

1
library("data.table")
# Convert df to a data.table by reference, then keep the first row of each (x1, x2) group.
setDT(df)[, .SD[1], by = .(x1, x2)]

#  x1 x2 x3 
# 1: A Away 2 
# 2: A Home 2 
# 3: B Away 2 
# 4: B Home 2 
# 5: C Away 1 
# 6: C Home 1 
0

Or you can use dplyr:

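library("dplyr")
df <- data.frame(
  x1 = c("A", "A", "B", "B", "B", "B", "C", "C"),
  x2 = c("Away", "Home", "Away", "Away", "Home", "Home", "Away", "Home"),
  x3 = c(2, 2, 2, 1, 2, 1, 1, 1)
)

# Keep the first row for each (x1, x2) pair; .keep_all = TRUE retains x3.
distinct(df, x1, x2, .keep_all = TRUE)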
#  x1 x2 x3 
# 1 A Away 2 
# 2 A Home 2 
# 3 B Away 2 
# 4 B Home 2 
# 5 C Away 1 
# 6 C Home 1 
3

I'd just do:

unique(df, by=c("x1", "x2")) # where df is a data.table 

This would have been fairly obvious if you had just looked at ?unique.

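PS: Given the syntax in your Q, I wonder whether you are aware of the basic syntax differences between data.table and data.frame. I suggest you read the vignettes first.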
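
To illustrate the syntax difference mentioned above, here is a minimal sketch that assumes the data from the question. On a plain data.frame, df[, c("x1", "x2")] selects the two columns, so the attempt from the question does drop rows 4 and 6. As far as I recall, data.table releases before 1.9.8 evaluated that same expression to the character vector itself rather than to the columns, which would explain why nothing was removed; treat that version detail as an assumption. The names res_df, dt and res_dt are only illustrative.

```r
# Same data as in the question, built as a plain data.frame.
df <- data.frame(
  x1 = c("A", "A", "B", "B", "B", "B", "C", "C"),
  x2 = c("Away", "Home", "Away", "Away", "Home", "Home", "Away", "Home"),
  x3 = c(2, 2, 2, 1, 2, 1, 1, 1)
)

# data.frame: df[, c("x1", "x2")] returns the two columns, so duplicated()
# flags rows 4 and 6 and the subset keeps the first row of each pair.
res_df <- df[!duplicated(df[, c("x1", "x2")]), ]

# data.table: name the key columns through the `by` argument instead.
# Guarded so the sketch still runs when data.table is not installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
  res_dt <- unique(dt, by = c("x1", "x2"))
  # Equivalent filter form: dt[!duplicated(dt, by = c("x1", "x2"))]
}
```

Both forms keep the first row seen for each (x1, x2) pair, so which row survives depends on the row order of the input.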