2015-11-30
```
Name Address Account a     b      Amount Phone
John CA      4879759 qwqe  rerter 203    807789747
Nil  FD      1234455 iuyui jhgjhg 4321   98797897
Was  FR      8979696 yikjh kkjhk  45989  9899999
Nil  FD      1234455 iuyui jhgjhg 4321   98797897
John CA      4879759 qwqe  rerter 203    807789747
Saw  PO      9873279 kjljl bjhjh  765    3543656
Nil  FD      1234455 iuyui jhgjhg 4321   98797897
Aws  IL      707009  dfdsf sasd   2344   242545
John CA      4879759 qwqe  rerter 203    807789747
```

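I want to extract the duplicate rows from this table using R code. The table is named "loan" and it has 17 billion entries. The main columns are Name, Address, Account, Amount and Phone. How can I get the duplicate rows from a table in R? Looking forward to some helpful solutions, folks.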

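One more thing: after separating out the duplicates, I want to save that duplicate data table in .csv format. I'm new to R, so please help me with this too.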
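To try the answers below on the sample data, the table above can be loaded into a data frame. A minimal sketch, assuming the columns are whitespace-separated exactly as shown; the object name `loan` comes from the question, and this only covers the small sample, not the full dataset.

```r
# Read the sample table from an inline string (header = TRUE uses the first line as column names)
loan <- read.table(text = "
Name Address Account a b Amount Phone
John CA 4879759 qwqe rerter 203 807789747
Nil FD 1234455 iuyui jhgjhg 4321 98797897
Was FR 8979696 yikjh kkjhk 45989 9899999
Nil FD 1234455 iuyui jhgjhg 4321 98797897
John CA 4879759 qwqe rerter 203 807789747
Saw PO 9873279 kjljl bjhjh 765 3543656
Nil FD 1234455 iuyui jhgjhg 4321 98797897
Aws IL 707009 dfdsf sasd 2344 242545
John CA 4879759 qwqe rerter 203 807789747
", header = TRUE, stringsAsFactors = FALSE)

str(loan)
```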


See [here](http://stackoverflow.com/questions/25041933), [here](http://stackoverflow.com/questions/22959635), [here](http://stackoverflow.com/questions/26703764), [here](http://stackoverflow.com/questions/12495345), [here](http://stackoverflow.com/questions/31933605), [here](http://stackoverflow.com/questions/13967063), [here](http://stackoverflow.com/questions/24881855/delete-all-duplicated-rows-in-r) and [here](http://stackoverflow.com/search?q=%5Br%5D+duplicated+rows); some of the links may be duplicates. – zx8754

Answers


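We can use `duplicated` on the key columns (stored in `nm1`) to get all duplicated rows, combining a forward and a backward pass so that every copy, including the first, is returned: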

```r
# df1 is the question's table (called `loan` there)
nm1 <- c("Name", "Address", "Account", "Amount", "Phone")
df1[duplicated(df1[nm1]) | duplicated(df1[nm1], fromLast = TRUE), ]
#   Name Address Account     a      b Amount     Phone
# 1 John      CA 4879759  qwqe rerter    203 807789747
# 2  Nil      FD 1234455 iuyui jhgjhg   4321  98797897
# 4  Nil      FD 1234455 iuyui jhgjhg   4321  98797897
# 5 John      CA 4879759  qwqe rerter    203 807789747
# 7  Nil      FD 1234455 iuyui jhgjhg   4321  98797897
# 9 John      CA 4879759  qwqe rerter    203 807789747
```
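Why both calls are needed: `duplicated()` marks only the second and later copies of a value, while `fromLast = TRUE` scans from the end and marks every copy except the last one. OR-ing the two masks flags every copy. A small illustration, using only the Name column of the sample table for brevity (so, unlike the answer, it ignores the other key columns):

```r
# Name column of the sample table, in row order (illustration only)
x <- c("John", "Nil", "Was", "Nil", "John", "Saw", "Nil", "Aws", "John")

first_pass <- duplicated(x)                   # TRUE for the 2nd, 3rd, ... occurrences
last_pass  <- duplicated(x, fromLast = TRUE)  # TRUE for every copy except the last
all_dups   <- first_pass | last_pass          # TRUE for every copy of a repeated value

which(first_pass)  # 4 5 7 9
which(all_dups)    # 1 2 4 5 7 9
```

The row numbers match the output above, because each name in the sample happens to identify a unique combination of key columns.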

Thank you very much, Akrun! – Theking


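An extension of Akrun's answer that includes only the key columns in the duplicate check. Note that `duplicated()` on its own returns only the repeated occurrences, not the first copy of each row: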

```r
mainCols <- c("Name", "Address", "Account", "Amount", "Phone")
# TRUE for each row whose key columns match an earlier row
duplicatedRows <- duplicated(loan[, mainCols])
duplicatedData <- loan[duplicatedRows, ]
duplicatedData
#   Name Address Account     a      b Amount     Phone
# 4  Nil      FD 1234455 iuyui jhgjhg   4321  98797897
# 5 John      CA 4879759  qwqe rerter    203 807789747
# 7  Nil      FD 1234455 iuyui jhgjhg   4321  98797897
# 9 John      CA 4879759  qwqe rerter    203 807789747
```
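Neither answer covers the second part of the question, saving the duplicates as a .csv file. A minimal sketch using base R's `write.csv()`; the helper `write_duplicates()`, its `all_copies` argument and the demo data are hypothetical, chosen to switch between the two answers' behaviours (every copy vs. only the repeats).

```r
# Hypothetical helper: select duplicated rows by key columns and save them as CSV
write_duplicates <- function(df, key_cols, path, all_copies = TRUE) {
  keys <- df[key_cols]
  # Rows that repeat an earlier row (second and later copies)
  is_dup <- duplicated(keys)
  if (all_copies) {
    # Also flag the first copy by scanning from the end
    is_dup <- is_dup | duplicated(keys, fromLast = TRUE)
  }
  dups <- df[is_dup, , drop = FALSE]
  # row.names = FALSE keeps R's row index out of the file
  write.csv(dups, path, row.names = FALSE)
  invisible(dups)
}

# Usage with a tiny made-up table
loan_demo <- data.frame(
  Name   = c("John", "Nil", "Was", "Nil", "John"),
  Amount = c(203, 4321, 45989, 4321, 203),
  stringsAsFactors = FALSE
)
out_file <- tempfile(fileext = ".csv")
write_duplicates(loan_demo, c("Name", "Amount"), out_file)
read.csv(out_file)
```

For the real data, `path` would be something like `"duplicates.csv"`, which `write.csv()` writes relative to the working directory (see `getwd()`).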