Find duplicates in a list, including permutations

I want to determine whether a list contains any duplicate elements, treating permutations as equivalent. All vectors are of equal length.

What is the most efficient way (shortest running time) to accomplish this?

## SAMPLE DATA 
a <- c(1, 2, 3) 
b <- c(4, 5, 6) 
a.same <- c(3, 1, 2) 

## BOTH OF THESE LISTS SHOULD BE FLAGGED AS HAVING DUPLICATES
myList1 <- list(a, b, a) 
myList2 <- list(a, b, a.same) 


# CHECK FOR DUPLICATES 
anyDuplicated(myList1) > 0 # TRUE 
anyDuplicated(myList2) > 0 # FALSE, but would like true.

For now I resort to sorting each member of the list before checking for duplicates:

anyDuplicated(lapply(myList2, sort)) > 0

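I would like to know whether there is a more efficient alternative. Also, the ?duplicated documentation notes that "using this for lists is potentially slow". Are there other functions better suited to lists?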
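One way to avoid calling `anyDuplicated` on a list at all is to collapse each sorted vector into a single string key and check the resulting character vector instead. The sketch below is an illustration, not a benchmarked answer: the helper name `has_perm_dup` is hypothetical, it assumes numeric vectors whose values convert to distinct strings, and whether it actually beats `lapply(..., sort)` should be measured on the real data.

```r
# Hypothetical helper: TRUE if any two vectors in the list are identical
# or are permutations of each other.
# Assumption: numeric vectors whose values map to distinct strings.
has_perm_dup <- function(lst) {
  # Build one canonical key per vector: sort, then join with a separator
  keys <- vapply(lst, function(v) paste(sort(v), collapse = ","), character(1))
  anyDuplicated(keys) > 0
}

a <- c(1, 2, 3)
b <- c(4, 5, 6)
a.same <- c(3, 1, 2)

has_perm_dup(list(a, b, a))       # TRUE
has_perm_dup(list(a, b, a.same))  # TRUE
has_perm_dup(list(a, b))          # FALSE
```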

Source

2012-11-11 Ricardo Saporta

How about this...?

a <- c(1, 2, 3) 
b <- c(4, 5, 6) 
a.same <- c(3, 1, 2) 
myList1 <- list(a, b, a) 
myList2 <- list(a, b, a.same) 

# For exact duplicated values: List1 
DF1 <- do.call(rbind, myList1) # From list to matrix (one row per vector)
ind1 <- apply(DF1, 2, duplicated) # logical matrix for duplicated values 
DF1[ind1] # finding duplicated values 
[1] 1 2 3 

# For permutations: List2 
DF2 <- do.call(rbind, myList2) 
ind2 <- apply(apply(DF2, 1, sort), 1, duplicated) 
DF2[ind2] # duplicated values 
[1] 3 1 2
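Because all vectors have the same length, the row-binding idea can also be reduced to a single TRUE/FALSE answer by sorting each row and then looking for duplicated rows, rather than duplicated values within each column. This is a minimal sketch of that variant, not the answer's own code. The name `has_dup_rows` is hypothetical, and the sketch assumes vectors of length 2 or more so that `apply` returns a matrix.

```r
# Hypothetical helper; assumes equal-length vectors of length >= 2.
has_dup_rows <- function(lst) {
  m <- do.call(rbind, lst)          # matrix with one row per vector
  sorted <- t(apply(m, 1, sort))    # apply() returns sorted rows as columns, so transpose
  anyDuplicated(sorted, MARGIN = 1) > 0   # compare whole rows, not single values
}

a <- c(1, 2, 3)
b <- c(4, 5, 6)
a.same <- c(3, 1, 2)

has_dup_rows(list(a, b, a))       # TRUE
has_dup_rows(list(a, b, a.same))  # TRUE
has_dup_rows(list(a, b))          # FALSE
```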

Source

2012-11-11 19:54:55

Can we assume that the vectors are of equal length? – Roland

Yes, the assumption here is that the vectors have the same length. –

You can use setequal:

myList1 <- list(a, b, a) 
myList2 <- list(a, b, a.same) 
myList3 <- list(a,b) 

test1 <- function(mylist) anyDuplicated(lapply(mylist, sort)) > 0 

test1(myList1) 
#[1] TRUE 
test1(myList2) 
#[1] TRUE 
test1(myList3) 
#[1] FALSE 

test2 <- function(mylist) any(combn(length(mylist),2, 
          FUN=function(x) setequal(mylist[[x[1]]],mylist[[x[2]]]))) 

test2(myList1) 
#[1] TRUE 
test2(myList2) 
#[1] TRUE 
test2(myList3) 
#[1] FALSE 

library(microbenchmark) 

microbenchmark(test1(myList2),test2(myList2)) 
#Unit: microseconds 
#   expr  min  lq median  uq  max 
#1 test1(myList2) 142.256 150.9235 154.6060 162.8120 247.351 
#2 test2(myList2) 63.306 70.5355 73.8955 79.5685 103.113
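One caveat is worth checking before relying on `setequal` for this: it compares vectors as sets, so repeated values are ignored. If the vectors can contain repeats, two vectors that are not permutations of each other may still be reported as equal, whereas a sort-based comparison tells them apart. A small illustration with made-up vectors `x` and `y`:

```r
x <- c(1, 1, 2)
y <- c(1, 2, 2)   # same length and same set of values, but not a permutation of x

setequal(x, y)               # TRUE: both reduce to the set {1, 2}
identical(sort(x), sort(y))  # FALSE: the sorted vectors differ
```

For vectors without repeated values, such as the sample data in the question, the two tests agree.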

Source

2012-11-11 19:49:37 Roland

Good suggestion. Unfortunately, it works well for smaller lists but is not efficient for larger ones.
LargeList <- lapply(rep(100, 30), sample, 80, F)
smallList <- lapply(rep(4, 4), sample, 3, F)
microbenchmark(test1(smallList), test2(smallList), times = 300)
microbenchmark(test1(LargeList), test2(LargeList), times = 300) –

Thanks for the microbenchmark tip! –
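A plausible explanation for the scaling difference, offered as reasoning rather than a measured result: `test2` calls `setequal` once for every pair of list elements, and `any()` only runs after `combn` has evaluated all of them, so there is no early exit. The number of pairs grows quadratically with the length of the list:

```r
# Number of pairwise comparisons test2 performs for a list of n vectors
choose(4, 2)    # 6 pairs for the 4-element smallList
choose(30, 2)   # 435 pairs for the 30-element LargeList
```

`test1`, by contrast, sorts each vector once and then makes a single `anyDuplicated` call.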


a = [1, 2, 3]
b = [4, 5, 6]
samea = [3, 2, 1]

list1 = list(a + b + a) and list2 = list(a + b + samea) both create a flat list with the same elements, e.g.
[1, 2, 3, 4, 5, 6, 3, 2, 1]

# So, a function for finding duplicates:

def findDup(x):
    # Return True as soon as any element occurs more than once
    for i in x:
        if x.count(i) > 1:
            return True
    return False

Source

2012-11-11 20:02:58 raton

Please note the r tag. The OP needs a solution that uses [R](http://www.r-project.org/). – Roland
