How to find elements common to at least 2 vectors?

Say I have 5 vectors:

a <- c(1,2,3) 
b <- c(2,3,4) 
c <- c(1,2,5,8) 
d <- c(2,3,4,6) 
e <- c(2,7,8,9)

I know I can calculate the intersection between all of them by using Reduce() together with intersect(), like this:

Reduce(intersect, list(a, b, c, d, e)) 
[1] 2

But how can I find the elements that are common to at least 2 vectors? I.e.:

[1] 1 2 3 4 8


2014-10-03 enricoferrero

It is much simpler than a lot of people are making it look. This should be very efficient.

Put everything into one vector:
```
x <- unlist(list(a, b, c, d, e)) 
```

Find the duplicates:

unique(x[duplicated(x)]) 
# [1] 2 3 1 4 8

and `sort` if needed.

Note: if there can be duplicates within a single list element (which your example does not seem to imply), then use x <- unlist(lapply(list(a, b, c, d, e), unique)) instead.
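A minimal sketch of why that de-duplication step matters. The input `vecs` is made up for illustration (it is not from the question): 7 is repeated inside one vector only, so it should not be reported.

```r
# Hypothetical input: 7 appears twice, but only inside the first vector
vecs <- list(c(7, 7, 1), c(1, 2), c(2, 3))

# Without de-duplication, the repeated 7 is mistaken for a shared value
x_raw <- unlist(vecs)
raw_result <- sort(unique(x_raw[duplicated(x_raw)]))
raw_result
# [1] 1 2 7

# De-duplicating each vector first keeps only values shared across vectors
x_dedup <- unlist(lapply(vecs, unique))
dedup_result <- sort(unique(x_dedup[duplicated(x_dedup)]))
dedup_result
# [1] 1 2
```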

Edit: as the OP has expressed interest in a more general solution where n >= 2, I would do:

which(tabulate(x) >= n)

if the data consists only of natural integers (1, 2, etc.), as in the example. If not:

f <- table(x) 
names(f)[f >= n]

This is now not too far from James' solution, but it avoids the somewhat costly sort. It is also much faster than computing all possible combinations.
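The `table` variant can be wrapped into a small function. This is only a sketch: the name `common_in_at_least` and its arguments are hypothetical, and it assumes numeric input, since `table()` stores the values as character names that have to be converted back.

```r
# Hypothetical helper (not part of the answer): values present in at least n vectors
common_in_at_least <- function(vecs, n = 2) {
  # De-duplicate within each vector so a vector counts a value at most once
  x <- unlist(lapply(vecs, unique))
  f <- table(x)
  # table() keeps the values as character names; convert back (numeric input assumed)
  as.numeric(names(f)[f >= n])
}

a <- c(1, 2, 3)
b <- c(2, 3, 4)
c <- c(1, 2, 5, 8)
d <- c(2, 3, 4, 6)
e <- c(2, 7, 8, 9)

common_in_at_least(list(a, b, c, d, e), n = 2)
# [1] 1 2 3 4 8
common_in_at_least(list(a, b, c, d, e), n = 3)
# [1] 2 3
```

Because `table()` sorts its levels, the result comes back already ordered.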


2014-10-03 10:35:34 flodel

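Nice one. Can this be generalised to n > 2? As in, how can I find the elements common to at least n vectors? – enricoferrero 2014-10-03 10:41:33

No, that would require a frequency table via `table` or `tabulate`; see my edit. – flodel 2014-10-03 11:19:28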

Here is an approach that counts the number of vectors each unique value occurs in:

unique_vals <- unique(c(a, b, c, d, e)) 

setNames(rowSums(!!(sapply(list(a, b, c, d, e), match, x = unique_vals)), 
       na.rm = TRUE), unique_vals) 
# 1 2 3 4 5 8 6 7 9 
# 2 5 3 2 1 2 1 1 1
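To get from these counts to the values asked for, one could filter on the count. The sketch below uses `%in%` with `vapply` rather than the `match` call above, which is a swapped-in but equivalent way to build the membership matrix; the names `vecs`, `membership` and `shared` are made up for illustration.

```r
a <- c(1, 2, 3)
b <- c(2, 3, 4)
c <- c(1, 2, 5, 8)
d <- c(2, 3, 4, 6)
e <- c(2, 7, 8, 9)

vecs <- list(a, b, c, d, e)
unique_vals <- unique(unlist(vecs))

# One logical column per vector: TRUE where the unique value occurs in that vector
membership <- vapply(vecs, function(v) unique_vals %in% v,
                     logical(length(unique_vals)))

# Number of vectors each unique value occurs in
counts <- setNames(rowSums(membership), unique_vals)

# Keep the values that occur in at least 2 vectors
shared <- sort(as.numeric(names(counts)[counts >= 2]))
shared
# [1] 1 2 3 4 8
```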


2014-10-03 08:26:39

You could try all possible combinations, for example:

## create a list of all five vectors
l <- list(a, b, c, d, e) 

## get all pairs of list indices 
cbn <- combn(1:length(l), 2) 

## intersect each pair and collect the shared values 
unique(unlist(apply(cbn, 2, function(x) intersect(l[[x[1]]], l[[x[2]]])))) 
## [1] 2 3 1 4 8


2014-10-03 08:27:58 johannes

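Could you explain what the first argument of combn() (1:4) is? – enricoferrero 2014-10-03 08:34:39

I changed it to `length(l)`, which is more generic. It creates all possible combinations of n elements when you choose k of them. – johannes 2014-10-03 08:37:01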
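As a small illustration of that explanation, this is what `combn` returns for four items taken two at a time. Each column is one pair of indices.

```r
# All ways to choose 2 of the indices 1..4; one combination per column
cbn <- combn(1:4, 2)
cbn
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,]    1    1    1    2    2    3
# [2,]    2    3    4    3    4    4

# The number of columns equals "4 choose 2"
ncol(cbn) == choose(4, 2)
# [1] TRUE
```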

Here is another option:

# For each vector, get a vector of values without duplicates 
deduplicated_vectors <- lapply(list(a,b,c,d,e), unique) 

# Flatten the lists, then sort and use rle to determine how many 
# lists each value appears in 
rl <- rle(sort(unlist(deduplicated_vectors))) 

# Get the values that appear in two or more lists 
rl$values[rl$lengths >= 2]
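For the vectors in the question, the intermediate `rle` result looks as follows. This is just a sketch to show what the two components hold; `sorted_vals` is a name introduced here.

```r
a <- c(1, 2, 3)
b <- c(2, 3, 4)
c <- c(1, 2, 5, 8)
d <- c(2, 3, 4, 6)
e <- c(2, 7, 8, 9)

# After sorting, equal values sit next to each other, so each run length
# is the number of vectors that contain the value
sorted_vals <- sort(unlist(lapply(list(a, b, c, d, e), unique)))
rl <- rle(sorted_vals)

rl$values
# [1] 1 2 3 4 5 6 7 8 9
rl$lengths
# [1] 2 5 3 2 1 1 1 2 1
```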


2014-10-03 08:35:49

Another approach, applying a vectorised function with `outer`:

L <- list(a, b, c, d, e) 
f <- function(x, y) intersect(x, y) 
fv <- Vectorize(f, list("x","y")) 
o <- outer(L, L, fv) 
table(unlist(o[upper.tri(o)])) 

#  1  2  3  4  8 
#  1 10  3  1  1

The output above gives, for each of the duplicated elements 1, 2, 3, 4 and 8, the number of pairs of vectors that share it.


2014-10-03 09:07:46 jbaums

A variation of @rengis's method would be:

unique(unlist(Map(`intersect`, cbn[1,], cbn[2,]))) 
#[1] 2 3 1 4 8

where

l <- mget(letters[1:5]) 
cbn <- combn(l,2)
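The combination idea could in principle be extended to "at least n vectors" by intersecting every group of n vectors. The following is only a sketch under that assumption (the names `vecs`, `hits` and `at_least_3` are made up here), and it will be slower than a frequency table because the number of groups grows quickly.

```r
a <- c(1, 2, 3)
b <- c(2, 3, 4)
c <- c(1, 2, 5, 8)
d <- c(2, 3, 4, 6)
e <- c(2, 7, 8, 9)

vecs <- list(a, b, c, d, e)
n <- 3

# For every group of n vectors, intersect all members of the group
hits <- combn(vecs, n, FUN = function(grp) Reduce(intersect, grp),
              simplify = FALSE)

# A value common to any group of n vectors occurs in at least n vectors
at_least_3 <- sort(unique(unlist(hits)))
at_least_3
# [1] 2 3
```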


2014-10-03 09:54:34 akrun
