Q
In R
38
A
Answers
58
Here are some basics to try:
> A = c("Dog", "Cat", "Mouse")
> B = c("Tiger","Lion","Cat")
> A %in% B
[1] FALSE TRUE FALSE
> intersect(A,B)
[1] "Cat"
> setdiff(A,B)
[1] "Dog" "Mouse"
> setdiff(B,A)
[1] "Tiger" "Lion"
Similarly, you can get the counts simply as:
> length(intersect(A,B))
[1] 1
> length(setdiff(A,B))
[1] 2
> length(setdiff(B,A))
[1] 2
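One detail the calls above do not show is how duplicates are handled. The following is a minimal sketch, assuming a made-up vector with a repeated "Cat"; the variable names are illustrative only. It contrasts subsetting with %in%, which keeps repeats, against intersect, which returns unique values, and also shows union and setequal from the same family of base functions.

```r
# Hypothetical input: "Cat" appears twice in A
A <- c("Dog", "Cat", "Mouse", "Cat")
B <- c("Tiger", "Lion", "Cat")

# Subsetting with %in% keeps every matching position, duplicates included
kept_dups <- A[A %in% B]

# intersect() treats its arguments as sets and drops duplicates
deduped <- intersect(A, B)

# union() gives each distinct element once, in order of first appearance
all_items <- union(A, B)

# setequal() ignores both order and repetition
same_set <- setequal(c("Dog", "Cat"), c("Cat", "Dog", "Dog"))

print(kept_dups)
print(deduped)
print(all_items)
print(same_set)
```

If the counts are meant to be counts of distinct elements, length(intersect(A, B)) is the safer choice; length(A[A %in% B]) counts occurrences.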
11
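Yet another way is to use %in% with logical (boolean) vectors marking the common elements, instead of intersect and setdiff. I take it you actually want to compare two vectors, not two lists: a list is an R class that can contain elements of any type, while a vector always contains elements of just one type, which makes it easier to compare what is truly equal. Here the elements are converted to character strings, because that is the least flexible element type present.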
first <- c(1:3, letters[1:6], "foo", "bar")
second <- c(2:4, letters[5:8], "bar", "asd")
both <- first[first %in% second] # in both, same as call: intersect(first, second)
onlyfirst <- first[!first %in% second] # only in 'first', same as: setdiff(first, second)
onlysecond <- second[!second %in% first] # only in 'second', same as: setdiff(second, first)
length(both)
length(onlyfirst)
length(onlysecond)
#> both
#[1] "2" "3" "e" "f" "bar"
#> onlyfirst
#[1] "1" "a" "b" "c" "d" "foo"
#> onlysecond
#[1] "4" "g" "h" "asd"
#> length(both)
#[1] 5
#> length(onlyfirst)
#[1] 6
#> length(onlysecond)
#[1] 4
# If you don't have the 'gplots' package, type: install.packages("gplots")
library(gplots)
venn(list(first.vector = first, second.vector = second))
As was mentioned, there are multiple options for plotting a Venn diagram in R. Here is the output using gplots.
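The point about vectors holding a single type can be checked directly. This is a small sketch, with illustrative variable names, showing that c() coerces mixed input to character, that a list keeps each element's own type, and that %in% still matches an integer against the coerced strings.

```r
# Mixing integers and strings in c() coerces everything to character
mixed <- c(1:3, "a")
mixed_class <- class(mixed)

# A list keeps the original type of each element
as_list <- list(1L, "a", TRUE)
list_classes <- vapply(as_list, class, character(1))

# %in% coerces both sides to a common type, so the integer 2 matches "2"
found <- 2L %in% mixed

print(mixed)
print(mixed_class)
print(list_classes)
print(found)
```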
16
I usually deal with large-ish sets, so I use a table instead of a Venn diagram:
xtab_set <- function(A, B) {
  both <- union(A, B)
  inA <- both %in% A
  inB <- both %in% B
  return(table(inA, inB))
}
set.seed(1)
A <- sample(letters[1:20], 10, replace = TRUE)
B <- sample(letters[1:20], 10, replace = TRUE)
xtab_set(A, B)
#        inB
# inA     FALSE TRUE
#   FALSE     0    5
#   TRUE      6    3
+0
Ah, I didn't realize Venn diagrams include counts... I thought they were supposed to show the items themselves. – Frank
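If the items themselves are wanted alongside the counts, one possible variant is to return the members of each region as a named list. This is only a sketch: set_members is a hypothetical helper, not part of the answer above, and the input vectors are made up.

```r
# Hypothetical helper: members of each region of a two-set comparison
set_members <- function(A, B) {
  list(
    only_A = setdiff(A, B),    # in A but not in B
    only_B = setdiff(B, A),    # in B but not in A
    both   = intersect(A, B)   # in both
  )
}

A <- c("a", "b", "c", "c")
B <- c("c", "d")

members <- set_members(A, B)
region_counts <- lengths(members)  # number of distinct items per region

print(members)
print(region_counts)
```

Because setdiff and intersect return unique values, the counts here refer to distinct elements, as in the table approach.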
4
With sqldf: slower, but works very well for data frames with mixed types:
# If you don't have the 'sqldf' package, type: install.packages("sqldf")
library(sqldf)
t1 <- as.data.frame(1:10)
t2 <- as.data.frame(5:15)
sqldf1 <- sqldf('SELECT * FROM t1 EXCEPT SELECT * FROM t2') # rows of t1 not in t2
sqldf2 <- sqldf('SELECT * FROM t2 EXCEPT SELECT * FROM t1') # rows of t2 not in t1
sqldf3 <- sqldf('SELECT * FROM t1 UNION SELECT * FROM t2') # distinct rows of t1 and t2 combined
sqldf1 (column X1_10): 1 2 3 4
sqldf2 (column X5_15): 11 12 13 14 15
sqldf3 (column X1_10): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
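For comparison, the same three results can be sketched in base R without sqldf. This is an assumption-laden illustration rather than a drop-in replacement: the column name x and the helper row_key are made up here, and the pasted-key trick assumes the two data frames have the same columns in the same order and that no value contains the separator character.

```r
t1 <- data.frame(x = 1:10)
t2 <- data.frame(x = 5:15)

# Hypothetical helper: one string key per row, built from all columns
row_key <- function(df) do.call(paste, c(df, sep = "\r"))

# Rows of t1 not in t2 (like EXCEPT)
only_t1 <- t1[!row_key(t1) %in% row_key(t2), , drop = FALSE]

# Rows of t2 not in t1
only_t2 <- t2[!row_key(t2) %in% row_key(t1), , drop = FALSE]

# Distinct rows of both (like UNION)
all_rows <- unique(rbind(t1, t2))

print(only_t1$x)
print(only_t2$x)
print(nrow(all_rows))
```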
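Related questions
See '??intersect' and '??setdiff' ... – agstudy
See [Venn Diagrams with R?](http://stackoverflow.com/q/1428946/59470) – topchef
Isn't this an incorrect use of the term "list" in R? These are just two vectors. That's not the same thing at all. – emilBeBri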