
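Compare two lists in R

I have two lists of IDs.

I would like to compare the two lists. In particular, I am interested in the following figures:

How many IDs are in both list A and list B
How many IDs are in A but not in B
How many IDs are in B but not in A

I would also love to draw a Venn diagram.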


2013-07-11 Aslan986

See `?intersect` and `?setdiff` ... – agstudy

See [Venn Diagrams with R?](http://stackoverflow.com/q/1428946/59470) – topchef

Isn't this an incorrect use of the term "list" in R? These are just two vectors. That is not the same thing at all. – emilBeBri

Here are some basics to try:

> A = c("Dog", "Cat", "Mouse") 
> B = c("Tiger","Lion","Cat") 
> A %in% B 
[1] FALSE TRUE FALSE 
> intersect(A,B) 
[1] "Cat" 
> setdiff(A,B) 
[1] "Dog" "Mouse" 
> setdiff(B,A) 
[1] "Tiger" "Lion"

Similarly, you can get the counts simply as:

> length(intersect(A,B)) 
[1] 1 
> length(setdiff(A,B)) 
[1] 2 
> length(setdiff(B,A)) 
[1] 2
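
If you need all three numbers at once, the calls above can be wrapped in a small function. This is only a sketch: `compare_ids` and the names of its result are made up for illustration and are not part of base R. Note that `intersect()` and `setdiff()` drop duplicated values, so the counts are counts of distinct IDs.

```r
# Hypothetical helper (name is illustrative): returns the three counts
# asked for in the question as a named integer vector.
compare_ids <- function(a, b) {
  c(both   = length(intersect(a, b)),  # IDs present in both vectors
    only_a = length(setdiff(a, b)),    # IDs in a but not in b
    only_b = length(setdiff(b, a)))    # IDs in b but not in a
}

A <- c("Dog", "Cat", "Mouse")
B <- c("Tiger", "Lion", "Cat")
counts <- compare_ids(A, B)
print(counts)
```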


2013-07-11 16:24:48 Mittenchops

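Yet another way: use `%in%` and boolean vectors of the common elements instead of `intersect` and `setdiff`. I take it you actually want to compare two vectors, not two lists. A list is an R class that may contain elements of any type, while a vector always contains elements of just one type, which makes it easier to decide what is truly equal. Here the elements are converted to character strings, because character was the least flexible element type present.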

first <- c(1:3, letters[1:6], "foo", "bar")
second <- c(2:4, letters[5:8], "bar", "asd")

both <- first[first %in% second] # in both, same as call: intersect(first, second)
onlyfirst <- first[!first %in% second] # only in 'first', same as: setdiff(first, second)
onlysecond <- second[!second %in% first] # only in 'second', same as: setdiff(second, first)
length(both)
length(onlyfirst)
length(onlysecond)

#> both
#[1] "2" "3" "e" "f" "bar"
#> onlyfirst
#[1] "1" "a" "b" "c" "d" "foo"
#> onlysecond
#[1] "4" "g" "h" "asd"
#> length(both)
#[1] 5
#> length(onlyfirst)
#[1] 6
#> length(onlysecond)
#[1] 4

# If you don't have the 'gplots' package, type: install.packages("gplots")
require("gplots")
venn(list(first.vector = first, second.vector = second))

As was mentioned, there are multiple options for drawing Venn diagrams in R. The example above uses gplots.
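
To illustrate the list-versus-vector point, here is a minimal sketch. The ID values and variable names are invented for the example. It shows that mixing numbers and strings in `c()` yields a character vector, and that IDs stored in actual R lists can be flattened with `unlist()` before comparing.

```r
# Mixing integers and strings in c() coerces everything to character.
mixed <- c(1:3, "a")
class(mixed)

# Assumed example data: IDs stored as genuine R lists.
ids_a <- list(101, 102, 103)
ids_b <- list(102, 103, 104)

# unlist() flattens each list into an atomic vector, after which the
# usual set functions apply.
vec_a <- unlist(ids_a)
vec_b <- unlist(ids_b)
shared <- intersect(vec_a, vec_b)
print(shared)
```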


2013-07-11 16:45:36

I usually deal with large-ish sets, so I use a table instead of a Venn diagram:

xtab_set <- function(A,B){ 
    both <- union(A,B) 
    inA  <- both %in% A 
    inB  <- both %in% B 
    return(table(inA,inB)) 
} 

set.seed(1) 
A <- sample(letters[1:20],10,replace=TRUE) 
B <- sample(letters[1:20],10,replace=TRUE) 
xtab_set(A,B) 

#        inB
# inA     FALSE TRUE
#   FALSE     0    5
#   TRUE      6    3
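
One thing to watch when pulling individual counts out of that table: `table()` only creates a row or column for values that actually occur, so if one set is a subset of the other, a `FALSE` level can be missing. A possible variant, sketched here with the hypothetical name `xtab_set_fixed`, pins both levels so the table is always 2 x 2 and can be indexed by name.

```r
# Variant of xtab_set (illustrative name) that always returns a 2 x 2 table.
xtab_set_fixed <- function(A, B) {
  both <- union(A, B)
  # Fixing the factor levels keeps the FALSE/TRUE rows and columns
  # even when one of them has a zero count.
  inA <- factor(both %in% A, levels = c(FALSE, TRUE))
  inB <- factor(both %in% B, levels = c(FALSE, TRUE))
  table(inA, inB)
}

A <- c("Dog", "Cat", "Mouse")
B <- c("Tiger", "Lion", "Cat")
tab <- xtab_set_fixed(A, B)

n_both   <- tab["TRUE", "TRUE"]    # in A and in B
n_only_a <- tab["TRUE", "FALSE"]   # in A, not in B
n_only_b <- tab["FALSE", "TRUE"]   # in B, not in A

# Subset case: every element of the union is in B, yet the FALSE
# column still exists and holds zeros.
sub_tab <- xtab_set_fixed(c("a"), c("a", "b"))
```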


2013-07-11 16:53:02 Frank

Ah, I didn't realize Venn diagrams contain counts... I thought they were supposed to show the items themselves. – Frank

With sqldf: slow, but it works very well for data frames with mixed types:

library(sqldf) # needed for sqldf(); install.packages("sqldf") if missing
t1 <- as.data.frame(1:10) 
t2 <- as.data.frame(5:15) 
sqldf1 <- sqldf('SELECT * FROM t1 EXCEPT SELECT * FROM t2') # subset from t1 not in t2 
sqldf2 <- sqldf('SELECT * FROM t2 EXCEPT SELECT * FROM t1') # subset from t2 not in t1 
sqldf3 <- sqldf('SELECT * FROM t1 UNION SELECT * FROM t2') # UNION t1 and t2 

sqldf1
  X1_10
1     1
2     2
3     3
4     4
sqldf2
  X5_15
1    11
2    12
3    13
4    14
5    15
sqldf3
   X1_10
1      1
2      2
3      3
4      4
5      5
6      6
7      7
8      8
9      9
10    10
11    11
12    12
13    13
14    14
15    15
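
The queries above cover the two differences and the union but not the IDs common to both tables. SQLite, the default backend of sqldf, also supports `INTERSECT`, so the overlap could be obtained along these lines. This is a sketch only: the column name `id` and the variable names are assumptions chosen for the example, and it requires the sqldf package to be installed.

```r
library(sqldf)  # assumes the sqldf package is installed

# Example data with an explicit column name (illustrative, not from the answer).
t1 <- data.frame(id = 1:10)
t2 <- data.frame(id = 5:15)

# Rows that appear in both t1 and t2.
in_both <- sqldf("SELECT id FROM t1 INTERSECT SELECT id FROM t2")

# Number of shared IDs.
n_both <- nrow(in_both)
print(n_both)
```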


2014-06-22 23:32:23 rferrisx
