2012-07-30 64 views

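Hi, I'm new to R and I have to use it to make a Venn diagram. I've been Googling for a while, and all the examples I can find deal with binary variables. However, I have 2 lists (actually 2 CSV files). The items in the lists are just strings, such as PSF113_xxxx. I have to compare them to find out what is unique to each list and what is shared. How do I create a Venn diagram in R?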

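Also, the files don't contain the same number of items; one has slightly more than the other, which means the cbind function returns an error.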

So far I've come up with the code below, but it only gives me an image with a single circle named Group 1, with a 1 inside it and a 0 outside.

matLis <- list(matrix(A), matrix(B)) 

n <- max(sapply(matLis, nrow)) 
do.call(cbind, lapply(matLis, function (x) 
    rbind(x, matrix(, n-nrow(x), ncol(x))))) 

x = vennCounts(n) 
vennDiagram(x) 

Here is an example of my data:

2 PSF113_0018 
3 PSF113_0079 
4 PSF113_0079a 
5 PSF113_0079b 

The numbers on the left are not part of the data; they were added when I imported the file into R from Excel.

head(A) 
> head(A) 
      V1 
1 PSF113_0016a 
2 PSF113_0018 
3 PSF113_0079 
4 PSF113_0079a 
5 PSF113_0079b 
6 PSF113_0079c 

> head(b,10) 
      V1 
1 PSF113_0016a 
2 PSF113_0021 
3 PSF113_0048 
4 PSF113_0079 
5 PSF113_0079a 
6 PSF113_0079b 
7 PSF113_0079c 
8 PSF113_0295 
9 PSF113_0324a 
10 PSF113_0324b 

Providing a reproducible data example would get you further. – 2012-07-30 13:34:56

Answers


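Your code is still not very reproducible because you haven't defined A or B. Here is a guide to making a Venn diagram with the venneuler package, which I find more flexible.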

List1 <- c("apple", "apple", "orange", "kiwi", "cherry", "peach") 
List2 <- c("apple", "orange", "cherry", "tomatoe", "pear", "plum", "plum") 
Lists <- list(List1, List2) #put the word vectors into a list to supply lapply 
items <- sort(unique(unlist(Lists))) #put in alphabetical order 
MAT <- matrix(rep(0, length(items)*length(Lists)), ncol=2) #make a matrix of 0's 
colnames(MAT) <- paste0("List", 1:2) 
rownames(MAT) <- items 
lapply(seq_along(Lists), function(i) { #fill the matrix 
    MAT[items %in% Lists[[i]], i] <<- table(Lists[[i]]) 
}) 

MAT #look at the results 
library(venneuler) 
v <- venneuler(MAT) 
plot(v) 
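As a cross-check on the diagram, the sizes of the three regions can be computed in base R from a TRUE/FALSE membership matrix. This is only a minimal sketch using the same example vectors; the names `member` and `region_sizes` are illustrative and not part of the code above, and the matrix here records presence rather than counts, so duplicates such as "apple" are counted once.

```r
# Same example vectors as in the answer
List1 <- c("apple", "apple", "orange", "kiwi", "cherry", "peach")
List2 <- c("apple", "orange", "cherry", "tomatoe", "pear", "plum", "plum")
Lists <- list(List1 = List1, List2 = List2)

# Every distinct item across both lists, in alphabetical order
items <- sort(unique(unlist(Lists)))

# One row per item, one column per list; TRUE where the item occurs in that list
member <- sapply(Lists, function(x) items %in% x)
rownames(member) <- items

# Sizes of the three regions of a two-set Venn diagram
region_sizes <- c(
  List1_only = sum(member[, "List1"] & !member[, "List2"]),
  List2_only = sum(member[, "List2"] & !member[, "List1"]),
  both       = sum(member[, "List1"] & member[, "List2"])
)
region_sizes
```

The three numbers should add up to `length(items)`, which is a quick way to confirm that no item was lost while building the matrix.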

EDIT: The head output is very helpful because it gives us something to work with. Try this approach:

#For reproducibility (skip this and read in the csv files) 
A <- structure(list(V1 = structure(1:6, .Label = c("PSF113_0016a", 
    "PSF113_0018", "PSF113_0079", "PSF113_0079a", "PSF113_0079b", 
    "PSF113_0079c"), class = "factor")), .Names = "V1", 
    class = "data.frame", row.names = c("1", 
    "2", "3", "4", "5", "6")) 

B <- structure(list(V1 = structure(1:10, .Label = c("PSF113_0016a", 
    "PSF113_0021", "PSF113_0048", "PSF113_0079", "PSF113_0079a", 
    "PSF113_0079b", "PSF113_0079c", "PSF113_0295", "PSF113_0324a", 
    "PSF113_0324b"), class = "factor")), .Names = "V1", 
    class = "data.frame", row.names = c("1", 
    "2", "3", "4", "5", "6", "7", "8", "9", "10")) 

Run the code from here:

#after reading in the csv files start here 
Lists <- list(A, B) #put the word vectors into a list to supply lapply 
Lists <- lapply(Lists, function(x) as.character(unlist(x))) 
items <- sort(unique(unlist(Lists))) #put in alphabetical order 
MAT <- matrix(rep(0, length(items)*length(Lists)), ncol=2) #make a matrix of 0's 
colnames(MAT) <- paste0("List", 1:2) 
rownames(MAT) <- items 
lapply(seq_along(Lists), function(i) { #fill the matrix 
    MAT[items %in% Lists[[i]], i] <<- table(Lists[[i]]) 
}) 

MAT #look at the results 
library(venneuler) 
v <- venneuler(MAT) 
plot(v) 

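The difference with this approach is that I unlist the two data frames (if they are data frames) and then convert them to character vectors. I think this should work.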
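The conversion step can also be tried on its own, without any plotting package. The following is a minimal sketch, assuming each imported file is a one-column data frame of IDs stored as a factor; the small data frames are hypothetical stand-ins for the real CSV files. Once both are plain character vectors, base R set functions give the shared and unique items directly, and the two inputs do not need to have the same length.

```r
# Hypothetical stand-ins for the two imported CSV files (one column of IDs each)
A <- data.frame(V1 = c("PSF113_0016a", "PSF113_0018", "PSF113_0079"),
                stringsAsFactors = TRUE)
B <- data.frame(V1 = c("PSF113_0016a", "PSF113_0021", "PSF113_0048",
                       "PSF113_0079"),
                stringsAsFactors = TRUE)

# Flatten each data frame and turn the factor into its string labels
a <- as.character(unlist(A))
b <- as.character(unlist(B))

shared <- intersect(a, b)  # IDs present in both files
only_a <- setdiff(a, b)    # IDs present only in A
only_b <- setdiff(b, a)    # IDs present only in B

# Counts for the three regions of the diagram
c(A_only = length(only_a), B_only = length(only_b), both = length(shared))
```

Because nothing is bound column-wise, the unequal lengths that made cbind fail are not a problem here.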


A and B are just the csv files I imported; they are actually A = open.csv(...). If I call A or B, the data example shows what I get. I'll try this – TheFoxx 2012-07-30 13:56:14


Try 'head(A,10)' and 'head(B)' – 2012-07-30 13:58:03


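I get an error message from your code saying that the data for the sort function must be atomic. Can you help? As I said before, when I call my data, it appears in the form I gave in the OP – TheFoxx 2012-07-30 14:20:57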