下面是如何繪製與ggplot一個稀疏曲線的例子。我使用了bioconductor提供的phyloseq包裝中的數據。
安裝phyloseq:
source('http://bioconductor.org/biocLite.R')
biocLite('phyloseq')
library(phyloseq)
其他庫需要
library(tidyverse)
library(vegan)
數據:
mothlist <- system.file("extdata", "esophagus.fn.list.gz", package = "phyloseq")
mothgroup <- system.file("extdata", "esophagus.good.groups.gz", package = "phyloseq")
mothtree <- system.file("extdata", "esophagus.tree.gz", package = "phyloseq")
cutoff <- "0.10"
esophman <- import_mothur(mothlist, mothgroup, mothtree, cutoff)
提取OTU表,轉置,並轉換爲數據幀
otu <- otu_table(esophman)
otu <- as.data.frame(t(otu))
sample_names <- rownames(otu)
out <- rarecurve(otu, step = 5, sample = 6000, label = T)
現在你有一個列表中的每個元素對應一個樣本:
清洗名單了一下:
rare <- lapply(out, function(x){
b <- as.data.frame(x)
b <- data.frame(OTU = b[,1], raw.read = rownames(b))
b$raw.read <- as.numeric(gsub("N", "", b$raw.read))
return(b)
})
標籤列表
names(rare) <- sample_names
轉換成數據幀:
rare <- map_dfr(rare, function(x){
z <- data.frame(x)
return(z)
}, .id = "sample")
讓我們看看它是怎麼樣的:
head(rare)
sample OTU raw.read
1 B 1.000000 1
2 B 5.977595 6
3 B 10.919090 11
4 B 15.826125 16
5 B 20.700279 21
6 B 25.543070 26
與GGPLOT2情節
ggplot(data = rare)+
geom_line(aes(x = raw.read, y = OTU, color = sample))+
scale_x_continuous(labels = scales::scientific_format())
素食主義者陰謀:
rarecurve(otu, step = 5, sample = 6000, label = T) #low step size because of low abundance
一個可以使一個附加列根據這個分組和顏色。
以下是如何添加另一個分組的示例。讓我們假設你有如下形式的表:
groupings <- data.frame(sample = c("B", "C", "D"),
location = c("one", "one", "two"), stringsAsFactors = F)
groupings
sample location
1 B one
2 C one
3 D two
其中樣本根據另一特徵分組。您可以使用lapply
或map_dfr
越過groupings$sample
和標籤rare$location
。
rare <- map_dfr(groupings$sample, function(x){ #loop over samples
z <- rare[rare$sample == x,] #subset rare according to sample
loc <- groupings$location[groupings$sample == x] #subset groupings according to sample, if more than one grouping repeat for all
z <- data.frame(z, loc) #make a new data frame with the subsets
return(z)
})
head(rare)
sample OTU raw.read loc
1 B 1.000000 1 one
2 B 5.977595 6 one
3 B 10.919090 11 one
4 B 15.826125 16 one
5 B 20.700279 21 one
6 B 25.543070 26 one
讓我們做一個像樣的情節了這個
ggplot(data = rare)+
geom_line(aes(x = raw.read, y = OTU, group = sample, color = loc))+
geom_text(data = rare %>% #here we need coordinates of the labels
group_by(sample) %>% #first group by samples
summarise(max_OTU = max(OTU), #find max OTU
max_raw = max(raw.read)), #find max raw read
aes(x = max_raw, y = max_OTU, label = sample), check_overlap = T, hjust = 0)+
scale_x_continuous(labels = scales::scientific_format())+
theme_bw()
哇!這很棘手!現在我明白爲什麼人們認爲R很痛苦使用。沒有一個簡單的方法來做到這一點? –
@Jari Oksanen:來自'vegan'的'rerecurve'函數有'col'和'lty'的參數,可以指定這些參數,如果'ggplot'沒有打算,可以手動添加一個圖例。 – missuse