2014-04-04 96 views
6

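I have just started using the geom_map function in ggplot2. After reading the 29 posts I found here about geom_map, I still have the same problem: how do I get geom_map to show all parts of the map?

My data frame is ridiculously large, with more than 2000 rows. It is basically data on one specific gene (TP53) compiled by the WHO. Please find it here.

The head looks like this: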

> head(ARCTP53_SOExample) 
    Mutation_ID MUT_ID hg18_Chr17_coordinates hg19_Chr17_coordinates ExonIntron Genomic_nt Codon_number 
1   16 1789    7519192    7578467  5-exon  12451   155 
2   13 1741    7519200    7578475  5-exon  12443   152 
3   17 2143    7519131    7578406  5-exon  12512   175 
4   14 2143    7519131    7578406  5-exon  12512   175 
5   15 2168    7519128    7578403  5-exon  12515   176 
6   12 3737    7517845    7577120  8-exon  13798   273 
    Description c_description g_description  g_description_hg18 WT_nucleotide Mutant_nucleotide 
1   A>G  c.463A>G g.7578467T>C NC_000017.9:g.7519192T>C   A     G 
2   C>T  c.455C>T g.7578475G>A NC_000017.9:g.7519200G>A   C     T 
3   G>A  c.524G>A g.7578406C>T NC_000017.9:g.7519131C>T   G     A 
4   G>A  c.524G>A g.7578406C>T NC_000017.9:g.7519131C>T   G     A 
5   G>T  c.527G>T g.7578403C>A NC_000017.9:g.7519128C>A   G     T 
6   G>A  c.818G>A g.7577120C>T NC_000017.9:g.7517845C>T   G     A 
    Splice_site CpG_site   Type Mut_rate WT_codon Mutant_codon WT_AA Mutant_AA ProtDescription 
1   no  no  A:T>G:C 0.170  ACC   GCC Thr  Ala   p.T155A 
2   no  yes G:C>A:T at CpG 1.243  CCG   CTG Pro  Leu   p.P152L 
3   no  yes G:C>A:T at CpG 1.280  CGC   CAC Arg  His   p.R175H 
4   no  yes G:C>A:T at CpG 1.280  CGC   CAC Arg  His   p.R175H 
5   no  no  G:C>T:A 0.054  TGC   TTC Cys  Phe   p.C176F 
6   no  yes G:C>A:T at CpG 1.335  CGT   CAT Arg  His   p.R273H 
    Mut_rateAA Effect Structural_motif Putative_stop Sample_Name Sample_ID Sample_source Tumor_origin Grade 
1  0.170 missense NDBL/beta-sheets    0 CAS91-19  17  surgery  primary  
2  1.243 missense NDBL/beta-sheets    0  CAS91-4  14  surgery  primary  
3  1.280 missense   L2/L3    0 CAS91-13  12  surgery  primary  
4  1.280 missense   L2/L3    0  CAS91-5  15  surgery  primary  
5  0.054 missense   L2/L3    0  CAS91-1  16  surgery  primary  
6  1.335 missense   L1/S/H2    0  CAS91-3  13  surgery  primary  
    Stage TNM p53_IHC KRAS_status Other_mutations Other_associations 
1    <NA>  <NA>   <NA>     
2    <NA>  <NA>   <NA>     
3    <NA>  <NA>   <NA>     
4    <NA>  <NA>   <NA>     
5    <NA>  <NA>   <NA>     
6    <NA>  <NA>   <NA>     
                   Add_Info Individual_ID Sex Age Ethnicity 
1 Mutation only present in adjacent dysplastic area (Barrett's esophagus)   17 <NA> NA   
2 Mutation only present in adjacent dysplastic area (Barrett's esophagus)   14 <NA> NA   
3 Mutation only present in adjacent dysplastic area (Barrett's esophagus)   12 <NA> NA   
4 Mutation only present in adjacent dysplastic area (Barrett's esophagus)   15 <NA> NA   
5                     16 <NA> NA   
6  Mutation absent from adjacent dysplasia area (Barrett's esophagus)   13 <NA> NA   
    Geo_area Country   Development  Population Region TP53polymorphism Germline_mutation 
1    USA More developed regions Northern America Americas         NA 
2    USA More developed regions Northern America Americas         NA 
3    USA More developed regions Northern America Americas         NA 
4    USA More developed regions Northern America Americas         NA 
5    USA More developed regions Northern America Americas         NA 
6    USA More developed regions Northern America Americas         NA 
    Family_history Tobacco Alcohol Exposure Infectious_agent Ref_ID Cross_Ref_ID PubMed Exclude_analysis 
1     <NA> <NA>  <NA>    <NA>  4   NA 1868473   False 
2     <NA> <NA>  <NA>    <NA>  4   NA 1868473   False 
3     <NA> <NA>  <NA>    <NA>  4   NA 1868473   False 
4     <NA> <NA>  <NA>    <NA>  4   NA 1868473   False 
5     <NA> <NA>  <NA>    <NA>  4   NA 1868473   False 
6     <NA> <NA>  <NA>    <NA>  4   NA 1868473   False 
    WGS_WXS 
1  No 
2  No 
3  No 
4  No 
5  No 
6  No 

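In any case, I would like to create a simple world map that colours the countries in which this mutation was studied, according to whether more or fewer "mutation signatures" came from those countries.

If you look at this, you may better understand what I am trying to do: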

summary(ARCTP53_SOExample$Country) 
Australia     Brazil     Canada     China 
         1      127      76      519 
     China, Hong-Kong Chinese Taipei (Taiwan)   Czech Republic     Egypt 
        52      36      9      9 
       France     Germany     India     Iran 
        195      10      63      112 
       Ireland     Italy     Japan     Kenya 
        25      30      414      11 
      South Africa     Spain    Switzerland    Thailand 
        13      2      24      35 
     The Netherlands      UK     Uruguay      USA 
         6      17      6      189 
        NA's 
        30 

So some countries come up many times in my data.frame.

This is how I tried to get the map I want:

library(ggplot2)
library(maps)
world_map <- map_data("world")
ggplot(ARCTP53_SOExample) +
  geom_map(map = world_map, aes(map_id = Country, fill = Country),
           colour = "black") +
  expand_limits(x = world_map$long, y = world_map$lat)

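And this is what I get: a map that only contains the countries in my list...

Does anyone have any input on what I'm doing wrong?

Also, what I would like to do down the road is add a geom_bar() of the ExonIntron column to the different countries. But I would like to try and generate a correct map first.

Thanks a million.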

Answer

9

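Countries missing from the ARC… data frame == regions missing from the map. You can compensate with a base layer built from the world_map data frame: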

library(ggplot2)
library(maps)

world_map<-map_data("world") 

gg <- ggplot(ARCTP53_SOExample) 

# need one layer with ALL THE THINGS (well, all the regions) 
gg <- gg + geom_map(data = world_map, map = world_map,
        aes(map_id = region), fill = "white", color = "black")

# now we can put the layer we really want 
gg <- gg + geom_map(map = world_map, 
        aes(map_id = Country, fill = Country), colour = "black") 

gg <- gg + expand_limits(x = world_map$long, y = world_map$lat) 
gg <- gg + theme(legend.position="none") 
gg 

map1

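I removed the legend because using a choropleth more or less assumes people know geography.

Note: using a different colour for each region (country) is not a good idea. Since you really only want to highlight where the mutation was studied, a single colour is enough: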

gg <- ggplot(ARCTP53_SOExample) 
gg <- gg + geom_map(data = world_map, map = world_map, aes(map_id = region),
        fill = "white", color = "black")
gg <- gg + geom_map(map = world_map, aes(map_id = Country), 
        fill = "steelblue", colour = "black") 
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat) 
gg <- gg + theme(legend.position="none") 
gg 

map2

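Since you ultimately want to tell the story of ExonIntron, you might want to consider using it as the fill colour for the choropleth. I know nothing about genes, so I don't know whether a gradient makes sense or whether distinct colours are the way to go. The plethora of distinct colours produced by the code below makes me think you might want a gradient scale for intron/exon. Again, I'm not a gene person.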

gg <- ggplot(ARCTP53_SOExample) 
gg <- gg + geom_map(data = world_map, map = world_map, aes(map_id = region),
        fill = "white", color = "black")
gg <- gg + geom_map(map = world_map, aes(map_id = Country, fill = ExonIntron), 
        colour = "black") 
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat) 
gg 

map3

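Some of the colours are either in really tiny regions, or they belong to regions whose names do not match the names in world_map$region. You will probably want to look into that. This: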

wm.reg <- unique(as.character(world_map$region)) 
arc.reg <- unique(as.character(ARCTP53_SOExample$Country)) 

arc.reg %in% wm.reg 
## [1] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE 
## [14] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE 

kind of shows that some are missing.
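To see which names are the problem and patch them, something along these lines should work. Both vectors and the `fix_names` lookup below are hypothetical stand-ins, and `recode_country` is just a helper name made up for this sketch; check the real spellings in world_map$region before relying on any of them.

```r
# Stand-in vectors so the sketch runs without the real data; in practice
# wm.reg and arc.reg come from the unique(as.character(...)) lines above.
wm.reg  <- c("Australia", "Brazil", "Netherlands", "Taiwan", "USA")  # hypothetical subset
arc.reg <- c("Australia", "Brazil", "The Netherlands",
             "Chinese Taipei (Taiwan)", "USA", NA)

# names present in the data but absent from the map (NA ignored)
unmatched <- setdiff(arc.reg[!is.na(arc.reg)], wm.reg)
unmatched

# hypothetical lookup: spelling in the data -> spelling in the map
fix_names <- c("The Netherlands"         = "Netherlands",
               "Chinese Taipei (Taiwan)" = "Taiwan")

# replace every value found in the lookup, leave everything else untouched
recode_country <- function(x, lookup) {
  x <- as.character(x)
  hit <- x %in% names(lookup)
  x[hit] <- unname(lookup[x[hit]])
  x
}

fixed <- recode_country(arc.reg, fix_names)
all(fixed[!is.na(fixed)] %in% wm.reg)
```

Applying the same helper to ARCTP53_SOExample$Country before plotting would let the second geom_map layer find those regions.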

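If you do use the legend rather than building your own table of results, you may also want to consider placing it differently (e.g. putting it at the bottom).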

UPDATE

I almost forgot.

world_map <- subset(world_map, region!="Antarctica") 

gg <- ggplot(ARCTP53_SOExample) 
gg <- gg + geom_map(data = world_map, map = world_map, aes(map_id = region),
        fill = "white", color = "black")
gg <- gg + geom_map(map = world_map, aes(map_id = Country, fill = ExonIntron), 
        colour = "black") 
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat) 
gg <- gg + theme(legend.position="none") 
gg 

map4

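(Note: since you (most likely) don't need Antarctica, you should get rid of it, because it eats up quite a bit of valuable space. I also got rid of the legend because I really think you should rethink how you want to colour the map and then use an additional table or plot to serve as the legend.)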
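The "additional table" could be as simple as a sorted count per country. A minimal base R sketch, using a made-up `country` vector in place of ARCTP53_SOExample$Country:

```r
# hypothetical stand-in for ARCTP53_SOExample$Country
country <- c("USA", "China", "China", "Japan", "USA", "China", NA)

# count rows per country (table() drops NA by default)
tab <- as.data.frame(table(Country = country), stringsAsFactors = FALSE)

# sort from most to least frequent, ties broken alphabetically
tab <- tab[order(-tab$Freq, tab$Country), ]
rownames(tab) <- NULL
tab
```

A table like this can sit next to the map and carry the exact numbers, which leaves the map itself free of a long legend.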


Final update (per the OP's request in the comments below)

library(ggplot2) 
library(maps) 
library(plyr) 
library(gridExtra) 

ARCTP53_SOExample <- read.csv("dat.csv") 

# reduce all the distinct exon/introns to just exon or intron 

ARCTP53_SOExample$EorI <- factor(ifelse(grepl("exon", 
               ARCTP53_SOExample$ExonIntron, 
               ignore.case = TRUE), 
             "exon", "intron")) 

# extract summary data for the two variables we care about for the map 

arc.combined <- count(ARCTP53_SOExample, .(Country, EorI)) 
colnames(arc.combined) <- c("region", "EorI", "ei.ct") 

# get total for country (region) and add to the summary info 

arc.combined <- merge(arc.combined, count(arc.combined, .(region), wt_var=.(ei.ct))) 
colnames(arc.combined) <- c("region", "EorI", "ei.ct", "region.total") 

# it wasn't specified if the "EorI" is going to be used on the map so 
# we won't use it below (but we could, now) 

# get map and drop Antarctica 

world_map <- map_data("world") 
world_map <- subset(world_map, region!="Antarctica") 

# this will show the counts by country with all of the "chart junk" removed 
# and the "counts" scaled as a gradient, and with the legend at the top 

gg <- ggplot(arc.combined) 
gg <- gg + geom_map(data = world_map, map = world_map, aes(map_id = region),
        fill = "white", color = "#7f7f7f", size = 0.25)
gg <- gg + geom_map(map = world_map, aes(map_id = region, fill = region.total), size=0.25) 
gg <- gg + scale_fill_gradient(low="#fff7bc", high="#cc4c02", name="Tumor counts") 
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat) 
gg <- gg + labs(x="", y="", title="Tumor contribution by country") 
gg <- gg + theme(panel.grid=element_blank(), panel.border=element_blank()) 
gg <- gg + theme(axis.ticks=element_blank(), axis.text=element_blank()) 
gg <- gg + theme(legend.position="top") 
gg 

mapb

# BUT you might want to show the counts by intron/exon by country 
# SO we do a separate map for each factor and combine them 
# with some grid magic. This provides more granular control over 
# each choropleth (in the event one wanted to tweak one or the other) 

# exon 

gg <- ggplot(arc.combined[arc.combined$EorI=="exon",]) 
gg <- gg + geom_map(data = world_map, map = world_map, aes(map_id = region),
        fill = "white", color = "#7f7f7f", size = 0.25)
gg <- gg + geom_map(map = world_map, aes(map_id = region, fill = ei.ct), size=0.25) 
gg <- gg + scale_fill_gradient(low="#f7fcb9", high="#238443", name="Tumor counts") 
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat) 
gg <- gg + labs(x="", y="", title="Tumor contribution by 'exon' & country") 
gg <- gg + theme(panel.grid=element_blank(), panel.border=element_blank()) 
gg <- gg + theme(axis.ticks=element_blank(), axis.text=element_blank()) 
gg <- gg + theme(legend.position="top") 
gg.exon <- gg 

# intron 

gg <- ggplot(arc.combined[arc.combined$EorI=="intron",]) 
gg <- gg + geom_map(data = world_map, map = world_map, aes(map_id = region),
        fill = "white", color = "#7f7f7f", size = 0.25)
gg <- gg + geom_map(map = world_map, aes(map_id = region, fill = ei.ct), 
        colour = "#7f7f7f", size=0.25) 
gg <- gg + scale_fill_gradient(low="#ece7f2", high="#0570b0", name="Tumor counts") 
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat) 
gg <- gg + labs(x="", y="", title="Tumor contribution by 'intron' & country") 
gg <- gg + theme(panel.grid=element_blank(), panel.border=element_blank()) 
gg <- gg + theme(axis.ticks=element_blank(), axis.text=element_blank()) 
gg <- gg + theme(legend.position="top") 
gg.intron <- gg 

# use some grid magic to combine them into one plot 

grid.arrange(gg.exon, gg.intron, ncol=1) 

mapb
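If you would rather not depend on plyr, the summarising step from the final update can be sketched in base R. The data frame `d` below is a made-up stand-in for the two relevant columns of ARCTP53_SOExample, and the "4-intron" style labels are an assumption about how introns are written in that column.

```r
# hypothetical stand-in for ARCTP53_SOExample[, c("Country", "ExonIntron")]
d <- data.frame(
  Country    = c("USA", "USA", "China", "China", "China", "Japan"),
  ExonIntron = c("5-exon", "4-intron", "8-exon", "7-exon", "6-intron", "5-exon"),
  stringsAsFactors = FALSE
)

# collapse the detailed labels to just "exon" or "intron"
d$EorI <- ifelse(grepl("exon", d$ExonIntron, ignore.case = TRUE),
                 "exon", "intron")

# count rows per country and class by summing a column of ones
arc.combined <- aggregate(n ~ Country + EorI,
                          data = transform(d, n = 1L), FUN = sum)
names(arc.combined) <- c("region", "EorI", "ei.ct")

# per-country total, repeated on every row of that country
arc.combined$region.total <- ave(arc.combined$ei.ct, arc.combined$region,
                                 FUN = sum)
arc.combined <- arc.combined[order(arc.combined$region, arc.combined$EorI), ]
arc.combined
```

The result has the same four columns (region, EorI, ei.ct, region.total) that the plotting code in the final update expects.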

+0

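You, my dear sir, are an absolute genius! Thank you so much. Now that I have learned how to build the map, I of course have a few more questions. First, as you can see in summary(Country), some countries contributed more tumours than others (that is what the summary tells you). How can I use the "count" of the Country variable as the fill colour for the corresponding region? Also, and this is a syntax question, how do I define ExonIntron so that it only counts as "Exon" or "Intron", as you suggested? Thanks a lot! Truly awesome! – OFish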

+1

Done. You should be well on your way to becoming a choropleth master :-) – hrbrmstr

+1

Forgot to add that I did not "fix" your mismatched country/region names issue, so you *really* need to do that before finalizing the choropleth – hrbrmstr