2017-04-04 59 views
2

Consolidating bar chart data

This is my current bar chart:

enter image description here

I want to combine all of the Trump data into one bar, and all of the Clinton data into another.

enter image description here

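I think I basically need to compute the mean of all values where the winner is Trump, and the mean of all values where the winner is Clinton, but I'm not sure how to do that, since I'm a noob.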

Here is my current code, if it helps:

library(ggplot2) 

healthd = read.csv("R/states.csv") 


states = healthd[[1]] 
uninsured2015 = healthd[[3]] 
uninsured2015 = abs(as.numeric(as.character(gsub("%","", uninsured2015)))) 
insuredChange = healthd[[4]] 
insuredChange = abs(as.numeric(as.character(gsub("%","", insuredChange)))) 
winner = healthd[[15]] 

ggplot(data = healthd, aes(x = states, y = insuredChange, fill=winner)) + 
xlab("State") + ylab("Percent Uninsured (2015)") + 
scale_fill_manual(values = c("Trump" = "red4", "Clinton" = "blue4")) + 
geom_bar(stat="identity") + 
theme_bw() + 
theme(panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"), axis.text.x=element_text(angle = 90, hjust = 1)) 

Also, here is the head of my data:

> head(healthd) 
     State Uninsured.Rate..2010. Uninsured.Rate..2015. Uninsured.Rate.Change..2010.2015. 
1 Alabama     14.60%    10.10%       -4.50% 
2 Alaska     19.90%    14.90%        -5% 
3 Arizona     16.90%    10.80%       -6.10% 
4 Arkansas     17.50%     9.50%        -8% 
5 California    18.50%     8.60%       -9.90% 
6 Colorado    15.90%     8.10%       -7.80% 
    Health.Insurance.Coverage.Change..2010.2015. Employer.Health.Insurance.Coverage..2015. 
1          215000         2545000 
2          36000         390000 
3          410000         3288000 
4          234000         1365000 
5          3826000         19552000 
6          419000         2949000 
    Marketplace.Health.Insurance.Coverage..2016. Marketplace.Tax.Credits..2016. 
1          165534       152206 
2          17995       16205 
3          179445       124346 
4          63357       56843 
5          1415428      1239893 
6          108311       67062 
    Average.Monthly.Tax.Credit..2016. State.Medicaid.Expansion..2016. Medicaid.Enrollment..2013. 
1        $310       FALSE      799176 
2        $750        TRUE      122334 
3        $230        TRUE     1201770 
4        $306        TRUE      556851 
5        $309        TRUE     7755381 
6        $318        TRUE      783420 
    Medicaid.Enrollment..2016. Medicaid.Enrollment.Change..2013.2016. Medicare.Enrollment..2016. 
1      910775         111599      989855 
2      166625         44291      88966 
3     1716198         514428     1175624 
4      920194         363343      606146 
5     11843081        4087700     5829777 
6     1375264         591844      820234 
    X2016.Election.Winner 
1     Trump 
2     Trump 
3     Trump 
4     Trump 
5    Clinton 
6    Clinton 
+0

Use facets? '+ facet_wrap(~winner)' or '+ facet_grid(~winner)' –

+0

If I do that, it says 'Error in +facet_grid(~winner) : invalid argument to unary operator' and my plot gets all messed up –

+0

Any feedback for me? –
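
A note on the error quoted above: R typically reports 'invalid argument to unary operator' when a line begins with '+'. The previous line is already a complete expression, so the leading '+' is parsed as a unary plus applied to the facet object. Below is a minimal sketch of the usual fix, with each '+' placed at the end of the line it continues. The data frame `toy` is made up for illustration and is not the question's CSV. Note that faceting only splits the per-state bars into one panel per winner; it does not average them.

```r
library(ggplot2)

# Made-up data for illustration only (not the question's CSV)
toy <- data.frame(
  states        = c("S1", "S2", "S3", "S4"),
  insuredChange = c(4.5, 5.0, 9.9, 7.8),
  winner        = c("Trump", "Trump", "Clinton", "Clinton")
)

# Every line ends with `+`, so R knows the expression continues
p <- ggplot(toy, aes(x = states, y = insuredChange, fill = winner)) +
  geom_bar(stat = "identity") +
  facet_wrap(~winner, scales = "free_x")

print(p)
```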

Answer

1

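You have to aggregate your data into a new data frame first, and then plot that instead. There are many ways to do this in R, but dplyr probably has the best combination of learnability, power, and programming safety, so I will use it.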

I mocked up some data, and here is the code:

library(ggplot2) 
library(dplyr) 

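set.seed(1)  # added so the mock data is reproducible
n <- 50
ss <- sprintf("State-%.2d", 1:n)   # mock state names: State-01 ... State-50
u15 <- 10 * (runif(n) + 0.5)       # mock uninsured rate, between 5 and 15
icg <- 4 * (runif(n) + 0.5)        # mock change in uninsured rate, between 2 and 6
w <- sample(c("Candidate-1", "Candidate-2"), n, replace = TRUE)

healthd <- data.frame(states = ss, uninsured2015 = u15, insuredChange = icg, winner = w)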

ggplot(data = healthd, aes(x = states, y = insuredChange, fill=winner)) + 
    xlab("State") + ylab("Percent Uninsured (2015)") + 
    scale_fill_manual(values = c("Candidate-1" = "red4", "Candidate-2" = "blue4")) + 
    geom_bar(stat="identity") + theme_bw() + 
    theme(panel.border = element_blank(), 
     panel.grid.major = element_blank(), 
     panel.grid.minor = element_blank(), 
     axis.line = element_line(colour = "black"), 
     axis.text.x=element_text(angle = 90, hjust = 1)) 

# make a new aggregated dataframe with dplyr 
aghealthd <- healthd %>% group_by(winner) %>% 
         summarise(uninsured2015=mean(uninsured2015), 
            insuredChange=mean(insuredChange)) 

# plot that with the same code, changing only the x-axis 
ggplot(data = aghealthd, aes(x = winner, y = insuredChange, fill=winner)) + 
    xlab("Winner") + ylab("Percent Uninsured (2015)") + 
    scale_fill_manual(values = c("Candidate-1" = "red4", "Candidate-2" = "blue4")) + 
    geom_bar(stat="identity") + theme_bw() + 
    theme(panel.border = element_blank(), 
     panel.grid.major = element_blank(), 
     panel.grid.minor = element_blank(), 
     axis.line = element_line(colour = "black"), 
     axis.text.x=element_text(angle = 90, hjust = 1)) 

Here is plot 1:

enter image description here

Here is plot 2:

enter image description here
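
Since the answer mentions that there are many ways to aggregate in R, here is a minimal base-R sketch of the same per-winner mean using `aggregate()`, with no extra packages. The four-row data frame is made up for illustration; the column names follow the mock data above.

```r
# Made-up data for illustration only
healthd <- data.frame(
  winner        = c("Candidate-1", "Candidate-2", "Candidate-1", "Candidate-2"),
  uninsured2015 = c(10, 8, 12, 6),
  insuredChange = c(4, 6, 2, 8)
)

# One row per winner, holding the mean of each numeric column
aghealthd <- aggregate(cbind(uninsured2015, insuredChange) ~ winner,
                       data = healthd, FUN = mean)
print(aghealthd)
```

The resulting `aghealthd` has the same shape as the dplyr version, so it can be passed to the same plotting code.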

+0

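To get accurate nationwide percentages, you would need to compute an average weighted by each state's population; the current unweighted percentages may be skewed. I realize this is not in the question or the sample data, so it is not an attack on the answer, just a note. –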

+0

Yes, maybe his data has a column for that. It is a good point, since someone would certainly bring it up. –
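
The population-weighted average suggested in the comments could be sketched roughly as follows, using `weighted.mean()` inside `summarise()`. The `population` column is an assumption: it does not appear in the question's data and would have to be joined in from another source. The four rows are made up for illustration.

```r
library(dplyr)

# Made-up data; `population` is a hypothetical column, not in the question's CSV
healthd <- data.frame(
  states        = c("A", "B", "C", "D"),
  insuredChange = c(4, 8, 6, 10),
  population    = c(1e6, 3e6, 2e6, 2e6),
  winner        = c("Candidate-1", "Candidate-1", "Candidate-2", "Candidate-2")
)

aghealthd <- healthd %>%
  group_by(winner) %>%
  summarise(
    meanChange     = mean(insuredChange),                          # every state counts equally
    weightedChange = weighted.mean(insuredChange, w = population)  # larger states count more
  )
print(aghealthd)
```

When the states in a group differ in size, the two columns diverge, which is the skew the comment describes.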