2017-09-14 17 views
1

Pipes in dplyr: how to combine data

I am trying to understand dplyr pipes. The example below uses the presidential dataset from ggplot2.

library(ggplot2) 
library(dplyr) 

data("presidential") 
presidential %>% 
    select(name,start,end,party) %>% 
    mutate(time = end - start) %>% 
    group_by(party) %>% 
    mutate(time_per_party = length(time)) -> x 
x 

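So I compute how long each president was in office. Now I want the total time in office for each party's presidents, but instead I get the number of presidents each party had.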

         name      start        end      party      time time_per_party
        (chr)     (date)     (date)      (chr)    (dfft)          (int)
1  Eisenhower 1953-01-20 1961-01-20 Republican 2922 days              6
2     Kennedy 1961-01-20 1963-11-22 Democratic 1036 days              4
3      Johson 1963-11-22 1969-01-20 Democratic 1886 days              4
4       Nixon 1969-01-20 1974-08-09 Republican 2027 days              6
5        Ford 1974-08-09 1977-01-20 Republican  895 days              6
6      Carter 1977-01-20 1981-01-20 Democratic 1461 days              4
7      Reagan 1981-01-20 1989-01-20 Republican 2922 days              6
8        Bush 1989-01-20 1993-01-20 Republican 1461 days              6
9     Clinton 1993-01-20 2001-01-20 Democratic 2922 days              4
10       Bush 2001-01-20 2009-01-20 Republican 2922 days              6

Any idea how to do this? The final result should look like this:

party  days 
Republican xxx 
Democratic xxx 
+2

`presidential %>% mutate(days = difftime(end, start)) %>% group_by(party) %>% summarize(days = sum(days)) %>% ungroup()` should do what you want. –

+0

Thanks @Z.Lin, what is the **%>% ungroup()** at the end for? – Dan

+0

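`ungroup()` removes all grouping. I like to do this because in more complex use cases I often group by several variables, and that can lead to unexpected [peeling behaviour](http://opiateforthemass.es/articles/groupby_summarize/). So I prefer to make a habit of ungrouping explicitly, so that I know where I stand at each point. –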
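
A minimal sketch of the peeling behaviour mentioned above, assuming a recent dplyr where `summarise()` drops only the last grouping variable by default. The `sales` data frame and its columns are hypothetical toy data, not part of the original question.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration only
sales <- tibble(
  region = c("N", "N", "S", "S"),
  year   = c(2016, 2017, 2016, 2017),
  amount = c(10, 20, 30, 40)
)

# Grouping by two variables, then summarising, peels off only the
# last grouping level (year); the result is still grouped by region
by_region_year <- sales %>%
  group_by(region, year) %>%
  summarise(amount = sum(amount))

group_vars(by_region_year)

# A later mutate() therefore works within each region ...
still_grouped <- by_region_year %>%
  mutate(share = amount / sum(amount))

# ... whereas an explicit ungroup() makes it work over all rows
ungrouped <- by_region_year %>%
  ungroup() %>%
  mutate(share = amount / sum(amount))
```

With a single grouping variable, as in the presidential example, the peeled result has no grouping left, so the trailing `ungroup()` is a defensive habit rather than a requirement there.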

Answers

2

I found a solution very similar to Z.Lin's comment:

presidential %>% mutate(time = end - start) %>% group_by(party) %>% summarise(days = sum(time)) 

does the trick.
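
To make the difference from the code in the question explicit, here is a small sketch on hypothetical toy data (the `terms` data frame is made up so the numbers are easy to check by hand). It contrasts `mutate(length(time))`, which keeps every row and attaches the group's row count, with `summarise(sum(time))`, which collapses each group to one row holding the total. Converting the total with `as.numeric(..., units = "days")` is an optional extra step, not something the answer above requires.

```r
library(dplyr)

# Hypothetical toy data: three terms, two parties
terms <- tibble(
  name  = c("A", "B", "C"),
  party = c("X", "Y", "X"),
  start = as.Date(c("2001-01-01", "2001-01-11", "2001-01-31")),
  end   = as.Date(c("2001-01-11", "2001-01-31", "2001-03-02"))
)

# As in the question: mutate() keeps all rows, and length(time)
# is the number of rows in the group, not the time in office
per_row <- terms %>%
  mutate(time = end - start) %>%
  group_by(party) %>%
  mutate(n_terms = length(time))

# As in the answer: summarise() returns one row per party,
# and sum(time) adds up the durations within each party
per_party <- terms %>%
  mutate(time = end - start) %>%
  group_by(party) %>%
  summarise(days = as.numeric(sum(time), units = "days"))

per_party
```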

1

Try

presidential %>% mutate(time = end - start) %>% group_by(party) %>% summarise(days = sum(time))