2017-06-05 27 views
0

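I have a data frame showing the number of publications per year. However, I am only interested in conference and journal publications, and I want to sum all the other categories into a single "Other" type. In short: a conditional sum of a variable by category in a data frame, using R.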

Example:

year type    n  
1994 Conference   2  
1994 Journal    3  
1995 Conference   10  
1995 Editorship   3  
1996 Conference   20  
1996 Editorship   2  
1996 Books and Thesis 3  

The result would be:

year type    n  
1994 Conference 2  
1994 Journal  3  
1995 Conference 10  
1995 Other   3  
1996 Conference 20  
1996 Other   5  
+0

Possible duplicate of https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group – akrun

+0

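You are not summing the others, since there are two "other" types. You just want to rename Editorship and Books and Thesis to Other. Or do you want to sum everything that was published? –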

Answers

4

With dplyr, we can replace anything other than "Journal" or "Conference" with "Other", and then sum n by year and type.

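library(dplyr) 
df %>% 
    # negative lookahead: any value that is not exactly Journal or Conference becomes "Other" 
    mutate(type = sub("^(?!(Journal|Conference)$).*$", "Other", type, perl = TRUE)) %>% 
    group_by(year, type) %>% 
    summarise(n = sum(n)) 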


# year  type  n 
# <int>  <chr> <int> 
#1 1994 Conference  2 
#2 1994 Journal  3 
#3 1995 Conference 10 
#4 1995  Other  3 
#5 1996 Conference 20 
#6 1996  Other  5 
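
As a regex-free sketch of the recoding step, a membership test with `%in%` can keep the two types of interest and relabel everything else. This is only an illustration in base R: the `type` vector is retyped from the question's example, and `keep` and `recoded` are names introduced here, not part of the answer.

```r
# Publication types retyped from the question's example
type <- c("Conference", "Journal", "Conference", "Editorship",
          "Conference", "Editorship", "Books and Thesis")

# Hypothetical name: the categories to leave untouched
keep <- c("Conference", "Journal")

# Every value not listed in `keep` becomes "Other"
recoded <- ifelse(type %in% keep, type, "Other")

# Count how often each recoded label occurs
table(recoded)
```

The same expression should also work inside `mutate()`. If `type` is a factor, wrap it in `as.character()` first, because `ifelse()` would otherwise return the factor's integer codes.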
0
levels(df$type)[levels(df$type) %in% c("Editorship", "Books_and_Thesis")] <- "Other" 
aggregate(n ~ type + year, data=df, sum) 

#   type year n 
# 1 Conference 1994 2 
# 2 Journal 1994 3 
# 3  Other 1995 3 
# 4 Conference 1995 10 
# 5  Other 1996 5 
# 6 Conference 1996 20 
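
The `levels()` approach relies on `type` being a factor. Below is a minimal sketch starting from the question's raw table, assuming the labels are spelled exactly as in the question (with spaces). `keep` and `res` are names introduced here for illustration, and the levels to merge are selected by negation rather than listed one by one.

```r
# Data retyped from the question; `type` is deliberately a factor,
# because assigning to levels() only works on factors
df <- data.frame(
  year = c(1994L, 1994L, 1995L, 1995L, 1996L, 1996L, 1996L),
  type = factor(c("Conference", "Journal", "Conference", "Editorship",
                  "Conference", "Editorship", "Books and Thesis")),
  n    = c(2L, 3L, 10L, 3L, 20L, 2L, 3L)
)

# Hypothetical name: the levels to leave untouched
keep <- c("Conference", "Journal")

# Giving several levels the same new name merges them into one level
levels(df$type)[!levels(df$type) %in% keep] <- "Other"

# Sum n within each type/year combination
res <- aggregate(n ~ type + year, data = df, FUN = sum)
res
```

Selecting by negation means that a type not anticipated in advance also ends up in "Other".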

Input data:

df <- structure(list(year = c(1994L, 1994L, 1995L, 1995L, 1996L, 1996L, 
    1996L), type = structure(c(2L, 3L, 2L, 1L, 2L, 1L, 1L), .Label = c("Other", 
    "Conference", "Journal"), class = "factor"), n = c(2L, 3L, 10L, 
    3L, 20L, 2L, 3L)), .Names = c("year", "type", "n"), row.names = c(NA, -7L), class = "data.frame") 
1

We can use data.table

library(data.table) 
library(stringr) 
# keep Journal and Conference; every other type becomes "Other", then sum n per group 
setDT(df)[, .(n = sum(n)), .(year, type = str_replace(as.character(type), 
     '^(?!(Journal|Conference)$).*$', 'Other'))] 
# year    type n 
#1: 1994 Conference 2 
#2: 1994  Journal 3 
#3: 1995 Conference 10 
#4: 1995   Other 3 
#5: 1996 Conference 20 
#6: 1996   Other 5