Combining toString and sum when summarising values in R 3.3.0 with dplyr v0.5.0

I have the following data frame, which I would like to simplify:

library(dplyr)  # provides tbl_df() and %>%

Fruit <- c("Apple","Apple","Orange","Orange","Banana","Banana")
Farmer <- c("Bob","Ben","Bill","Bob","George","Bob") 
Tons.Jan <- c(20,40,10,20,35,15) 
Tons.Feb <- c(30,40,20,15,25,30) 
Tons.Mar <- c(10,10,15,10,20,30) 
Tons.Apr <- c(15,20,15,30,30,30) 
Tons.May <- c(20,5,20,20,20,10) 

df <- cbind(Fruit,Farmer) 
df <- cbind(df,Tons.Jan) 
df <- cbind(df,Tons.Feb) 
df <- cbind(df,Tons.Mar) 
df <- cbind(df,Tons.Apr) 
df <- tbl_df(cbind(df,Tons.May))

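I would like to collapse the farmers for each fruit into a single comma-separated string and sum the Tons columns within each fruit, so that the result looks like this: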

Fruit2 <- c("Apple","Orange","Banana") 
Farmer2 <- c("Bob,Ben","Bill,Bob","George,Bob") 
Tons.Jan2 <- c(60,30,50) 
Tons.Feb2 <- c(70,35,55) 
Tons.Mar2 <- c(20,25,50) 
Tons.Apr2 <- c(35,45,60) 
Tons.May2 <- c(25,40,30) 

df2 <- cbind(Fruit2,Farmer2) 
df2 <- cbind(df2,Tons.Jan2) 
df2 <- cbind(df2,Tons.Feb2) 
df2 <- cbind(df2,Tons.Mar2) 
df2 <- cbind(df2,Tons.Apr2) 
df2 <- tbl_df(cbind(df2,Tons.May2))

What I have tried:

I have been using the dplyr functions group_by and summarise_each, as below:

df <- df %>% group_by(Fruit) %>%
    summarise_each(funs(toString))

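However, I cannot work out how to sum the numeric columns in the same step without writing a separate summary call for every column.

Any help is appreciated.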

Source

2016-08-31 Leo

library(dplyr) 

# Convert the relevant columns to numeric 
df <- mutate_each(df, funs(as.numeric), -Fruit, -Farmer) 

# or as mentioned in the comments by jazzurro 
df <- mutate_at(df, vars(starts_with("Tons")), as.numeric) 

df %>% 
    group_by(Fruit) %>% 
    mutate(Farmer = toString(Farmer)) %>% 
    group_by(Fruit, Farmer) %>% 
    summarise_all(funs(sum)) 


#Source: local data frame [3 x 7]
#Groups: Fruit [?]
#
#   Fruit      Farmer Tons.Jan Tons.Feb Tons.Mar Tons.Apr Tons.May
#   <chr>       <chr>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#1  Apple    Bob, Ben       60       70       20       35       25
#2 Banana George, Bob       50       55       50       60       30
#3 Orange   Bill, Bob       30       35       25       45       40

Source

2016-08-31 23:43:34 Sumedh

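I think you can overwrite Farmer with mutate(Farmer = toString(Farmer)). For now, the last step can also be written as 'summarize_each(funs(sum(.)))'. However, summarize_each looks set to be deprecated in the future, so I think using summarize_all is a good idea. One more thing: to convert the character columns to numeric, you can use 'mutate_at(df, vars(starts_with("Tons")), as.numeric)'. – jazzurro

Yes, overwriting Farmer was the intention! Thanks! – Sumedh

This works! I used jazzurro's suggestion to overwrite the Farmer variable. Thanks! – Leo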
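For readers on a newer dplyr than the 0.5.0 discussed in this thread, the same result can probably be reached in a single summarise() call. The following is only a sketch: it assumes dplyr 1.0.0 or later (where across() is available) and a data frame built with data.frame() so that the Tons columns are already numeric. The names df_typed and result are made up for illustration.

```r
library(dplyr)

# Same data as in the question, built with data.frame() so Tons.* stay numeric.
# df_typed is a hypothetical name used only in this sketch.
df_typed <- data.frame(
  Fruit    = c("Apple", "Apple", "Orange", "Orange", "Banana", "Banana"),
  Farmer   = c("Bob", "Ben", "Bill", "Bob", "George", "Bob"),
  Tons.Jan = c(20, 40, 10, 20, 35, 15),
  Tons.Feb = c(30, 40, 20, 15, 25, 30),
  Tons.Mar = c(10, 10, 15, 10, 20, 30),
  Tons.Apr = c(15, 20, 15, 30, 30, 30),
  Tons.May = c(20, 5, 20, 20, 20, 10),
  stringsAsFactors = FALSE
)

# Assumes dplyr >= 1.0.0: collapse Farmer to one string per Fruit
# and sum every column whose name starts with "Tons".
result <- df_typed %>%
  group_by(Fruit) %>%
  summarise(
    Farmer = toString(Farmer),
    across(starts_with("Tons"), sum)
  )

print(result)
```

Because the grouping is on Fruit alone, no second group_by on Farmer is needed in this variant.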

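It is better not to use data.frame(cbind( or tbl_df(cbind(, because cbind combines the vectors into a matrix, and a matrix can hold only a single class. If any of the vectors is character, every column of the matrix becomes character. Converting that matrix to a data.frame makes things worse: with the default option stringsAsFactors=TRUE, the columns become factor class. That is why as.numeric(as.character( is then needed to turn the Tons columns back into numeric. It is better to construct the 'data.frame' directly as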

data.frame(Fruit, Farmer, Tons.Jan, ...)
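To illustrate the coercion described above, here is a minimal base R sketch on a cut-down version of the data (two rows and one Tons column, chosen only to keep the example short). The names m and d are made up for illustration.

```r
Fruit    <- c("Apple", "Orange")
Tons.Jan <- c(20, 10)

# cbind() on vectors builds a matrix, which holds a single type:
# the numbers are silently converted to character.
m <- cbind(Fruit, Tons.Jan)
class(m[, "Tons.Jan"])   # "character"

# data.frame() keeps each column's own type.
d <- data.frame(Fruit, Tons.Jan, stringsAsFactors = FALSE)
class(d$Tons.Jan)        # "numeric"
sum(d$Tons.Jan)          # 30
```

With the columns kept numeric from the start, no as.numeric() conversion step is needed before summing.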

A data.table solution would be

library(data.table) 
setDT(df)[, Farmer := toString(Farmer), by = Fruit][ , 
    lapply(.SD, function(x) sum(as.numeric(as.character(x)))) , .(Fruit, Farmer)] 
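# Output when run on the original df. Note that := updates df by reference,
# so re-running the line on the same df would repeat the names in Farmer.
#     Fruit      Farmer Tons.Jan Tons.Feb Tons.Mar Tons.Apr Tons.May
#1:   Apple    Bob, Ben       60       70       20       35       25
#2:  Orange   Bill, Bob       30       35       25       45       40
#3:  Banana George, Bob       50       55       50       60       30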

Alternatively, this can be done in a single step by grouping by 'Fruit' only (based on the OP's expected output)

setDT(df)[, c(Farmer = toString(Farmer), lapply(.SD[, 
    setdiff(names(.SD), "Farmer"), with = FALSE], 
     function(x) sum(as.numeric(as.character(x))))), .(Fruit)] 
#     Fruit      Farmer Tons.Jan Tons.Feb Tons.Mar Tons.Apr Tons.May
#1:   Apple    Bob, Ben       60       70       20       35       25
#2:  Orange   Bill, Bob       30       35       25       45       40
#3:  Banana George, Bob       50       55       50       60       30

Source

2016-09-01 04:00:56 akrun

Thanks for the tip. I will keep using it. – Leo
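If the data is built with typed columns from the start, as recommended in the answer above, the single-step data.table call may become simpler. The following is a hedged sketch rather than part of the original answer: it assumes the Tons columns are already numeric and selects them through .SDcols. The names dt, tons_cols and res are made up for illustration.

```r
library(data.table)

# Same data as in the question, with numeric Tons.* columns.
# dt is a hypothetical name used only in this sketch.
dt <- data.table(
  Fruit    = c("Apple", "Apple", "Orange", "Orange", "Banana", "Banana"),
  Farmer   = c("Bob", "Ben", "Bill", "Bob", "George", "Bob"),
  Tons.Jan = c(20, 40, 10, 20, 35, 15),
  Tons.Feb = c(30, 40, 20, 15, 25, 30),
  Tons.Mar = c(10, 10, 15, 10, 20, 30),
  Tons.Apr = c(15, 20, 15, 30, 30, 30),
  Tons.May = c(20, 5, 20, 20, 20, 10)
)

# Restrict .SD to the Tons columns; Farmer is still visible in j.
tons_cols <- grep("^Tons", names(dt), value = TRUE)

# One pass per Fruit: collapse Farmer to a string and sum the Tons columns.
# No := is used, so dt itself is left unchanged.
res <- dt[, c(list(Farmer = toString(Farmer)), lapply(.SD, sum)),
          by = Fruit, .SDcols = tons_cols]

print(res)
```

Since this version does not assign by reference, it can be re-run on the same table without the Farmer names piling up.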
