2016-08-31 129 views
1

Q: Summarise to a string and sum the associated values in R 3.3.0 with dplyr v0.5.0

I have the following data frame that I would like to simplify:

library(dplyr) 

Fruit <- c("Apple","Apple","Orange","Orange","Banana","Banana") 
Farmer <- c("Bob","Ben","Bill","Bob","George","Bob") 
Tons.Jan <- c(20,40,10,20,35,15) 
Tons.Feb <- c(30,40,20,15,25,30) 
Tons.Mar <- c(10,10,15,10,20,30) 
Tons.Apr <- c(15,20,15,30,30,30) 
Tons.May <- c(20,5,20,20,20,10) 

df <- cbind(Fruit,Farmer) 
df <- cbind(df,Tons.Jan) 
df <- cbind(df,Tons.Feb) 
df <- cbind(df,Tons.Mar) 
df <- cbind(df,Tons.Apr) 
df <- tbl_df(cbind(df,Tons.May)) 

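For each fruit, I would like to collapse the farmers into a single comma-separated string and sum the tons across those observations, so that the result looks like this: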

Fruit2 <- c("Apple","Orange","Banana") 
Farmer2 <- c("Bob,Ben","Bill,Bob","George,Bob") 
Tons.Jan2 <- c(60,30,50) 
Tons.Feb2 <- c(70,35,55) 
Tons.Mar2 <- c(20,25,50) 
Tons.Apr2 <- c(35,45,60) 
Tons.May2 <- c(25,40,30) 

df2 <- cbind(Fruit2,Farmer2) 
df2 <- cbind(df2,Tons.Jan2) 
df2 <- cbind(df2,Tons.Feb2) 
df2 <- cbind(df2,Tons.Mar2) 
df2 <- cbind(df2,Tons.Apr2) 
df2 <- tbl_df(cbind(df2,Tons.May2)) 

What I have tried:

I have been using the dplyr functions group_by and summarise_each as below:

df <- df %>% group_by(Fruit) %>% 
    summarise_each(funs(toString)) 

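but I can't work out how to also sum the numeric values without explicitly calling a summary function on each column inside summarise.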

Any help is appreciated.

Answers

2
library(dplyr) 

# Convert the relevant columns to numeric 
df <- mutate_each(df, funs(as.numeric), -Fruit, -Farmer) 

# or as mentioned in the comments by jazzurro 
df <- mutate_at(df, vars(starts_with("Tons")), as.numeric) 

df %>% 
    group_by(Fruit) %>% 
    mutate(Farmer = toString(Farmer)) %>% 
    group_by(Fruit, Farmer) %>% 
    summarise_all(funs(sum)) 


#Source: local data frame [3 x 7] 
#Groups: Fruit [?] 
# 
#   Fruit      Farmer Tons.Jan Tons.Feb Tons.Mar Tons.Apr Tons.May 
#    <chr>       <chr>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl> 
#1  Apple    Bob, Ben       60       70       20       35       25 
#2 Banana George, Bob       50       55       50       60       30 
#3 Orange   Bill, Bob       30       35       25       45       40 
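Since this answer was written, `summarise_each()`, `mutate_each()` and `funs()` have been deprecated in dplyr. A minimal sketch of the same aggregation for dplyr 1.0.0 or later, assuming the data are built with numeric `Tons.*` columns from the start (so no `as.numeric()` step is needed); `res` is only an illustrative name:

```r
# Sketch assuming dplyr >= 1.0.0 (across() and the .groups argument).
library(dplyr)

# Build the data with proper column types instead of cbind().
df <- tibble(
  Fruit    = c("Apple", "Apple", "Orange", "Orange", "Banana", "Banana"),
  Farmer   = c("Bob", "Ben", "Bill", "Bob", "George", "Bob"),
  Tons.Jan = c(20, 40, 10, 20, 35, 15),
  Tons.Feb = c(30, 40, 20, 15, 25, 30),
  Tons.Mar = c(10, 10, 15, 10, 20, 30),
  Tons.Apr = c(15, 20, 15, 30, 30, 30),
  Tons.May = c(20, 5, 20, 20, 20, 10)
)

res <- df %>%
  group_by(Fruit) %>%
  summarise(
    Farmer = toString(Farmer),          # collapse names, e.g. "Bob, Ben"
    across(starts_with("Tons"), sum),   # sum every Tons.* column per fruit
    .groups = "drop"                    # return an ungrouped tibble
  )
res
```

`summarise()` sorts the groups, so the rows come out as Apple, Banana, Orange, as in the output above. `toString()` always separates with `", "`; to get exactly the `"Bob,Ben"` format from the question, `paste(Farmer, collapse = ",")` can be used instead.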
+1

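I think you can overwrite Farmer with mutate(Farmer = toString(Farmer)). For now, the last step could be written as 'summarize_each(funs(sum(.)))', but summarize_each seems to be on its way to deprecation, so I think using summarize_all is a good idea. One more thing: to convert the character columns to numeric, you can use 'mutate_at(df, vars(starts_with("Tons")), as.numeric)'. – jazzurro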

+0

Yes, I meant to overwrite Farmer! Thanks! – Sumedh

+1

This works! I used jazzurro's suggestion to overwrite the Farmer variable. Thanks! – Leo

2

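It is better not to use data.frame(cbind( or tbl_df(cbind(. cbind combines the vectors into a matrix, and a matrix can hold only one type, so if any of the vectors is character, every column of the matrix becomes character. Converting that matrix with data.frame() makes it worse: with the default stringsAsFactors = TRUE (the default before R 4.0.0), those columns become factors. That is why we have to use as.numeric(as.character( to turn the columns back into numeric ones. It is better to construct the data.frame directly as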

data.frame(Fruit, Farmer, Tons.Jan, ...) 
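A short base R sketch of the coercion chain described above, using a subset of the question's vectors (`m`, `f` and `good` are illustrative names, not from the original post):

```r
# Base R only; a three-column subset of the question's data.
Fruit    <- c("Apple", "Apple", "Orange", "Orange", "Banana", "Banana")
Farmer   <- c("Bob", "Ben", "Bill", "Bob", "George", "Bob")
Tons.Jan <- c(20, 40, 10, 20, 35, 15)

# 1. cbind() of vectors builds a matrix, which has a single type,
#    so the numeric column is coerced to character.
m <- cbind(Fruit, Farmer, Tons.Jan)
typeof(m)                      # "character"

# 2. If such a column becomes a factor, as.numeric() returns the
#    level codes rather than the values, hence as.numeric(as.character(x)).
f <- factor(m[, "Tons.Jan"])
as.numeric(f)                  # 3 5 1 3 4 2 (level codes, not tons)
as.numeric(as.character(f))    # 20 40 10 20 35 15

# 3. data.frame() keeps one type per column; stringsAsFactors = FALSE
#    is spelled out because the default was TRUE before R 4.0.0.
good <- data.frame(Fruit, Farmer, Tons.Jan, stringsAsFactors = FALSE)
sapply(good, class)            # "character" "character" "numeric"
```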

A data.table solution would be:

library(data.table) 
setDT(df)[, Farmer := toString(Farmer), by = Fruit][ , 
    lapply(.SD, function(x) sum(as.numeric(as.character(x)))) , .(Fruit, Farmer)] 
#    Fruit      Farmer Tons.Jan Tons.Feb Tons.Mar Tons.Apr Tons.May 
#1:  Apple    Bob, Ben       60       70       20       35       25 
#2: Orange   Bill, Bob       30       35       25       45       40 
#3: Banana George, Bob       50       55       50       60       30 
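Because `:=` updates a data.table by reference, the first version above also overwrites `Farmer` in `df` itself, and running it a second time on the same table collapses the already collapsed names again (giving strings like `"Bob, Ben, Bob, Ben"`). A minimal sketch of that effect and of working on `copy()` instead, on a hypothetical three-column subset of the data (`dt`, `twice` and `res` are illustrative names):

```r
library(data.table)

# Hypothetical subset of the question's data, with a numeric column.
dt <- data.table(
  Fruit    = c("Apple", "Apple", "Orange", "Orange", "Banana", "Banana"),
  Farmer   = c("Bob", "Ben", "Bill", "Bob", "George", "Bob"),
  Tons.Jan = c(20, 40, 10, 20, 35, 15)
)

# Applying the := step twice to the same table re-collapses the names.
twice <- copy(dt)
twice[, Farmer := toString(Farmer), by = Fruit]
twice[, Farmer := toString(Farmer), by = Fruit]
twice$Farmer[1]                # "Bob, Ben, Bob, Ben"

# Chaining on copy(dt) leaves dt itself untouched.
res <- copy(dt)[, Farmer := toString(Farmer), by = Fruit][
  , lapply(.SD, sum), by = .(Fruit, Farmer)]
dt$Farmer                      # "Bob" "Ben" "Bill" "Bob" "George" "Bob"
res
```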

This can also be done in a single step, grouping only by "Fruit" (matching the OP's expected output):

setDT(df)[, c(Farmer = toString(Farmer), lapply(.SD[, 
    setdiff(names(.SD), "Farmer"), with = FALSE], 
     function(x) sum(as.numeric(as.character(x))))), .(Fruit)] 
#    Fruit      Farmer Tons.Jan Tons.Feb Tons.Mar Tons.Apr Tons.May 
#1:  Apple    Bob, Ben       60       70       20       35       25 
#2: Orange   Bill, Bob       30       35       25       45       40 
#3: Banana George, Bob       50       55       50       60       30 
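A possible variant of the single-step version, sketched under the assumption that all value columns start with `Tons` and are already numeric: `.SDcols` restricts `.SD` to those columns, so the `setdiff()` subsetting is not needed, while `Farmer` can still be referenced in `j`. `tons_cols` is a hypothetical helper variable:

```r
library(data.table)

# The question's data, built with numeric Tons.* columns.
dt <- data.table(
  Fruit    = c("Apple", "Apple", "Orange", "Orange", "Banana", "Banana"),
  Farmer   = c("Bob", "Ben", "Bill", "Bob", "George", "Bob"),
  Tons.Jan = c(20, 40, 10, 20, 35, 15),
  Tons.Feb = c(30, 40, 20, 15, 25, 30),
  Tons.Mar = c(10, 10, 15, 10, 20, 30),
  Tons.Apr = c(15, 20, 15, 30, 30, 30),
  Tons.May = c(20, 5, 20, 20, 20, 10)
)

# Only the Tons.* columns go into .SD; Farmer is still usable in j.
tons_cols <- grep("^Tons", names(dt), value = TRUE)
res <- dt[, c(list(Farmer = toString(Farmer)), lapply(.SD, sum)),
          by = Fruit, .SDcols = tons_cols]
res
```

Unlike `summarise()`, grouping with `by` keeps the groups in order of first appearance (Apple, Orange, Banana).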
+1

Thanks for the tip, I'll keep it in mind. – Leo