R: Split a string into numbers and return the mean as a new column in a data frame

I have a large data frame in which one column holds strings such as "1, 2, 3, 4". I want to add a new column containing the mean of those numbers. I have set up the following example:

set.seed(2015)
library(dplyr)
a <- c("1, 2, 3, 4", "2, 4, 6, 8", "3, 6, 9, 12")
df <- data.frame(a)
df$a <- as.character(df$a)

Now I can use strsplit to split the string and return the mean for a given row, where [[1]] selects the first row.

mean(as.numeric(strsplit((df$a), split=", ")[[1]]))
# [1] 2.5

The problem is that when I try to do this inside the data frame and refer to the row number, I get an error.

> df2 <- df %>%
+   mutate(index = row_number(),
+          avg = mean(as.numeric(strsplit((df$a), split=", ")[[index]])))
Error in strsplit((df$a), split = ", ")[[1:3]] :
  recursive indexing failed at level 2

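Can anyone explain this error, and why I can't index with a variable? If I replace index with a constant it works; it just doesn't seem to like my using a variable there.

Thanks very much!

Source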

2015-06-16 Daniel Meyer

You can use sapply to loop over the list returned by strsplit and process each list element:

sapply(strsplit((df$a), split=", "), function(x) mean(as.numeric(x))) 
# [1] 2.5 5.0 7.5
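
As for why the call in the question fails: inside mutate(), index is the whole vector 1:3 rather than one row number at a time, and [[ with an index longer than one indexes recursively, so x[[c(i, j)]] means x[[i]][[j]]. The base R sketch below illustrates that behaviour next to the sapply fix; the names parts and err are introduced here purely for illustration.

```r
a <- c("1, 2, 3, 4", "2, 4, 6, 8", "3, 6, 9, 12")
df <- data.frame(a, stringsAsFactors = FALSE)

# One character vector per row
parts <- strsplit(df$a, split = ", ")

# A length-one index selects a single list element
parts[[2]]        # "2" "4" "6" "8"

# A longer index is applied level by level: this is parts[[2]][[3]]
parts[[c(2, 3)]]  # "6"

# With 1:3, the first step yields a plain character vector (not a list),
# which cannot be indexed two more levels deep, so R signals an error
err <- tryCatch({ parts[[1:3]]; "no error" }, error = function(e) "error")

# sapply() visits each list element in turn, giving one mean per row
df$avg <- sapply(parts, function(x) mean(as.numeric(x)))
df$avg            # 2.5 5.0 7.5
```

Inside mutate() the same idea reads avg = sapply(strsplit(a, split = ", "), function(x) mean(as.numeric(x))), since a there already refers to the whole column.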

Source

2015-06-16 02:55:54 josliber

Try:

library(dplyr) 
library(splitstackshape) 

df %>% 
    mutate(index = row_number()) %>% 
    cSplit("a", direction = "long") %>% 
    group_by(index) %>% 
    summarise(mean = mean(a))

Which gives:

#Source: local data table [3 x 2]
#
#  index mean
#1     1  2.5
#2     2  5.0
#3     3  7.5

Or, as @Ananda suggested:

rowMeans(cSplit(df, "a"), na.rm = T)
# [1] 2.5 5.0 7.5

If you want to keep the result in a data frame, you can do:

df %>% mutate(mean = rowMeans(cSplit(., "a"), na.rm = T))

Which gives:

#            a mean
#1  1, 2, 3, 4  2.5
#2  2, 4, 6, 8  5.0
#3 3, 6, 9, 12  7.5

Source

2015-06-16 02:14:32

library(data.table)
cols <- paste0("a", 1:4)
setDT(df)[, (cols) := tstrsplit(a, ",", fixed=TRUE, type.convert=TRUE)
          ][, .(Mean = rowMeans(.SD)), .SDcols = cols]
#    Mean
# 1:  2.5
# 2:  5.0
# 3:  7.5

Alternatively,

rowMeans(setDT(tstrsplit(df$a, ",", fixed=TRUE, type.convert=TRUE))) 
# [1] 2.5 5.0 7.5
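
Both variants above spread the split values across columns and then take row means. A rough base R equivalent, as a sketch only: the names parts, n, m and means are illustrative, and padding short rows with NA is an assumption made here so that rows with differing counts of values would still line up.

```r
a <- c("1, 2, 3, 4", "2, 4, 6, 8", "3, 6, 9, 12")

# Split on the comma; the leading spaces are handled by as.numeric() below
parts <- strsplit(a, split = ",", fixed = TRUE)
n <- max(lengths(parts))

# One matrix row per input string; indexing past the end of a vector
# yields NA, which pads any shorter rows out to n columns
m <- do.call(rbind, lapply(parts, function(x) as.numeric(x)[seq_len(n)]))

means <- rowMeans(m, na.rm = TRUE)
means   # 2.5 5.0 7.5
```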

Source

2015-06-16 03:10:17 user227710
