2017-11-11
library(data.table) 
library(lubridate) 

x1 <- c(20090101, "2009-01-02", "2009 01 03", "2009-1-4", 
     "2009-1, 5", "Created on 2009 1 6", "200901 !!! 07") 

dt2 <- data.table(id = c(1,1,1,2,2,2,2), date1 = ymd(x1), charval = c("aa","vv","ss","a","b","c","d")) 

   id      date1 charval
1:  1 2009-01-01      aa
2:  1 2009-01-02      vv
3:  1 2009-01-03      ss
4:  2 2009-01-04       a
5:  2 2009-01-05       b
6:  2 2009-01-06       c
7:  2 2009-01-07       d

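I group by id using the code below (taken from "R collapse multiple rows into 1 row using specific functions, date and character columns"):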

dt3 <- dt2[, Map(function(x,y) ifelse(x != "paste", get(x)(y, na.rm = TRUE), paste(y, sep = ";")), 
           setNames(c("mean", "paste"), names(.SD)), .SD), by = id] 

I expect to get something like this:

   id      date1  charval
1:  1 2009-01-02 aa;vv;ss
2:  2 2009-01-05  a;b;c;d

But what I actually see is this result:

   id date1 charval
1:  1    NA      aa
2:  2    NA       a

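1) I don't understand why paste isn't working. 2) I don't understand why mean(date1) isn't working, because, for example, the following code works fine: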

mean(dt2$date1) 
[1] "2009-01-04" 

Answer


It is not clear why we need to go through Map and get. After grouping by 'id', take the mean of 'date1' and paste the 'charval' values together:

dt2[, .(date1 = mean(date1), charval = toString(charval)), id] 
# id  date1 charval 
#1: 1 2009-01-02 aa, vv, ss 
#2: 2 2009-01-05 a, b, c, d 

Note: toString(x) is equivalent to paste(x, collapse=', '). To use a different separator, call paste directly:

dt2[, .(date1 = mean(date1), charval = paste(charval, collapse=";")), id] 
# id  date1 charval 
#1: 1 2009-01-02 aa;vv;ss 
#2: 2 2009-01-05 a;b;c;d 
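The question's code called paste with sep rather than collapse, which is likely why the values were never joined. A small base-R illustration of the difference (the vector v is made up for this example and mirrors charval for id 1):

```r
v <- c("aa", "vv", "ss")  # hypothetical values, mirroring charval for id 1

# sep goes between the *arguments* of a paste() call; with a single vector
# argument there is nothing to separate, so the vector comes back unchanged
with_sep <- paste(v, sep = ";")
length(with_sep)      # still 3 elements

# collapse joins the elements of the result into one string
with_collapse <- paste(v, collapse = ";")
with_collapse         # "aa;vv;ss"

# toString() is the same idea with ", " as the collapse string
toString(v)           # "aa, vv, ss"
```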

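The OP's question, though, is about using Map with get to call mean. That seems to trigger the following check in mean.default:

if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
    warning("argument is not numeric or logical: returning NA")
    return(NA_real_)
}

which returns NA because 'date1' has class Date (so is.numeric is FALSE for it), even though it is stored as numeric. One option is to specify the envir in get.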
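To see that check in isolation, here is a minimal base-R sketch. It does not show which function get ends up resolving inside data.table; it only illustrates how a Date vector fares with the generic versus the default method. The vector d is made up for the example.

```r
d <- as.Date(c("2009-01-01", "2009-01-02", "2009-01-03"))

typeof(d)       # "double": the underlying storage is numeric
is.numeric(d)   # FALSE: is.numeric() reports FALSE for Date objects

# The generic dispatches to the Date method and keeps the class
via_generic <- mean(d)
via_generic     # "2009-01-02"

# Calling the default method directly skips dispatch and hits the
# "argument is not numeric or logical" check, so the result is NA
via_default <- suppressWarnings(mean.default(d))
via_default     # NA
```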

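Another problem is the use of ifelse. It is better to use if/else here: there are only two elements, each test is a single TRUE/FALSE, and ifelse returns a result shaped like its test (length 1 here) while dropping attributes such as the Date class.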
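A short base-R comparison of the two, using made-up values d and v, shows why ifelse is a poor fit when the test is a single value:

```r
d <- as.Date("2009-01-02")   # hypothetical single date
v <- c("aa", "vv", "ss")     # hypothetical character values

# ifelse(): the test has length 1, so only the first element of `yes` survives
first_only <- ifelse(TRUE, v, "unused")
first_only                          # "aa"

# ifelse(): the Date class is dropped, leaving the underlying number
class(ifelse(TRUE, d, NA))          # "numeric"

# if/else returns the chosen value untouched
class(if (TRUE) d else NA)          # "Date"
length(if (TRUE) v else "unused")   # 3
```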

dt2[, Map(function(x, y) if(x != "paste") get(x, envir = parent.frame())(y, na.rm = TRUE) 
    else paste(y, collapse=':'), setNames(c("mean", "paste"), names(.SD)), .SD), by = id] 
# id  date1 charval 
#1: 1 2009-01-02 aa:vv:ss 
#2: 2 2009-01-05 a:b:c:d 

get is a tricky one; when the right environment is specified, it works as expected:

get("mean")(dt2$date1) 
#[1] "2009-01-04" 

Or, instead of testing for the string "paste" in if/else, we can check the class of the column: if it is character, do the paste, otherwise return the mean:

dt2[, Map(function(x, y) if(is.character(y)) get(x)(y, collapse=":") 
    else get(x, envir = parent.frame())(y, na.rm = TRUE), 
    setNames(c("mean", "paste"), names(.SD)), .SD), by = id] 
# id  date1 charval 
#1: 1 2009-01-02 aa:vv:ss 
#2: 2 2009-01-05 a:b:c:d 
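If a per-column mapping is really wanted, one variation (not part of the original answer, so treat it as a sketch) is to pass the functions themselves instead of their names, which avoids the get lookup entirely. The names dt, funs and res are hypothetical, and the table is rebuilt with as.Date so that only data.table is needed:

```r
library(data.table)

# Same shape as dt2, built with as.Date so the sketch needs only data.table
dt <- data.table(
  id      = c(1, 1, 1, 2, 2, 2, 2),
  date1   = as.Date("2009-01-01") + 0:6,
  charval = c("aa", "vv", "ss", "a", "b", "c", "d")
)

# Hypothetical helper list: one summary function per column of .SD
funs <- list(
  date1   = function(v) mean(v, na.rm = TRUE),
  charval = function(v) paste(v, collapse = ";")
)

# Pair each column with its function by name; no string lookup is involved
res <- dt[, Map(function(f, col) f(col), funs[names(.SD)], .SD), by = id]
res
```

This should give the same two rows as the direct mean/paste call, since each function is called on its column exactly as it would be in j.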

Note that the first approach (calling mean and paste directly in j) is preferable; it avoids all of this trouble.
