2013-02-27
1

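Pass a character vector as an argument to a function in plyr

I suspect I'm going about this the wrong way, but I'd like to pass a character vector as an argument to a function inside ddply. There are many Q&As about removing quotes and so on, but none of them seems to apply to my case (e.g. "Remove quotes from a character vector in R" and http://r.789695.n4.nabble.com/Pass-character-vector-to-function-argument-td3045226.html).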

library(plyr)

# reproducible data 
df1<-data.frame(a=sample(1:50,10),b=sample(1:50,10),c=sample(1:50,10),d=(c("a","b","c","a","a","b","b","a","c","d"))) 
df2<-data.frame(a=sample(1:50,9),b=sample(1:50,9),c=sample(1:50,9),d=(c("e","f","g","e","e","f","f","e","g"))) 
df3<-data.frame(a=sample(1:50,8),b=sample(1:50,8),c=sample(1:50,8),d=(c("h","i","j","h","h","i","i","h"))) 

#make a list 
list.1<-list(df1=df1,df2=df2,df3=df3) 

# desired output 
lapply(list.1, function(x) ddply(x, .(d), function(x) data.frame(am=mean(x$a), bm=mean(x$b), cm=mean(x$c)))) 

$df1 
    d  am  bm  cm 
1 a 31.00000 29.25000 18.50000 
2 b 31.66667 24.33333 34.66667 
3 c 18.50000 5.50000 24.50000 
4 d 36.00000 39.00000 43.00000 

$df2 
    d  am  bm cm 
1 e 18.25000 32.50000 18 
2 f 27.66667 41.33333 24 
3 g 25.00000 7.50000 42 

$df3 
    d  am  bm  cm 
1 h 36.00000 25.00000 20.50000 
2 i 25.33333 37.33333 24.33333 
3 j 32.00000 32.00000 46.00000 

But my real use case has many new columns and different kinds of calculations that I want ddply to compute. So I'd like to do something like this:

# here's a simple version of a function that I want to send to ddply  
func <- "am=mean(x$a), bm=mean(x$b), cm=mean(x$c)" 

# here's how I imagine it might work 
lapply(list.1, function(x) ddply(x, .(d), function(x) data.frame(func))) 

# not the desired outcome... 
$df1 
    d          func 
1 a am=mean(x$a), bm=mean(x$b), cm=mean(x$c) 
2 b am=mean(x$a), bm=mean(x$b), cm=mean(x$c) 
3 c am=mean(x$a), bm=mean(x$b), cm=mean(x$c) 
4 d am=mean(x$a), bm=mean(x$b), cm=mean(x$c) 

$df2 
    d          func 
1 e am=mean(x$a), bm=mean(x$b), cm=mean(x$c) 
2 f am=mean(x$a), bm=mean(x$b), cm=mean(x$c) 
3 g am=mean(x$a), bm=mean(x$b), cm=mean(x$c) 

$df3 
    d          func 
1 h am=mean(x$a), bm=mean(x$b), cm=mean(x$c) 
2 i am=mean(x$a), bm=mean(x$b), cm=mean(x$c) 
3 j am=mean(x$a), bm=mean(x$b), cm=mean(x$c) 

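I've tried noquote, deparse, eval(as.symbol()), do.call(data.frame, ...) and some of the methods described at https://github.com/hadley/devtools/wiki/Evaluation on func, to no avail. The solution may be obvious at this point, but in case it isn't, here's a longer example that is closer to my use case (i.e. melt everything!):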

# sample data 
s <- 23 # number of samples 
r <- 10 # number of runs per sample 
el <- 17 # number of elements 
mydata <- data.frame(ID = unlist(lapply(LETTERS[1:s], function(x) rep(x, r))), 
        run = rep(1:r, s)) 
# insert fake element data 
mydata[letters[1:el]] <- lapply(1:el, function(i) rnorm(s*r, runif(1)*i^2)) 

# generate all combinations of 5 runs from ten runs 
su <- 5 # number of runs to sample from ten runs 
idx <- combn(unique(mydata$run), su) 

# RSE function 
RSE <- function(x) {100*((sd(x)/sqrt(length(x)))/mean(x))} 

# make a list of dfs for all samples for each combination of five runs 
# to prepare to calculate RSEs 
combys1 <- lapply(1:ncol(idx), function(i) mydata[mydata$run %in% idx[,i],]) 

# make a list of dfs with RSE for each ID, for each combination of runs 
combys2 <- lapply(1:length(combys1), function(i) ddply(combys1[[i]], "ID", summarise, RSEa=RSE(a), RSEb=RSE(b), RSEc=RSE(c), meana=mean(a), meanb=mean(b), meanc=mean(c))) 

In the last line above, I'd like to replace RSEa=RSE(a), RSEb=RSE(b), RSEc=RSE(c), meana=mean(a), meanb=mean(b), meanc=mean(c) with the object doRSE built here, to avoid a lot of typing:

# prepare to calculate new columns with RSE and means 
RSEs <- sapply(3:ncol(mydata), function(j) paste0("RSE",names(mydata[j]))) 
RSExs <- sapply(3:ncol(mydata), function(j) paste0("RSE(",names(mydata[j]),")")) 
doRSE <- paste0(sapply(1:length(RSEs), function(x) paste0(RSEs[x],"=",RSExs[x])), collapse=",", sep="") 
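
For illustration, the same kind of string can be built with vectorised paste0 calls. This is only a sketch on a hypothetical three-column vector cols (not the 17 element columns above), and the name spec is likewise an assumption; it also appends the mean terms that the hand-typed version contains:

```r
# Hypothetical stand-in for names(mydata)[3:ncol(mydata)]
cols <- c("a", "b", "c")

# paste0 is vectorised over cols; collapse joins the pieces into one string
rse_part  <- paste0("RSE", cols, "=RSE(", cols, ")", collapse = ",")
mean_part <- paste0("mean", cols, "=mean(", cols, ")", collapse = ",")
spec <- paste(rse_part, mean_part, sep = ",")

# Check that the string parses as the argument list of a call
parsed <- parse(text = sprintf("list(%s)", spec))[[1]]
arg_names <- names(parsed)[-1]  # drop the empty name of the function slot

print(spec)
print(arg_names)
```

Parsing the string once, as in the last step, is a cheap way to catch a misplaced comma or parenthesis before handing it to anything else.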

I'm open to solutions involving base R, data.table, and dirty tricks. These questions seem close to what I want, but I can't quite translate them to my problem: "Pass character argument and evaluate", "Force evaluation of multiple variables using vector of character", and "Using a vector of characters that correspond to an expression as an argument to a function".

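UPDATE Here's the catch: I want to be able to modify func in the simple example (or doRSE in my use case) to create a series of new columns, computed from the existing columns in various ways, in order to explore the data. I'd like a workflow that allows the resulting data frame to have new columns that are not in the original data frame. Sorry the original question wasn't very clear. I can't see how to modify @Marius's answer to do that, but @mnel's is very helpful (see the update below).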

Working from @mnel's excellent dirty trick, with a few small fixes, I can get it to work for my use case:

# @mnel's solution, adapted (no period before eval) 
combys2 <- lapply(combys1, function(x) do.call(ddply,c(.data = quote(x), 
          .variables = quote(.(ID)), .fun = quote(summarize), 
          eval(parse(text = sprintf('.(%s)', doRSE)))))) 
head(combys2) 

[[1]] 
    ID  RSEa  RSEb  RSEc  RSEd  RSEe  RSEf  RSEg  RSEh  RSEi 
1 A 168.30658 21.68632 5.657228 5.048057 4.162017 2.9581874 1.849009 0.6925148 0.4393491 
2 B 26.55071 26.20427 4.782578 4.385409 2.342764 2.1813874 2.719625 1.1576681 0.6427935 
3 C 73.83165 14.47216 8.154435 6.273202 3.046978 1.2179457 2.811405 1.1401837 0.8167067 
4 D 31.96170 57.89260 9.438220 7.388410 3.755772 0.8601780 3.724875 0.8358204 0.9939387 
5 E 63.22537 60.35532 5.839690 11.691304 3.828430 0.9217787 4.204300 0.8217187 0.7876634 
6 F 56.37635 65.37907 4.149568 5.496308 2.227544 2.1548455 2.847291 1.1956212 0.2506518 
7 G 69.32232 23.63214 4.255847 7.979225 4.917660 1.6185960 3.156521 0.3265555 0.8133279 
8 H 29.82015 40.74184 7.372100 7.464792 2.749862 0.6054420 4.061368 0.9973909 1.3807720 
9 I 50.58114 19.53732 2.989920 9.767678 4.000249 1.7451322 1.175397 0.9952093 0.9095086 
10 J 92.96462 39.77475 6.140688 10.295668 3.407726 2.4663758 3.030444 0.5743419 0.9296482 
11 K 90.72381 42.25092 2.483069 6.781054 3.142082 1.8080633 2.891740 1.1996176 0.8525290 
12 L -385.24547 40.81267 4.506087 8.148382 2.976488 0.8304432 2.234134 0.2108664 0.4979777 
13 M 22.77743 33.98332 2.913926 8.764639 2.307293 0.8366635 3.229944 1.0003125 0.3878567 
14 N 66.75163 34.16087 6.611326 13.865377 1.285522 1.3863958 4.165575 0.7379386 0.4515194 
15 O 37.37188 100.57479 5.738877 5.724862 2.839638 1.1366610 3.186332 0.7383855 0.3954544 
16 P 17.08913 26.62210 6.060130 4.110893 2.688908 2.6970727 1.609043 1.3860834 0.8780010 
17 Q 13.96392 74.92279 5.469304 8.467638 2.974131 1.2135436 3.284564 0.6232778 1.0759226 
18 R 42.59899 30.75952 4.842832 8.764158 1.874020 1.5791048 3.427342 1.4479638 0.2964455 
19 S 26.03307 15.56352 6.968717 7.783876 4.439733 2.0764179 4.683080 0.7459654 1.1268772 
20 T 71.57945 33.81362 7.147049 11.201551 2.128315 2.2051611 2.419805 0.2688807 1.1559635 
21 U 73.93002 11.77155 7.738910 7.207041 1.478491 1.4409844 4.042419 0.5883490 0.5585716 
22 V 67.93166 39.54994 5.701551 8.636122 2.472963 1.6514199 2.627965 1.0359048 0.8747136 
23 W 11.23057 12.51272 7.003448 7.424559 4.102693 0.6614847 2.246305 1.3422405 0.2665246 
     RSEj  RSEk  RSEl  RSEm  RSEn  RSEo  RSEp  RSEq 
1 0.6366733 0.3713819 2.1993487 0.3865293 0.5436581 0.9187585 0.4344699 0.8915868 
2 0.3445095 0.2932025 1.8563179 0.5397595 1.0433388 0.3533622 0.1942316 0.1941072 
3 0.2720344 0.5507595 2.0305726 0.4377259 0.8589854 0.5690906 0.1397337 0.4043247 
4 0.6606667 0.6769112 3.4737352 0.5674656 1.2519256 0.8718298 0.1162969 0.8287504 
5 0.4620774 0.5598069 1.9236112 0.7990046 0.9832732 0.6847352 0.4070675 0.9005185 
6 0.7981610 0.4005493 0.9721068 0.2770989 1.7054674 0.3110139 0.4521183 0.8740444 
7 0.3969116 0.4717575 4.1341106 0.7510628 0.9998299 0.5342292 0.4319642 1.1861705 
8 0.2963956 0.2652221 0.4775827 0.2617120 0.8261874 0.5266087 0.1900943 0.2350553 
9 0.2609359 0.5431035 2.6478440 0.1606919 0.7407281 0.6802262 0.1802069 0.7438792 
10 0.4239787 0.8753544 3.4218030 0.5467869 0.7404017 0.5581173 0.3682014 0.6361436 
11 0.4188502 0.8629862 4.4181479 0.1623873 0.8018811 0.5873609 0.3592134 0.5357984 
12 0.5790265 0.5009210 3.7534287 0.1933726 0.5809601 0.5777868 0.3400925 0.4783890 
13 0.3562582 0.2552756 2.1393219 0.1849345 0.5796194 0.6129469 0.3363311 0.4382125 
14 0.7921502 0.6147990 2.9054634 0.5852325 1.4954072 0.9983203 0.2937837 0.7654504 
15 0.5840424 0.2757707 1.5695675 0.3305385 0.8712636 0.5816490 0.1985457 0.7213289 
16 0.3301280 0.3008273 2.9014987 0.4540833 0.5966479 0.9042004 0.1631630 0.7262141 
17 0.5882511 0.2820978 3.0652666 0.4518936 1.3168151 0.4749311 0.2244693 0.6583083 
18 0.4048816 0.3708787 3.2207478 0.2603412 1.3168318 0.3318745 0.3120436 0.6210711 
19 0.4425123 0.3602076 3.7609863 0.5399527 0.8302572 0.3246904 0.1952143 0.2915325 
20 0.5877835 0.6339015 1.6908570 0.3223056 0.5239339 0.6607198 0.2808094 0.3697380 
21 0.4454056 0.7733354 4.3433420 0.4391075 0.5503594 0.5893406 0.2262403 0.2361512 
22 0.9583940 0.6365843 3.0033951 0.6507968 0.8610046 0.6363198 0.2866719 0.5736855 
23 0.4969730 0.3895182 2.0021608 0.3354475 1.4398250 0.7386870 0.2458906 0.3414804 
... 
... 
+2

I'm not following. Why not write a function that computes all these new columns and use it in ddply? – joran 2013-02-27 04:17:01

+0

Could you give me a hint about what that function would look like? – Ben 2013-02-27 04:20:34

+1

Wait, what? If you don't know what this function looks like, i.e. what it would do, how could I? – joran 2013-02-27 04:23:58

Answers

4

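You can do some ugly computing on the language using quote and the plyr::. function.

https://github.com/hadley/devtools/wiki/Computing-on-the-language may help you decide whether you really want to do this.

Anyway, the approach would be to

  1. Use .() to create your vector of arguments, e.g. following how summarise works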

    .(am=mean(a), bm=mean(b), cm=mean(c)) 
    

    If you really want to use a character string

    foo<- "am=mean(a), bm=mean(b), cm=mean(c)" 
    eval(parse(text = sprintf('.(%s)', foo))) 
    
  2. Use quote liberally to create the list to be passed to do.call

For example

lapply(list.1, function(x) do.call(ddply,c(.data = quote(x), 
    .variables = quote(.(d)), .fun = quote(summarize), 
     .(am=mean(a), bm=mean(b), cm=mean(c))))) 

Oh boy, is that ugly.
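
The underlying idea, turning the string into an unevaluated call and evaluating it against each group, can also be sketched in base R without plyr. The helper name summarise_by_string and the toy data are hypothetical, and this is a swapped-in base R illustration under those assumptions rather than what ddply does internally:

```r
# Hypothetical helper: evaluate a string of named summary expressions
# within each group of a data frame, using base R only.
summarise_by_string <- function(df, group, spec) {
  # Wrap the string in list(...) and parse it into an unevaluated call
  call_obj <- parse(text = sprintf("list(%s)", spec))[[1]]
  pieces <- split(df, df[[group]], drop = TRUE)
  rows <- lapply(names(pieces), function(g) {
    # Columns of the piece are looked up first, then the enclosing scope
    vals <- eval(call_obj, pieces[[g]])
    data.frame(setNames(list(g), group), vals, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Toy data (hypothetical values)
toy <- data.frame(a = c(1, 3, 5, 7), b = c(2, 4, 6, 8),
                  d = c("x", "x", "y", "y"))
foo <- "am=mean(a), bm=mean(b)"
res <- summarise_by_string(toy, "d", foo)
print(res)
```

Because the string is evaluated as code, this kind of helper is only reasonable for strings you wrote yourself.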

Alternatively, you could use data.table

library(data.table) 


listDT <- lapply(list.1, data.table) 


lapply(listDT, function(x) x[,lapply(.SD, mean), by = 'd']) 

mystuff <- sprintf('list(%s)', foo) 
lapply(listDT, function(x) x[, eval(parse(text = mystuff)), by = 'd']) 

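However, if all of your data.tables have the same columns, it would be more efficient to create one big data.table (with an identifier for each element of the list) and work on that.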
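
A minimal sketch of that single-table idea, assuming a data.table version whose rbindlist supports the idcol argument. The toy list list.toy and the column name source are hypothetical stand-ins, not part of the original data:

```r
library(data.table)

# Toy stand-in for list.1 (hypothetical values)
list.toy <- list(
  df1 = data.frame(a = c(1, 3, 5), b = c(2, 4, 6), d = c("a", "a", "b")),
  df2 = data.frame(a = c(10, 20), b = c(30, 50), d = c("e", "e"))
)

# Stack everything into one table; idcol adds a column holding the
# name of the list element each row came from
bigDT <- rbindlist(list.toy, idcol = "source")

foo <- "am=mean(a), bm=mean(b)"
mystuff <- sprintf("list(%s)", foo)

# Group by both the identifier and d, evaluating the parsed string in j
res <- bigDT[, eval(parse(text = mystuff)), by = c("source", "d")]
print(res)
```

Grouping on the identifier as well as d keeps the results for each original data frame separate while avoiding the outer lapply.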

+9

You'd better hope Hadley doesn't see this. ;) – joran 2013-02-27 04:27:19

+0

+1 for the quick and dirty trick, thanks! It works well for me. There seem to be a few typos here, and I couldn't get the data.table bit to work. – Ben 2013-02-27 08:57:36

+1

@Ben - I've fixed the typos - good. – mnel 2013-03-01 00:14:33

2

Here's a ddply function that calculates the mean of every column in your data frames other than d:

lapply(list.1, function(x) {
  ddply(x, .(d), function(df_part) {
    result_df <- data.frame(d = df_part$d[1])
    non_d_cols <- colnames(df_part)[colnames(df_part) != "d"]
    for (col in non_d_cols) {
      col_mean <- mean(df_part[[col]])
      col_name <- paste0(col, "_mean")
      result_df[[col_name]] <- col_mean
    }
    return(result_df)
  })
})

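This seems to me the simplest way to do it, and it should generalise well to other calculations you might need on these columns. You could perhaps pass in a character vector of the columns you want means for, and use it in place of non_d_cols.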
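
A minimal sketch of that last suggestion, assuming a hypothetical helper col_means that takes the column names as a character vector. The grouping step here is a base R stand-in (split plus rbind) so the block runs without plyr:

```r
# Hypothetical helper: one-row data frame of means for the requested columns
col_means <- function(df_part, cols) {
  means <- lapply(df_part[cols], mean)   # named list, one mean per column
  names(means) <- paste0(cols, "_mean")
  as.data.frame(means)
}

# Toy data (hypothetical values)
toy <- data.frame(a = c(1, 3, 5, 7), b = c(2, 4, 6, 8), c = c(0, 0, 1, 1),
                  d = c("x", "x", "y", "y"))

# Base R stand-in for the split-apply-combine step that ddply performs
pieces <- split(toy, toy$d)
res <- do.call(rbind, lapply(pieces, col_means, cols = c("a", "b")))
print(res)
```

With plyr loaded, the same helper could be passed straight to ddply, e.g. ddply(x, .(d), col_means, cols = c("a", "b")), since ddply forwards extra arguments to the function it applies.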

+0

Thanks, that's very interesting and may come in handy. – Ben 2013-02-27 09:17:16
