Major dplyr functions in a function

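I have seen some posts on how to write your own functions using dplyr functions. For example, you can see how to use group_by (regroup) and summarise in this post. I thought it would be interesting to see whether I could write a function using the major dplyr functions. My hope is that we can further understand how to write functions with dplyr.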

DATA

library(dplyr)
country <- rep(c("UK", "France"), each = 5)
id <- rep(letters[1:5], times = 2) 
value <- runif(10, 50, 100) 
foo <- data.frame(country, id, value, stringsAsFactors = FALSE)

GOAL

I want to turn the following process into a function.

foo %>% 
    mutate(new = ifelse(value > 60, 1, 0)) %>% 
    filter(id %in% c("a", "b", "d")) %>% 
    group_by(country) %>% 
    summarize(whatever = sum(value))

TRY

### Here is a function which does the same process 

myFun <- function(x, ana, bob, cathy) x %>% 
    mutate(new = ifelse(ana > 60, 1, 0)) %>% 
    filter(bob %in% c("a", "b", "d")) %>% 
    regroup(as.list(cathy)) %>% 
    summarize(whatever = sum(ana)) 

myFun(foo, value, id, "country") 

Source: local data frame [2 x 2] 

    country whatever 
1 France 233.1384 
2  UK 245.5400

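You may notice that arrange() is missing. That is the one I have been struggling with. Here are two observations. The first experiment was successful: the order of the countries changed from UK-France to France-UK. But the second experiment was not successful.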

### Experiment 1: This works for arrange() 

myFun <- function(x, ana) x %>% 
     arrange(ana) 

myFun(foo, country) 

    country id value 
1 France a 90.12723 
2 France b 86.64229 
3 France c 74.93320 
4 France d 80.69495 
5 France e 72.60077 
6  UK a 84.28033 
7  UK b 67.01209 
8  UK c 94.24756 
9  UK d 79.49848 
10  UK e 63.51265 


### Experiment2: This was not successful. 

myFun <- function(x, ana, bob) x %>% 
     filter(ana %in% c("a", "b", "d")) %>% 
     arrange(bob) 

myFun(foo, id, country) 

Error: incorrect size (10), expecting :6 

### This works, by the way. 
foo %>% 
    filter(id %in% c("a", "b", "d")) %>%
    arrange(country)

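Given that the first experiment was successful, I have a hard time understanding why the second one failed. There may be something extra that has to be done in the second experiment. Does anyone have an idea? Thank you for taking the time.

Source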

2014-09-23 jazzurro

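Actually, none of your experiments really work; you are facing a scoping problem in all of them. They look like they work because you have defined the vectors country, id and value in the global environment and have not removed them. So when you call your function, it uses the vectors from the global environment rather than the columns of the data frame.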
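The lookup described above can be reproduced without dplyr. The following is a minimal base R sketch, not the code from the question: the names `df` and `f` and the numbers are made up for illustration, and `with()` stands in for a dplyr verb that evaluates an expression inside a data frame.

```r
# Hypothetical names and values, chosen for illustration only.
value <- c(1, 2, 3)                      # a vector in the calling environment
df <- data.frame(value = c(10, 20, 30))  # a column with the same name

f <- function(x, ana) {
  # `ana` is not a column of x, so the lookup falls through to the
  # function's argument, which holds the caller's `value` vector.
  with(x, sum(ana))
}

outside <- f(df, value)          # 6: sums the outside vector
inside <- with(df, sum(value))   # 60: sums the column
```

If `value` were removed from the calling environment before the call, `f(df, value)` would fail because the argument could no longer be resolved.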

To prove this, let's remove those vectors before calling your function.

Create the vectors and the data.frame:

library(dplyr) 
country <- rep(c("UK", "France"), each = 5) 
id <- rep(letters[1:5], times = 2) 
value <- runif(10, 50, 100) 
foo <- data.frame(country, id, value, stringsAsFactors = FALSE)

Define your first function:

myFun <- function(x, ana, bob, cathy) x %>% 
    mutate(new = ifelse(ana > 60, 1, 0)) %>% 
    filter(bob %in% c("a", "b", "d")) %>% 
    regroup(as.list(cathy)) %>% 
    summarize(whatever = sum(ana))

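Call it without removing the vectors (it looks like it works, but it is actually using the vectors from the global environment):

myFun(foo, value, id, "country")
Source: local data frame [2 x 2]

  country whatever
1  France 208.1008
2      UK 192.4287

Now remove the vectors and call your function again (this time it does not work, because it cannot find the vectors):

rm(country, id, value)
myFun(foo, value, id, "country")

Error in mutate_impl(.data, named_dots(...), environment()) :
  object 'value' not found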

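So this explains why one of your arrange examples worked while the other did not. The vector used in the second experiment is the global-environment vector country, which has 10 elements. But arrange expects only 6 elements, which is the number of rows left after the filter.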
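The size mismatch can be checked directly. This is a small base R sketch that rebuilds the question's data (without the random value column) and compares the number of filtered rows with the length of the global vector; `filtered` is a name introduced here for illustration.

```r
country <- rep(c("UK", "France"), each = 5)
id <- rep(letters[1:5], times = 2)
foo <- data.frame(country, id, stringsAsFactors = FALSE)

# Base R equivalent of filter(id %in% c("a", "b", "d"))
filtered <- foo[foo$id %in% c("a", "b", "d"), ]

n_rows <- nrow(filtered)      # 6 rows survive the filter
n_global <- length(country)   # the global vector still has 10 elements
```

Sorting 6 rows by a 10-element vector cannot work, which matches the "incorrect size (10), expecting :6" error in the question.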

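There are different strategies for making your function work. For example, take a look at t to see how to do this. Or just wait a little, since, as Hadley pointed out, programming in dplyr is a future feature coming soon.

Source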

2014-09-23 17:38:49

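Deparsing and pasting strings is _never_ the right answer. – hadley 2014-09-23 22:49:48

@hadley ok, in that case, would you recommend the "create a list" approach? – 2014-09-23 22:53:15

I'd recommend using 'substitute()', or waiting for https://github.com/hadley/dplyr/issues/352 – hadley 2014-09-23 22:54:15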
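As a rough illustration of the substitute() route mentioned in the comment above, here is a base R sketch. It is not code from the thread: `grouped_sum` and its argument names are hypothetical, and `tapply()` stands in for the group_by/summarize step.

```r
# Hypothetical helper: capture the unevaluated column names with
# substitute(), then evaluate them with the data frame's columns in scope.
grouped_sum <- function(df, value_col, group_col) {
  value_expr <- substitute(value_col)   # e.g. the symbol `value`
  group_expr <- substitute(group_col)   # e.g. the symbol `country`
  values <- eval(value_expr, df, parent.frame())
  groups <- eval(group_expr, df, parent.frame())
  tapply(values, groups, sum)
}

foo <- data.frame(country = c("UK", "UK", "France"),
                  value = c(1, 2, 5),
                  stringsAsFactors = FALSE)

res <- grouped_sum(foo, value, country)
```

Because the expressions are evaluated in `df` first, the columns take precedence over any same-named vectors in the calling environment.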

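I installed dplyr 0.3 and lazyeval once issue 352 was closed, to see how using dplyr functions inside other functions works. After reading the vignette on non-standard evaluation, it looks like interp from lazyeval, combined with the new functions ending in _, is one option. Note that group_by_ now replaces regroup.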
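To see what interp produces on its own, here is a small sketch. It assumes the lazyeval package is installed; the two-row data frame and the name `col_name` are made up for illustration, and the formula is evaluated with base `eval()` rather than through a dplyr verb.

```r
library(lazyeval)

# Made-up data for illustration
foo <- data.frame(id = c("a", "b"), value = c(55, 70),
                  stringsAsFactors = FALSE)

col_name <- "value"

# interp() replaces the placeholder `var` in the formula with the
# symbol built from the string in col_name.
f <- interp(~ifelse(var > 60, 1, 0), var = as.name(col_name))

# f[[2]] is the right-hand side of the one-sided formula; evaluate it
# with the data frame's columns in scope.
new <- eval(f[[2]], foo)
```

The interpolated formula refers to `value` by name, so it is resolved against the data frame's column when a verb such as mutate_ evaluates it.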

set.seed(16) 
foo = data.frame(country = rep(c("UK", "France"), each = 5), 
       id = rep(letters[1:5], times = 2), 
       value = runif(10, 50, 100), stringsAsFactors = FALSE)

First, the result of the code outside of a function:

library(lazyeval) 
library(dplyr) 

foo %>% 
    mutate(new = ifelse(value > 60, 1, 0)) %>% 
    filter(id %in% c("a", "b", "d")) %>% 
    group_by(country) %>% 
    summarize(whatever = sum(value)) 

Source: local data frame [2 x 2] 

    country whatever 
1 France 213.0009 
2  UK 207.8331

Then turn the above process into a function:

myFun = function(x, ana, bob, cathy) { 
    x %>% 
     mutate_(new = interp(~ifelse(var > 60 , 1, 0), var = as.name(ana))) %>% 
     filter_(interp(~var %in% c("a", "b", "d"), var = as.name(bob))) %>% 
     group_by_(cathy) %>% 
     summarize_(whatever = interp(~sum(var), var = as.name(ana))) 
}

Which gives the desired result.

myFun(foo, "value", "id", "country") 
Source: local data frame [2 x 2] 

    country whatever 
1 France 213.0009 
2  UK 207.8331
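For readers on a recent dplyr, the underscored verbs used above have since been deprecated. The sketch below shows the same pipeline with the embrace operator `{{ }}` instead of lazyeval. It is a swapped-in technique, not part of the original answer; it assumes dplyr 1.0 or later, and `my_fun` is a name chosen here to avoid clashing with myFun.

```r
library(dplyr)

set.seed(16)
foo <- data.frame(country = rep(c("UK", "France"), each = 5),
                  id = rep(letters[1:5], times = 2),
                  value = runif(10, 50, 100), stringsAsFactors = FALSE)

# Column arguments are passed unquoted and forwarded with {{ }}.
my_fun <- function(x, ana, bob, cathy) {
  x %>%
    mutate(new = ifelse({{ ana }} > 60, 1, 0)) %>%
    filter({{ bob }} %in% c("a", "b", "d")) %>%
    group_by({{ cathy }}) %>%
    summarize(whatever = sum({{ ana }}))
}

wrapped <- my_fun(foo, value, id, country)

# The same steps written out directly, for comparison
direct <- foo %>%
  mutate(new = ifelse(value > 60, 1, 0)) %>%
  filter(id %in% c("a", "b", "d")) %>%
  group_by(country) %>%
  summarize(whatever = sum(value))
```

Note that the columns are passed as bare names here, not as strings as in the lazyeval version.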

For your second question, with arrange, I tried

myfun2 = function(x, ana, bob) x%>% 
    filter_(interp(~var %in% c("a", "b", "d"), var = as.name(ana))) %>% 
    arrange_(as.name(bob)) 

myfun2(foo, "id", "country")

Source

2014-10-01 16:03:25 aosmith

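Thank you very much for the update. It seems that writing functions with 'dplyr' is now simpler than I imagined. This is great stuff. – jazzurro 2014-10-01 16:10:05

Hey, I have been playing with your code today, and I am struggling with 'arrange'. 'arrange_(as.name(ana), as.name(bob))' works fine. However, I want to add 'desc' for bob. 'arrange_(as.name(ana), ~desc(as.name(bob)))' gives no error, but does not work. 'arrange_(interp(as.name(ana), ~desc(as.name(bob))))' is the same. Do you have any idea? – jazzurro 2014-10-03 15:22:44

I get it now. I am still a bit confused, but because 'desc' is another function, you have to use 'interp()' inside 'arrange_'. – jazzurro 2014-10-03 15:42:34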
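The sorting case from the comments above, ascending on one column and descending on another, can be sketched on a recent dplyr as follows. This uses the embrace operator rather than interp() with arrange_, so it is not the code the commenter arrived at; it assumes dplyr 1.0 or later, and `my_arrange` and the four-row data frame are made up for illustration.

```r
library(dplyr)

# Made-up data for illustration
df <- data.frame(country = c("UK", "France", "UK", "France"),
                 value = c(1, 2, 3, 4),
                 stringsAsFactors = FALSE)

# Sort by `ana` ascending, then by `bob` descending.
my_arrange <- function(x, ana, bob) {
  x %>% arrange({{ ana }}, desc({{ bob }}))
}

res <- my_arrange(df, country, value)
```

Here desc() wraps the embraced column directly, so no separate interpolation step is needed.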
