dplyr mutate rowwise max of range of columns

I can use the following to return the row-wise max of 2 columns:

newiris<-iris %>% 
rowwise() %>% 
mutate(mak=max(Sepal.Width,Petal.Length))

What I'd like to do is find the max across a range of columns, so that I don't have to name each one, like this:

newiris<-iris %>% 
rowwise() %>% 
mutate(mak=max(Sepal.Width:Petal.Length))

Any ideas?


2015-10-06 user2502836
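The attempt with Sepal.Width:Petal.Length does not select a range of columns. Inside rowwise(), each column name evaluates to a single number for the current row, so `:` simply builds a numeric sequence between those two values. A minimal base R illustration, using made-up values for one row:

```r
# Hypothetical values of Sepal.Width and Petal.Length in a single row
sepal_width  <- 3.2
petal_length <- 4.7

# Inside rowwise(), these are plain numbers, so `:` builds the numeric
# sequence 3.2, 4.2 instead of selecting the columns in between
colon_values <- sepal_width:petal_length
print(colon_values)

# The max of that sequence is 4.2, not the 4.7 you might expect
print(max(colon_values))
```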

Instead of rowwise(), this can be done with pmax:

iris %>% 
     mutate(mak=pmax(Sepal.Width,Petal.Length, Petal.Width))
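pmax() is the vectorized ("parallel") counterpart of max(): it takes several vectors and returns the largest value at each position, which is why passing whole columns gives one maximum per row. A small base R illustration with made-up vectors standing in for columns:

```r
# Three made-up vectors standing in for three columns
a <- c(1, 5, 3)
b <- c(4, 2, 6)
d <- c(0, 7, 1)

# pmax() works position by position: one maximum per "row"
row_max <- pmax(a, b, d)
print(row_max)      # 4 7 6

# max() collapses everything into a single number instead
overall_max <- max(a, b, d)
print(overall_max)  # 7
```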

Or, if we want to reference column names stored in a vector, we could use interp from library(lazyeval).

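library(dplyr)
library(lazyeval) 
nm1 <- names(iris)[2:4] 
# as.name() converts only one string, so give each name its own placeholder
iris %>% 
    mutate_(mak = interp(~pmax(v1, v2, v3), v1 = as.name(nm1[1]),
                         v2 = as.name(nm1[2]), v3 = as.name(nm1[3])))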


2015-10-06 20:03:58 akrun

Good idea with pmax. Any idea how to find the max of 3 columns by referencing the bookends? For example: Sepal.Width through Petal.Width? – user2502836

@user2502836 Updated the post. Please check whether it helps. – akrun
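The underscore verbs such as mutate_() were deprecated in dplyr 0.7 in favor of tidy evaluation, and lazyeval was superseded by rlang. Assuming dplyr 0.7 or later (rlang is installed with it), a sketch of the same idea splices the stored names straight into pmax(). This is a swapped-in technique, not part of akrun's answer:

```r
library(dplyr)

# Column names stored in a character vector, as in the answer above
nm1 <- names(iris)[2:4]

# rlang::syms() turns the strings into symbols and !!! splices them
# into pmax() as separate arguments: pmax(Sepal.Width, Petal.Length, Petal.Width)
newiris <- iris %>%
  mutate(mak = pmax(!!!rlang::syms(nm1)))

head(newiris, 3)
```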

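For selecting certain columns without typing their whole names when using dplyr, I prefer the select argument of the subset function. This gives the desired result: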

library(dplyr)
iris %>% subset(select = 2:4) %>% mutate(mak = do.call(pmax, (.))) %>% 
    select(mak) %>% cbind(iris)
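The key step above is do.call(pmax, (.)). A data frame is a list of columns, so do.call() expands it into one pmax() argument per column. A base R sketch with a made-up data frame:

```r
# A made-up data frame with three numeric columns
df <- data.frame(x = c(2, 8, 1), y = c(9, 3, 4), z = c(5, 6, 7))

# do.call() expands the list of columns into
# pmax(x = df$x, y = df$y, z = df$z), giving one maximum per row
row_max <- do.call(pmax, df)
print(row_max)  # 9 8 7
```

Because the column names become argument names, a column that happened to be called na.rm would be matched to pmax()'s own na.rm argument; ordinary names such as those in iris are unaffected.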


2015-10-07 08:02:17 inscaven

I think we could just use select(2:4) instead of subset(select = 2:4). –
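Following that suggestion, a sketch that keeps the whole data frame in a single mutate() call and uses the bookend column names asked about earlier instead of positions. It relies on the magrittr pipe binding `.` to the incoming data frame; it is an illustration, not code from the answer itself:

```r
library(dplyr)

# Keep all of iris and add mak as the row-wise max of the columns
# Sepal.Width through Petal.Width; `.` is the data frame entering the pipe
newiris <- iris %>%
  mutate(mak = do.call(pmax, select(., Sepal.Width:Petal.Width)))

head(newiris, 3)
```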

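It seems @akrun's answer only addresses the case where you can type in the names of all the variables, whether directly with mutate, as in mutate(pmax_value=pmax(var1, var2)), or with lazy evaluation through mutate_ and interp, as in mutate_(interp(~pmax(v1, v2), v1=as.name(var1), v2=as.name(var2))).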

If you want to use the colon syntax Sepal.Length:Petal.Width, or you happen to have a vector of column names, I can see two ways to do it.

The first is more elegant: tidy the data and take the max of the values within each group:

data(iris) 
library(dplyr) 
library(tidyr) 

iris_id = iris %>% mutate(id=1:nrow(.)) 
iris_id %>% 
    gather('attribute', 'value', Sepal.Length:Petal.Width) %>% 
    group_by(id) %>% 
    summarize(max_attribute=max(value)) %>% 
    right_join(iris_id, by='id') %>% 
    head(3) 
## # A tibble: 3 × 7
##      id max_attribute Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##   <int>         <dbl>        <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
## 1     1           5.1          5.1         3.5          1.4         0.2  setosa
## 2     2           4.9          4.9         3.0          1.4         0.2  setosa
## 3     3           4.7          4.7         3.2          1.3         0.2  setosa
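In tidyr 1.0 and later, gather() is superseded by pivot_longer(). Assuming those versions, a sketch of the same reshape-and-group approach looks like this (arrange(id) is added only to guarantee the original row order):

```r
library(dplyr)
library(tidyr)

# Add a row id so the per-row maxima can be joined back
iris_id <- iris %>% mutate(id = row_number())

result <- iris_id %>%
  # One row per (id, attribute) pair for the four measurement columns
  pivot_longer(Sepal.Length:Petal.Width,
               names_to = "attribute", values_to = "value") %>%
  group_by(id) %>%
  summarize(max_attribute = max(value)) %>%
  right_join(iris_id, by = "id") %>%
  arrange(id)

head(result, 3)
```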

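The harder way is to use an interpolated formula. This works well if you have a character vector with the names of the variables to maximize over, or if your table is too tall or too wide to tidy.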

# Make a character vector of the names of the columns we want to take the 
# maximum over 
target_columns = iris %>% select(-Species) %>% names 
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" 

# Make a vector of dummy variables that will take the place of the real 
# column names inside the interpolated formula 
dummy_vars = sapply(1:length(target_columns), function(i) sprintf('x%i', i)) 
## [1] "x1" "x2" "x3" "x4" 

# Paste those variables together to make the argument of the pmax in the 
# interpolated formula 
dummy_vars_string = paste0(dummy_vars, collapse=',') 
## [1] "x1,x2,x3,x4" 

# Make a named list that maps the dummy variable names (e.g., x1) to the 
# real variable names (e.g., Sepal.Length) 
dummy_vars_list = lapply(target_columns, as.name) %>% setNames(dummy_vars) 
## $x1 
## Sepal.Length 
## 
## $x2 
## Sepal.Width 
## 
## $x3 
## Petal.Length 
## 
## $x4 
## Petal.Width 

# Make a pmax formula using the dummy variables 
max_formula = as.formula(paste0(c('~pmax(', dummy_vars_string, ')'), collapse='')) 
## ~pmax(x1, x2, x3, x4) 

# Interpolate the formula using the named variables 
library(lazyeval) 
iris %>% 
    mutate_(max_attribute=interp(max_formula, .values=dummy_vars_list)) %>% 
    head(3) 
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species max_attribute
## 1          5.1         3.5          1.4         0.2  setosa           5.1
## 2          4.9         3.0          1.4         0.2  setosa           4.9
## 3          4.7         3.2          1.3         0.2  setosa           4.7
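Assuming dplyr 1.0.0 or later, rowwise() combined with c_across() accepts both the colon syntax from the question and a character vector of names, with no reshaping or formula interpolation. This is a newer technique than the ones in this thread, shown as a sketch. Because rowwise() computes one row at a time, the pmax() versions are typically faster on large data.

```r
library(dplyr)

target_columns <- iris %>% select(-Species) %>% names()

# Column range with the colon syntax from the question
by_range <- iris %>%
  rowwise() %>%
  mutate(max_attribute = max(c_across(Sepal.Length:Petal.Width))) %>%
  ungroup()

# The same maximum, driven by a character vector of column names
by_vector <- iris %>%
  rowwise() %>%
  mutate(max_attribute = max(c_across(all_of(target_columns)))) %>%
  ungroup()

head(by_range, 3)
```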


2017-03-31 15:34:22
