R dplyr numeric column names: using a variable

I have a data frame in which some of the column names are numeric:

> names(spreadResults) 
[1] "PupilMatchingRefAnonymous" "GENDER"     "URN"      
[4] "KS2Eng"     "KS2Mat"     "EVERFSM_6"     
[7] "0001"      "0003"      "0009"      
[10] "0015"

I want to run a report on each of the columns that has a numeric name:

for(DiscID in colnames(spreadResults[7:length(spreadResults)])) 
{ 
    #DiscIDcol <- match(DiscID,names(spreadResults)) 
    colID <- as.name(DiscID) 
    print(colID) 
    print(DiscID) 

    #get data into format suitable for creating tables 
    temp <- spreadResults %>% select(GENDER, EVERFSM_6, colID) %>% 
     filter_(!is.na(colID)) %>% 
     group_by_(GENDER, EVERFSM_6, colID) %>% 
     summarise(n = n()) %>% 
     ungroup() 
}

but I get:

`0001` 
[1] "0001" 
Error: All select() inputs must resolve to integer column positions. 
The following do not: 
* colID

However, if I use backticks (``) and name the column explicitly

temp <- spreadResults %>% select(GENDER, EVERFSM_6, `0001`)

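this works fine. Is there a way to refer to column names through a variable? I know you can use matches(DiscID) in select(), but matches(...) does not work in group_by, spread, etc.

The first five rows of the data frame I'm working with (dput() output):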

structure(list(
PupilMatchingRefAnonymous = c(12345L, 12346L, 12347L, 12348L, 12349L), 
GENDER = structure(c(2L, 2L, 1L, 1L, 1L), .Label = c("F", "M"), class = "factor"), 
URN = c(123456L, 123456L, 123456L, 123456L, 123456L), 
KS2Eng = c(4L, 3L, 4L, 5L, 3L), 
KS2Mat = c(4L, 5L, 4L, 4L, 3L), 
EVERFSM_6 = c(1L, 1L, 0L, 0L, 1L), 
`0001` = c(66, 44, NA_real_, 55, 66), 
`0003` = c(22, NA_real_, NA_real_, NA_real_, NA_real_), 
`0009` = c(NA_real_, 66, NA_real_, NA_real_, NA_real_), 
`0015` = c(33, NA_real_, 55, NA_real_, NA_real_)), 
.Names = c("PupilMatchingRefAnonymous", "GENDER", "URN", "KS2Eng", "KS2Mat", "EVERFSM_6", 
"0001", "0003", "0009", "0015"), 
row.names = c(NA, 5L), class = "data.frame")

Desired output:

  GENDER EVERFSM_6  0001     n
  (fctr)     (int) (dbl) (int)
1      F         0    55     1
2      F         1    66     1
3      M         1    44     1
4      M         1    66     1

Source

2016-03-28 pluke

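The simplest thing is probably to change the column names so they have a leading character. – Thomas

It isn't the numbers (though those will generally be a pain); it's about non-standard evaluation. `dplyr` functions take unquoted column names by default, so if you want to pass something else, you need to use the SE versions that end in an underscore (`select_`). – alistaire

Running spreadResults <- rename(spreadResults, "n0001" = '0001') and then running the code again still throws the same error on n0001. I can rename, but it makes no difference – pluke

To program with dplyr on arbitrary column names, you need to use the standard-evaluation versions of the functions, which end in _, so that your variable is not interpreted as a column name by the NSE versions. (For more on NSE, see Hadley's book.)

The syntax should look like this: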

library(dplyr) 

cols <- c('Sepal.Length', 'Sepal.Width') 

iris %>% select_(.dots = cols) %>% head() 
#   Sepal.Length Sepal.Width
# 1          5.1         3.5
# 2          4.9         3.0
# 3          4.7         3.2
# 4          4.6         3.1
# 5          5.0         3.6
# 6          5.4         3.9

If you also have fixed column names, either insert them into your character vector/list or quote them with '', "", quote, or ~:

iris %>% select_(~Species, .dots = cols) %>% head() 
#   Species Sepal.Length Sepal.Width
# 1  setosa          5.1         3.5
# 2  setosa          4.9         3.0
# 3  setosa          4.7         3.2
# 4  setosa          4.6         3.1
# 5  setosa          5.0         3.6
# 6  setosa          5.4         3.9

Source

2016-03-28 17:38:03 alistaire

I've updated the question with the desired output. Your answer is very close, but I run into problems once I go beyond select. When DiscID == "0001", running cols <- c("GENDER", "EVERFSM_6", DiscID); spreadResults %>% select_(.dots = cols) %>% filter_(!is.na(DiscID)) pulls the first column: PupilMatchingRefAnonymous – pluke

Alternatively, setting colID <- as.name(DiscID); cols <- c("GENDER", "EVERFSM_6", colID); temp <- spreadResults %>% select_(.dots = cols) %>% filter_(!is.na(c(colID))): the select works, but filter_ doesn't fire and doesn't complain either – pluke

SE `filter` is a pain, as far as I can see. You can use `interp`/`substitute` or `paste`: `df %>% select_(.dots = cols) %>% filter_(paste('!is.na(', DiscID, ')'))` – alistaire

The help for select suggests using one_of. It works in the following example:
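One possible reason the string-based calls above misbehave with a name like "0001": when a string is parsed as R code, a bare 0001 is read as the number 1, not as a column name. The base-R sketch below illustrates this with a toy data frame (`df`, `plain`, and `quoted` are made-up names for illustration, not from the thread). Wrapping the name in backticks before pasting makes the parser treat it as a symbol.

```r
# Toy data frame with a non-syntactic (numeric) column name; illustrative only.
df <- data.frame(GENDER = c("M", "F", "F"), check.names = FALSE)
df[["0001"]] <- c(66, NA, 55)

DiscID <- "0001"

# Pasting the bare name: the parser reads 0001 as the number 1,
# so the expression is effectively !is.na(1).
plain <- parse(text = paste0("!is.na(", DiscID, ")"))[[1]]

# Wrapping the name in backticks makes the parser treat it as a symbol.
quoted <- parse(text = paste0("!is.na(`", DiscID, "`)"))[[1]]

plain_result  <- eval(plain, df)   # a single value, unrelated to the column
quoted_result <- eval(quoted, df)  # one logical value per row of df

print(plain_result)
print(quoted_result)
```

If this is what is happening, building the filter string as paste0('!is.na(`', DiscID, '`)') would be the corresponding tweak to the paste approach, though I have not checked it against every dplyr version.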

df <- data.frame("a" = 1:3 , "b" = 3:5) 
names(df)[1] <- "243234" # rename, to a numeric string 

var <- names(df)[1] 

library(dplyr) 

df %>% select(one_of(var))

You can also see that the problem is not your numeric names but the way you call select:

var <- names(df)[2] # use the column named "b" 
df %>% select(one_of(var)) 
    b 
1 3 
2 4 
3 5 
df %>% select(var) 
Error: All select() inputs must resolve to integer column positions. 
The following do not: 
* var

Source

2016-03-28 17:24:15

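I should have made my question clearer: matches() also works with select, but not with group_by, spread, etc. The same problem seems to exist for one_of() in those other functions – pluke

OK, then I think a reproducible example would help. –

I've updated the question with the output – pluke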
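For readers on a newer dplyr, here is a minimal sketch of the whole loop that avoids the underscore verbs. It assumes dplyr 1.0.0 or later, which is not what the thread above used: `all_of()`, `across()`, and the `.data` pronoun are tools from those later releases. The data is a trimmed copy of the question's data frame, and `disc_cols` and `tables` are hypothetical names, as is the choice to pick the columns with a regular expression.

```r
library(dplyr)

# Trimmed copy of the question's data (only the columns the report needs).
spreadResults <- data.frame(
  GENDER    = factor(c("M", "M", "F", "F", "F")),
  EVERFSM_6 = c(1L, 1L, 0L, 0L, 1L),
  `0001`    = c(66, 44, NA, 55, 66),
  `0003`    = c(22, NA, NA, NA, NA),
  check.names = FALSE  # keep the numeric names as they are
)

# Assumption: the report columns are exactly those whose names are all digits.
disc_cols <- grep("^[0-9]+$", names(spreadResults), value = TRUE)

tables <- lapply(disc_cols, function(DiscID) {
  keep <- c("GENDER", "EVERFSM_6", DiscID)
  spreadResults %>%
    select(all_of(keep)) %>%              # select by character vector
    filter(!is.na(.data[[DiscID]])) %>%   # refer to one column by string
    group_by(across(all_of(keep))) %>%    # group by character vector
    summarise(n = n(), .groups = "drop")
})
names(tables) <- disc_cols

print(tables[["0001"]])
```

On this sample, the "0001" table should have the four rows shown in the desired output. The same pattern (a character vector passed through `all_of()`, a single name through `.data[[...]]`) is meant to carry over to the other verbs that would not accept matches() or one_of().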
