2016-06-10 49 views
5

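Pass variables to tidyr's gather to rename key/value columns?

I would like to call tidyr::gather() inside a custom function, passing it a pair of character variables that will be used to rename the key and value columns. For example: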

myFunc <- function(mydata, key.col, val.col) { 
    new.data <- tidyr::gather(data = mydata, key = key.col, value = val.col) 
    return(new.data)  
} 

However, this does not work as desired.

temp.data <- data.frame(day.1 = c(20, 22, 23), day.2 = c(32, 22, 45), day.3 = c(17, 9, 33)) 

# Call my custom function, renaming the key and value columns 
# "day" and "temp", respectively 
long.data <- myFunc(mydata = temp.data, key.col = "day", val.col = "temp") 

# Columns have *not* been renamed as desired 
head(long.data) 
    key.col val.col 
1 day.1  20 
2 day.1  22 
3 day.1  23 
4 day.2  32 
5 day.2  22 
6 day.2  45 

Desired output:

head(long.data) 
    day temp 
1 day.1 20 
2 day.1 22 
3 day.1 23 
4 day.2 32 
5 day.2 22 
6 day.2 45 

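My understanding is that gather() uses bare variable names for most arguments (as in this example, where it uses "key.col" as the column name rather than the value stored in key.col). I have tried a number of ways to pass the values in the gather() call, but most return errors. For example, these three variations of the gather() call within myFunc return Error: Invalid column specification (ignoring, for illustrative purposes, the value argument, which behaves the same way):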

gather(data = mydata, key = as.character(key.col), value = val.col) 

gather(data = mydata, key = as.name(key.col), value = val.col) 

gather(data = mydata, key = as.name(as.character(key.col)), value = val.col) 

As a workaround, I just rename the columns after the call to gather():

colnames(long.data)[colnames(long.data) == "key"] <- "day" 

But given that gather() is itself meant to name the key/value columns, how can I do this in the gather() call within a custom function?


Read ?gather and note the "See Also" section. Then Googling the appropriate function name might lead you to [this](http://stackoverflow.com/q/26429582/324364). – joran

Answers

1

To put this in a function you have to use gather_(), like so:

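myFunc <- function(mydata, key.col, val.col, gather.cols) { 
    # gather_() takes the new column names as strings; gather.cols holds
    # the positions of the columns to gather
    new.data <- tidyr::gather_(data = mydata, 
                               key_col = key.col, 
                               value_col = val.col, 
                               gather_cols = colnames(mydata)[gather.cols]) 
    return(new.data)  
} 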

temp.data <- data.frame(day.1 = c(20, 22, 23), day.2 = c(32, 22, 45), 
day.3 = c(17, 9, 33)) 
temp.data 


    day.1 day.2 day.3 
1 20 32 17 
2 22 22  9 
3 23 45 33 

# Call my custom function, renaming the key and value columns 
# "day" and "temp", respectively 

long.data <- myFunc(mydata = temp.data, key.col = "day", val.col = 
"temp", gather.cols = 1:3) 
# Columns *have* been renamed as desired 
head(long.data) 

    day temp 
1 day.1 20 
2 day.1 22 
3 day.1 23 
4 day.2 32 
5 day.2 22 
6 day.2 45 

As noted, the main difference is that with gather_ you have to specify the columns to gather with the gather_cols argument.
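As a possible alternative to gather_(), here is a minimal sketch that unquotes the strings with `!!` inside a plain gather() call. It assumes tidyr 0.7.0 or later, where the key and value arguments support unquoting; the function name myFuncUnquote is made up for illustration and is not part of the original answer.

```r
library(tidyr)  # assumption: tidyr >= 0.7.0, where gather() supports unquoting with !!

# Hypothetical variant of myFunc: !! unquotes the strings stored in key.col
# and val.col, so gather() uses their values rather than the argument names
myFuncUnquote <- function(mydata, key.col, val.col) {
  gather(data = mydata, key = !!key.col, value = !!val.col)
}

temp.data <- data.frame(day.1 = c(20, 22, 23),
                        day.2 = c(32, 22, 45),
                        day.3 = c(17, 9, 33))

# No column selection is given, so gather() gathers every column
long.data <- myFuncUnquote(temp.data, key.col = "day", val.col = "temp")
names(long.data)
```

With this form there is no need to pass the columns to gather separately, since gather() defaults to gathering all columns when none are selected.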


Nice explanation. I didn't realize Wickham had imbued the humble underscore with such power. – Jeff

1

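Most (if not all) of Hadley's functions that take bare variable names as arguments (such as dplyr's functions) have a function_ version that uses standard evaluation and is "suitable for programming with". So what you need should simply be: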

myFunc2 <- function(mydata, key.col, val.col) { 
    # Gather every column; the new column names are passed as strings
    tidyr::gather_(data = mydata, key_col = key.col, 
                   value_col = val.col, gather_cols = colnames(mydata))   
} 

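The only "catch" here is that specifying gather_cols is mandatory, whereas with gather it is not necessary, or the columns can be given separately through `...`.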

Then:

> myFunc2(mydata = temp.data, key.col = "day", val.col = "temp") 
    day temp 
1 day.1 20 
2 day.1 22 
3 day.1 23 
4 day.2 32 
5 day.2 22 
6 day.2 45 
7 day.3 17 
8 day.3 9 
9 day.3 33 
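To illustrate the difference between bare-name (non-standard) evaluation and standard evaluation mentioned above, here is a small base R sketch. The helpers nse_name, se_name, and wrapper are hypothetical and only mimic the behaviour; they are not how tidyr is implemented.

```r
# Hypothetical helper that captures its argument unevaluated, the way a
# bare-name interface does: it sees the expression, not the value
nse_name <- function(key) {
  deparse(substitute(key))
}

# Hypothetical standard-evaluation counterpart: it uses the value passed in
se_name <- function(key) {
  key
}

# Calling both from inside another function reproduces the problem from the
# question: the bare-name version returns the name of the wrapper's argument
wrapper <- function(key.col) {
  c(nse = nse_name(key.col), se = se_name(key.col))
}

res <- wrapper("day")
res  # the nse element is "key.col", the se element is "day"
```

This is why gather(key = key.col) produced a column literally named key.col, while gather_(key_col = key.col) uses the string stored in the variable.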

Good explanation; including 'gather_cols' was thoughtful. – Jeff