R - String manipulation with mutate - not getting the behavior I expect

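I'm trying to use mutate in dplyr to manipulate strings, and I'm not getting the output I want (see below). Instead of operating row by row, mutate takes the result for the first element and fills it down. I'd like to know if anyone can help me understand what I'm doing wrong, and how to adjust this code so it works properly.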

short.idfun = function(longid) 
{ 
    x  = strsplit(longid,"_") 
    y  = x[[1]] 
    study = substr(y[1],8,nchar(y[1])) 
    subj = y[length(y)] 
    subj = substr(subj,regexpr("[^0]",subj),nchar(subj)) #remove leading zeros 
    shortid= paste(study,subj,sep="-") 
    return(shortid) 
} 

library(dplyr) # provides mutate()

data = data.frame(test=c("1234567Andy_003_003003","1234567Beth_004_003004","1234567Char_003_003005"),stringsAsFactors=FALSE)
data= mutate(data,shortid=short.idfun(test)) 
print(data) 

#### Below is my output 
#      test shortid 
#1 1234567Andy_003_003003 Andy-3003 
#2 1234567Beth_004_003004 Andy-3003 
#3 1234567Char_003_003005 Andy-3003 

#### This is the behavior I was hoping for 
#      test shortid 
#1 1234567Andy_003_003003 Andy-3003 
#2 1234567Beth_004_003004 Beth-3004 
#3 1234567Char_003_003005 Char-3005

Source

2016-01-06 Andy Stein

Another approach is to use rowwise():

data %>% 
    rowwise() %>% 
    mutate(shortid = short.idfun(test))

Which gives:

#Source: local data frame [3 x 2] 
#Groups: <by row> 
# 
#     test shortid 
#     (chr)  (chr) 
#1 1234567Andy_003_003003 Andy-3003 
#2 1234567Beth_004_003004 Beth-3004 
#3 1234567Char_003_003005 Char-3005

Source

2016-01-07 14:21:48

Thanks, this is a nice approach!

@AndyStein Glad to help!

The problem is that your function needs a little help to be vectorized. You can run it through vapply to get the result you want.

data = data.frame(test=c("1234567Andy_003_003003","1234567Beth_004_003004","1234567Char_003_003005"),stringsAsFactors=FALSE) 
data= mutate(data, 
      shortid=vapply(test, short.idfun, character(1))) 
print(data)
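To see the difference outside of dplyr, here is a small base-R sketch. It copies short.idfun from the question; the names ids, one_value and per_row are only illustrative, and USE.NAMES = FALSE is an optional extra that drops the names vapply would otherwise attach.

```r
# short.idfun as posted in the question (only the first list element is used)
short.idfun <- function(longid) {
  x <- strsplit(longid, "_")
  y <- x[[1]]
  study <- substr(y[1], 8, nchar(y[1]))
  subj <- y[length(y)]
  subj <- substr(subj, regexpr("[^0]", subj), nchar(subj)) # remove leading zeros
  paste(study, subj, sep = "-")
}

# Illustrative input, same strings as the question's data frame
ids <- c("1234567Andy_003_003003", "1234567Beth_004_003004", "1234567Char_003_003005")

# Called on the whole vector, the function returns a single string,
# which mutate then recycles down the column
one_value <- short.idfun(ids)

# Called once per element via vapply, it returns one string per input
per_row <- vapply(ids, short.idfun, character(1), USE.NAMES = FALSE)

print(one_value)
print(per_row)
```

The first print shows one value for three inputs, which is the fill-down behavior from the question; the second shows one value per row.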

To understand why you got the result you did, we can look at the first few lines of your function.

longid = data$test 
(x <- strsplit(longid, "_")) 
[[1]] 
[1] "1234567Andy" "003"   "003003"  

[[2]] 
[1] "1234567Beth" "004"   "003004"  

[[3]] 
[1] "1234567Char" "003"   "003005"

So far everything looks good, but now you define y.

(y  = x[[1]]) 

[1] "1234567Andy" "003"   "003003"

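By calling x[[1]], you pull out only the first element of the list x, that is, the pieces of the first string, rather than the first element of each vector in x. Everything after that line therefore only ever sees the first row. You could also modify your function by defining y <- vapply(x, function(v) v[1], character(1)), handling the last piece of each vector (used for subj) the same way, and skip the vapply in mutate. Either way should work.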
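A minimal sketch of that second option, assuming the function is rewritten so every step works on the whole vector. The name short.idfun.vec and the helper variables are hypothetical, not part of the original post.

```r
# Hypothetical vectorized variant of short.idfun (name is an assumption)
short.idfun.vec <- function(longid) {
  parts <- strsplit(longid, "_")  # list with one character vector per input
  # First and last piece of each split string
  first <- vapply(parts, function(v) v[1], character(1))
  last  <- vapply(parts, function(v) v[length(v)], character(1))
  # substr(), nchar() and regexpr() are already vectorized
  study <- substr(first, 8, nchar(first))
  subj  <- substr(last, regexpr("[^0]", last), nchar(last)) # remove leading zeros
  paste(study, subj, sep = "-")
}

# Illustrative input, same strings as the question's data frame
ids <- c("1234567Andy_003_003003", "1234567Beth_004_003004", "1234567Char_003_003005")
short_ids <- short.idfun.vec(ids)
print(short_ids)
```

With a function like this, mutate(data, shortid = short.idfun.vec(test)) should need neither rowwise() nor an outer vapply, since the function already returns one value per input element.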

Source

2016-01-06 22:09:46 Benjamin

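Thanks a lot! I guess this is a case where I should just use lapply/vapply instead of mutate. I'm a bit confused about when to use which, but I think what you're saying is that mutate only works with vectorized functions.

There is nothing wrong with using 'mutate' in this case, and there is no reason you can't use apply functions inside mutate (I do it all the time). I think what tripped you up here is that you expected 'strsplit' to return something like a matrix. As you become more familiar with what functions return, you'll start catching these kinds of problems before they show up as bugs in your code. It just takes a little time and experience. – Benjamin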
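The point about strsplit can be checked directly. The sketch below uses illustrative variable names (ids, pieces, mat) and shows that strsplit returns a list, and that a matrix only appears if you build one yourself. The rbind step assumes every string splits into the same number of pieces.

```r
# Illustrative input, same strings as the question's data frame
ids <- c("1234567Andy_003_003003", "1234567Beth_004_003004", "1234567Char_003_003005")

pieces <- strsplit(ids, "_")
print(class(pieces))   # a list, not a matrix
print(length(pieces))  # one list element per input string

# Stack the pieces into a character matrix, one row per input string
# (assumes all strings split into the same number of pieces)
mat <- do.call(rbind, pieces)
print(dim(mat))
print(mat[, 1])        # first piece of every string
```

Indexing the matrix by column (mat[, 1]) gives the first piece of every row, which is what x[[1]] was mistakenly expected to do.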
