2017-06-16

Does R 3.5.0 support regex \L?

I'm having difficulty with the Perl expression \\L\\1 in a very specific situation in R-devel (the 2017-06-06 and 2017-06-16 builds, r72796):

bib <- readLines("https://raw.githubusercontent.com/HughParsonage/TeXCheckR/master/tests/testthat/lint_bib_in.bib", encoding = "UTF-8") 

leading_spaces <- 2 

is_field <- grepl("=", bib, fixed = TRUE) 
field_width <- nchar(trimws(gsub("[=].*$", "", bib, perl = TRUE))) 

widest_field <- max(field_width[is_field]) 

out <- bib 

# Vectorized gsub: 
for (line in seq_along(bib)){ 
    # Replace every field line with 
    # two spaces + field name + spaces required for widest field + space 
    if (is_field[line]){ 
    spaces_req <- widest_field - field_width[line] 
    out[line] <- 
     gsub("^\\s*(\\w+)\\s*[=]\\s*\\{", 
      paste0(paste0(rep(" ", leading_spaces), collapse = ""), 
        "\\L\\1", 
        paste0(rep(" ", spaces_req), collapse = ""), 
        " = {"), 
      bib[line], 
      perl = TRUE) 
    } 
} 

# Add commas: 
out[is_field] <- gsub("\\}$", "\\},", out[is_field], perl = TRUE) 

out[9] 
#> R-dev " author" 
#> R 3.4.0 " author  = {Tony Wood and Amélie Hunter and Michael O'Toole and Prasana Venkataraman and Lucy Carter}," 

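To reproduce it, you need to:

  • read the lines from a file with readLines and specify the encoding (using dput does not reproduce it);
  • use \\L or \\U in a perl regex;
  • use a character vector;
  • have an element of that vector that requires UTF-8 (the é in Amélie, above).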
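A minimal sketch of those conditions that avoids downloading the .bib file, assuming a UTF-8 capable session. The two example lines are made up for illustration: they are written to a temporary file as UTF-8 bytes and read back with readLines(encoding = "UTF-8"), so only the non-ASCII element is marked as UTF-8. On an R version where \\L behaves as documented, res should hold the lower-cased field names followed by the unchanged values.

```r
# Hypothetical example lines: one contains a non-ASCII "é", one is plain ASCII
lines <- c("  Author = {Am\u00e9lie Hunter}", "  Title = {Plain ASCII}")

# Write them as raw UTF-8 bytes, then read back with an explicit encoding,
# mirroring the readLines(..., encoding = "UTF-8") call in the question
tmp <- tempfile(fileext = ".bib")
writeLines(lines, tmp, useBytes = TRUE)
x <- readLines(tmp, encoding = "UTF-8")

Encoding(x)  # only the non-ASCII element is marked "UTF-8"

# Lower-case the captured field name with \\L in a perl = TRUE replacement
res <- gsub("^\\s*(\\w+)\\s*=\\s*\\{", "\\L\\1 = {", x, perl = TRUE)
res
```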

Is this a change in R 3.5.0, or have I been misusing \\L in this situation?

Well, you've been warned: [*It may contain bugs, so be careful if you use it.*](https://cran.r-project.org/bin/windows/base/rdevel.html) –

I can't run the snippet: what is 'leading_spaces'? –

This particular bug was causing an error in R CMD check for a package. Sorry, I've edited the question. – Hugh

Answer

Apparently there is some unexpected behaviour.

When only \1 is referenced, it works, and the output is:

[1] " author  = {Tony Wood and Amélie Hunter and Michael O'Toole and Prasana Venkataraman and Lucy Carter}," 

然而,每當\U\L使用具有\1,第二反向引用被除去。

  • "\\U\\1"[1] " AUTHOR"
  • "\\U\\1\\E\\2"[1] " AUTHOR"
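For comparison, a small sketch of what a correctly working perl = TRUE replacement gives on a plain ASCII string (the string itself is made up for illustration). Per the R documentation, \\U or \\L switches case conversion on for the rest of the replacement and \\E switches it off, so \\2 should survive unchanged after \\E instead of going missing as in the affected R-devel builds.

```r
x <- "  author = {Tony Wood}"

# Upper-case the field name (\\1), end conversion with \\E, then keep \\2
res_upper <- gsub("^\\s*(\\w+)(\\s*=\\s*\\{)", "\\U\\1\\E\\2", x, perl = TRUE)
res_upper  # "AUTHOR = {Tony Wood}"

# Lower-case variant, as used in the question
res_lower <- gsub("^\\s*(\\w+)", "\\L\\1", "  AUTHOR = {Tony Wood}", perl = TRUE)
res_lower  # "author = {Tony Wood}"
```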

The gsubfn solution still works (here using toupper() as an example):

library(gsubfn) 
bib <- readLines("https://raw.githubusercontent.com/HughParsonage/TeXCheckR/master/tests/testthat/lint_bib_in.bib", encoding = "UTF-8") 
leading_spaces <- 2 
is_field <- grepl("=", bib, fixed = TRUE) 
field_width <- nchar(trimws(gsub("[=].*$", "", bib, perl = TRUE))) 
widest_field <- max(field_width[is_field]) 
out <- bib 

# Vectorized gsub: 
for (line in seq_along(bib)){ 
    # Replace every field line with 
    # two spaces + field name + spaces required for widest field + space 
    if (is_field[line]){ 
    spaces_req <- widest_field - field_width[line] 
    out[line] <- 
     gsubfn("^\\s*(\\w+)\\s*=\\s*\\{", 
      function(y) paste0(
        paste0(rep(" ", leading_spaces), collapse = ""), 
        toupper(y), 
        paste0(rep(" ", spaces_req), collapse = ""), 
        " = {" 
      ), 
      bib[line], engine="R" 
    ) 
    } 
} 
# Add commas: 
out[is_field] <- gsub("\\}$", "},", out[is_field], perl = TRUE) 

out[9] 

Output:

[1] " AUTHOR  = {Tony Wood and Amélie Hunter and Michael O'Toole and Prasana Venkataraman and Lucy Carter}," 
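A dependency-free alternative, sketched under the assumption that every field line has the form name = {value} with no "=" inside the field name: split each field line into its name and value, change the case of the name with tolower() in R rather than in the regex replacement, and paste the pieces back together. align_fields() is a hypothetical helper, not from the original posts; it is vectorized instead of looping over lines, and unlike the code above it does not add trailing commas.

```r
# Hypothetical helper: aligns BibTeX field lines without using \\L or \\U
# in a regex replacement string. Assumes "name = {value}" field lines.
align_fields <- function(bib, leading_spaces = 2) {
  is_field <- grepl("=", bib, fixed = TRUE)
  # Field name: text before the first "=", surrounding whitespace trimmed
  field <- trimws(sub("=.*$", "", bib))
  field_width <- nchar(field)
  widest <- max(field_width[is_field])
  # Value: everything after the first "=" and any whitespace following it
  value <- sub("^[^=]*=\\s*", "", bib)
  out <- bib
  out[is_field] <- paste0(
    strrep(" ", leading_spaces),
    tolower(field[is_field]),                     # case conversion done in R
    strrep(" ", widest - field_width[is_field]),  # pad to the widest name
    " = ",
    value[is_field]
  )
  out
}

bib <- c("@article{key,", "  Author={Am\u00e9lie Hunter}", "  year = {2017}", "}")
res <- align_fields(bib)
res
```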

My sessionInfo() details:

> sessionInfo() 
R Under development (unstable) (2017-06-19 r72808) 
Platform: i386-w64-mingw32/i386 (32-bit) 
Running under: Windows 7 x64 (build 7601) Service Pack 1 

Matrix products: default 

locale: 
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252 
[3] LC_MONETARY=English_United States.1252 
[4] LC_NUMERIC=C       
[5] LC_TIME=English_United States.1252  

attached base packages: 
[1] stats  graphics grDevices utils  datasets methods base  

other attached packages: 
[1] gsubfn_0.6-6 proto_1.0.0 

loaded via a namespace (and not attached): 
[1] compiler_3.5.0 tools_3.5.0 tcltk_3.5.0