R regex for extracting from complex strings
I have a vector of messy strings, shown below.
string <- c("GRP-14994/", "GRP-7056 GRP-7036/", "grp-24263(24263)/IRGC 28588", "GRP-15916 /IRGC-42176",
"GRP-614-250B/", "(GRP 11432)/IRGC-14570", "Tourn", "GRPP256", "Purse", "GRP-14956 Origin:", "GRP 10537", "GRP-10096 Origin: ",
"SGRP123", "GRP1234", "AC-30009 (GRPHANA)/", "AC-3060 GRP 536-143/Old AC", "RGRPfaa/23", "/-",
"MGR:7251/", "1216-GR-567/", "X:1 Well KGRPh", "WabGRPvea(II)", "HR33(BGRP)", "Tensor",
"Wald", "grp12312")
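I am trying to extract every instance where GRP is followed by digits, possibly separated by a space or a "-".
My current attempt gives me the following result.
gsub("(.*)(\\b)(GRP)(-|\\s|)(\\d+)(\\/|\\b)(.*)","\\3\\5", string, ignore.case = TRUE)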
[1] "GRP14994" "GRP7056" "grp24263" "GRP15916"
[5] "GRP614" "GRP11432" "Tourn" "GRPP256"
[9] "Purse" "GRP14956" "GRP10537" "GRP10096"
[13] "SGRP123" "GRP1234" "AC-30009 (GRPHANA)/" "GRP536"
[17] "RGRPfaa/23" "/-" "MGR:7251/" "1216-GR-567/"
[21] "X:1 Well KGRPh" "WabGRPvea(II)" "HR33(BGRP)" "Tensor"
[25] "Wald" "grp12312"
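Part of the gap is explained by how `sub`/`gsub` treat elements that do not match the pattern: they are returned unchanged, which is why entries such as "Tourn" and "GRPP256" survive in the result above. Here is a minimal sketch of that behaviour. It uses a deliberately simplified pattern and a hypothetical helper, `blank_non_matches`; neither is part of the original attempt.

```r
# Simplified illustration: sub()/gsub() return non-matching elements unchanged.
x <- c("GRP-14994/", "Tourn")
pattern <- "GRP-([0-9]+)/"
replaced <- sub(pattern, "GRP\\1", x)   # "Tourn" passes through untouched

# Hypothetical helper: blank out elements in which the pattern never matched.
blank_non_matches <- function(values, original, pattern) {
  ifelse(grepl(pattern, original), values, "")
}
cleaned <- blank_non_matches(replaced, x, pattern)
```

A replacement-based approach therefore needs a separate step (such as the `grepl` check here) to turn non-matching elements into empty strings.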
But the desired output is
out <- c("GRP14994", "GRP7056 GRP7036", "grp24263", "GRP15916", "GRP614250",
"GRP11432", "", "", "", "GRP14956", "GRP10537", "GRP10096", "",
"GRP1234", "", "GRP536143", "", "", "", "", "", "", "", "", "",
"grp12312")
out
[1] "GRP14994" "GRP7056 GRP7036" "grp24263" "GRP15916" "GRP614250" "GRP11432"
[7] "" "" "" "GRP14956" "GRP10537" "GRP10096"
[13] "" "GRP1234" "" "GRP536143" "" ""
[19] "" "" "" "" "" ""
[25] "" "grp12312"
How can I modify the regex to get the desired result?
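The expected output you provided looks incorrect. Wouldn't 'GRP614' be 'GRP614250'? And 'GRPP256'? It has two **P**s – hwnd
If this is your input and you are sure about the input data, you can force the string to start with the given GRP string by beginning your regex with ^ (without the (.*)), so that it matches all strings that start with GRP – LMG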
GRPP256' ... – hwnd
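One possible approach, sketched here under assumptions inferred from the desired output rather than stated by the asker, is to extract matches instead of replacing around them. The sketch assumes that GRP must start at a word boundary, that the optional separator is a single space or hyphen, and that further hyphen-joined digit groups (as in "GRP-614-250B/") belong to the same ID. The function name `extract_grp` is hypothetical.

```r
# Sketch: pull every "GRP<sep>digits[-digits...]" token out of each string,
# strip the separators, and join multiple hits with a space.
extract_grp <- function(x) {
  # \\b keeps "SGRP123" out; [- ]? is the optional separator;
  # (?:-\\d+)* absorbs extra hyphen-joined digit groups.
  pattern <- "\\bGRP[- ]?\\d+(?:-\\d+)*"
  hits <- regmatches(x, gregexpr(pattern, x, ignore.case = TRUE, perl = TRUE))
  vapply(
    hits,
    function(h) paste(gsub("[- ]", "", h), collapse = " "),
    character(1),
    USE.NAMES = FALSE
  )
}

sample_input <- c("GRP-7056 GRP-7036/", "GRP-614-250B/", "SGRP123",
                  "AC-3060 GRP 536-143/Old AC")
result <- extract_grp(sample_input)
# result: "GRP7056 GRP7036" "GRP614250" "" "GRP536143"
```

Because `regmatches` returns an empty character vector for elements with no match, `paste(..., collapse = " ")` yields "" for them, so no separate blanking step is needed.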