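Regular expression to reorder strings in a field
I am trying to write a program that uses regular expressions to clean some data. Suppose my room names contain a letter and a number. In the final output, I need to print each room name using the pattern "full string (excluding letter & number) + letter + number", as in the example below. However, the regular expressions I have written so far give very bad results, which are shown at the bottom of this message. For some reason, the program puts letters and numbers on some lines even where the input data has none. Thanks.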
Edit: I have edited the input data. I would like to generalize the code so that it handles any number of strings, not just the single word "ROOM".
# the pattern should be "the full string (excluding letter & number) + letter + number". For example:
ATLANTA ROOM
ATLANTA ROOM 3
NEW YORK ROOM A 2
ROOM A 4
THE BIG AWESOME ROOM B
ROOM B 4
GEORGETOWN ROOM B 2
NEW YORK ROOM C 2
NEW YORK ROOM C
LOS ANGELES ROOM E 2
# program to clean with regular expressions. there could be multiple spaces between words
dd <- c("ATLANTA ROOM ",
" ATLANTA ROOM 3",
"NEW YORK A ROOM 2",
"4 ROOM A",
"THE BIG AWESOME ROOM B",
" ROOM 4 B",
"GEORGETOWN B 2 ROOM ",
" C NEW YORK ROOM 2",
"NEW YORK ROOM C",
"LOS ANGELES ROOM 2 E")
m_char_num <- regexpr("(\\<A|B|C|D|E|1|2|3|4\\>)", dd)
m_char <- regexpr("(\\<A|B|C|D|E\\>)", dd)
m_num <- regexpr("(\\<1|2|3|4\\>)", dd)
(dd2 <- paste(gsub("( +)", " ",
gsub("(^ +)|( +$)", "",
gsub("(\\<A|B|C|D|E|1|2|3|4\\>)", "", dd))),
regmatches(dd, m_char), regmatches(dd, m_num), sep = " "))
# actual output from the program
"TLANTA ROOMA3",
"TLANTA ROOMA2",
"NW YORK ROOMA4",
"ROOMA4",
"TH IG WSOM ROOME2",
"ROOMB2",
"GORGTOWN ROOMB2",
"NW YORK ROOMC3",
"NW YORK ROOMC2",
"LOS NGLS ROOMA4"
It looks like the recycling rules are tripping you up. `length(dd)` differs from `length(regmatches(dd, m_char))`, because regmatches omits the positions where no match is found.
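To illustrate the point in the comment, here is a minimal sketch of one possible fix, not a definitive one. It assumes the letter is a single standalone token A-E and the number a single standalone token 1-4, as in the question's patterns. The helper `extract_first` is a hypothetical name introduced here; it pads non-matches with `""` so that every vector passed to `paste` has the same length as `dd` and nothing is recycled. The sketch also uses character classes with `\\b`, because in `(\\<A|B|C|D|E\\>)` the alternation binds last: `\\<` applies only to `A` and `\\>` only to `E`, so a bare `B`, `C` or `D` matches anywhere inside a word (hence `TH IG WSOM`).

```r
dd <- c("ATLANTA ROOM ", " ATLANTA ROOM 3", "NEW YORK A ROOM 2", "4 ROOM A",
        "THE BIG AWESOME ROOM B", " ROOM 4 B", "GEORGETOWN B 2 ROOM ",
        " C NEW YORK ROOM 2", "NEW YORK ROOM C", "LOS ANGELES ROOM 2 E")

# regmatches() drops elements without a match, so this is shorter than dd
# and paste() would recycle it against the other vectors.
length(regmatches(dd, regexpr("\\b[A-E]\\b", dd, perl = TRUE)))  # 8, not 10

# Hypothetical helper (not from the question): first match per element,
# with "" where nothing matches, so the result is as long as x.
extract_first <- function(pattern, x) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep("", length(x))
  out[as.integer(m) > 0] <- regmatches(x, m)
  out
}

# Assumption: the letter and the number are standalone one-character tokens.
letter <- extract_first("\\b[A-E]\\b", dd)
number <- extract_first("\\b[1-4]\\b", dd)

# Remove the standalone letter/number tokens, leaving the rest of the name.
rest <- gsub("\\b[A-E1-4]\\b", "", dd, perl = TRUE)

# Reassemble as rest + letter + number, then collapse and trim whitespace.
dd2 <- trimws(gsub("\\s+", " ", paste(rest, letter, number)))
dd2
```

Because the name part is whatever remains after the tokens are removed, this does not depend on the word "ROOM" appearing in the input.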