Case 1: A list of titles is given
Suppose
titles = ["Dr.", "Prof.", "Mr.", "Mrs.", "Ms.", "Her Worship", "The Grand Poobah"]
R = /
    (?:     # begin non-capture group
      #{Regexp.union(titles)}
            # "or" all the titles
      \s*   # match >= 0 spaces
    )*      # end non-capture group and perform >= 0 times
    /x      # free-spacing regex definition mode
  #=> /
  #   (?:     # begin non-capture group
  #     (?-mix:Dr\.|Prof\.|Mr\.|Mrs\.|Ms\.|Her\ Worship|The\ Grand\ Poobah)
  #           # "or" all the titles
  #     \s*   # match >= 0 spaces
  #   )*      # end non-capture group and perform >= 0 times
  #   /x
def extract_titles(str)
  t = str[R] || ''
  [str[t.size..-1], t.rstrip]
end
["Prof. Dr. John J. Doe, Jr.", "Dr. Prin. Gloria Smith", "The Grand Poobah Dr. No",
"Gloria Smith", "Cher, Ph.D."].each { |s| p extract_titles s }
# ["John J. Doe, Jr.", "Prof. Dr."]
# ["Prin. Gloria Smith", "Dr."]
# ["No", "The Grand Poobah Dr."]
# ["Gloria Smith", ""]
# ["Cher, Ph.D.", ""]
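If there are no titles, as in the last two examples, R still matches: the group may repeat zero times, so the regex matches the empty string at the start of the string. Hence str[R] #=> "" and t.rstrip #=> "". The || '' only guards against nil, which str[R] would return if R were replaced by a regex that can fail to match.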
See the documentation for the class method Regexp::union to see how it is used.
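As a rough illustration of what Regexp::union contributes here, the sketch below uses a shortened, made-up title list (sample_titles, union and pattern are names chosen for this example only). It shows that periods and spaces are escaped, and that the result can be interpolated into a free-spacing regex.

```ruby
# Illustrative only: a shortened, hypothetical list of titles.
sample_titles = ["Dr.", "Her Worship"]

# Regexp.union escapes each string and joins the alternatives with "|".
union = Regexp.union(sample_titles)
p union                  #=> /Dr\.|Her\ Worship/
p union.to_s             #=> "(?-mix:Dr\\.|Her\\ Worship)"

# The period is matched literally, not as "any character".
p union.match?("Dr. No") #=> true
p union.match?("Drx No") #=> false

# Interpolating the union into a free-spacing regex keeps the escaped space.
pattern = /(?: #{union} \s* )*/x
p "Her Worship Dr. No"[pattern] #=> "Her Worship Dr. "
```

Because the space in "Her Worship" is escaped, it still matches a literal space even though the outer regex is written in free-spacing mode.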
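Case 2: No list of titles is given
The following assumes that every title consists of a capital letter, followed by one or more lower-case letters, followed by a period. If that is not correct, the regex below can be altered accordingly.
The only difference between this case and the previous one is that the regex changes.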
R = /
    \A        # match beginning of string
    (?:       # start a non-capture group
      [A-Z]   # match a capital letter
      [a-z]+  # match > 0 lower-case letters
      \.\s*   # match a period followed by >= 0 spaces
    )*        # end non-capture group and execute >= 0 times
    /x        # free-spacing regex definition mode
["Prof. Dr. John J. Doe, Jr.", "Dr. Prin. Gloria Smith",
 "Gloria Smith", "Cher, Ph.D."].each { |s| p extract_titles(s) }
# ["John J. Doe, Jr.", "Prof. Dr."]
# ["Gloria Smith", "Dr. Prin."]
# ["Gloria Smith", ""]
# ["Cher, Ph.D.", ""]
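Since the two cases differ only in the regex, one possible arrangement is to pass the pattern in as an argument instead of reassigning the constant R (reassigning a constant in the same script makes Ruby print an "already initialized constant" warning). The following is only a sketch: extract_titles_with, title_list, listed and generic are hypothetical names, and title_list is a shortened stand-in for the full list.

```ruby
# Hypothetical variant of extract_titles that takes the pattern as an argument.
def extract_titles_with(str, pattern)
  t = str[pattern] || ''
  [str[t.size..-1], t.rstrip]
end

title_list = ["Dr.", "Prof."]                            # illustrative subset
listed  = /\A(?:#{Regexp.union(title_list)}\s*)*/        # Case 1 style
generic = /\A(?:[A-Z][a-z]+\.\s*)*/                      # Case 2 style

p extract_titles_with("Prof. Dr. John J. Doe, Jr.", listed)
  #=> ["John J. Doe, Jr.", "Prof. Dr."]
p extract_titles_with("Dr. Prin. Gloria Smith", listed)
  #=> ["Prin. Gloria Smith", "Dr."]
p extract_titles_with("Dr. Prin. Gloria Smith", generic)
  #=> ["Gloria Smith", "Dr. Prin."]
```

The second and third calls show the trade-off: the listed pattern stops at "Prin." because it is not in the list, while the generic pattern treats any capitalised word ending in a period as a title.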
Note: I have simplified my original answer.
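Don't just blur it with "etc". State clearly which prefixes you care about. – sawa
@sawa There are n number of prefixes and they cannot all be listed, so treat them as an array. – Datt
How can you remove something that cannot be listed? – sawa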