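I'm currently working on a Markov chain text generator in Ruby. It takes in a body of text (the "corpus") and then generates new text based on it. The problem I need to solve right now is writing a Regexp that returns an array of strings, each containing the number of words I specify. What I want to do is grab a certain number of words (specified by the user), repeatedly across the whole string. How can I use a Regexp to get a specified number of words that include special characters?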
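Going off another application I've seen, I'm using something like /(([.,?"();\-!':—^\w]+ ){#{depth}})/, where #{depth} interpolates the number of words I need at a time. This should grab two words at a time (for a depth of 2) while allowing a subset of special characters, which is the part that is tripping me up. So the overall question is this: how do I dynamically specify the number of words I want (separated by spaces) while also allowing a range of special characters in those words?
Here is what I currently have: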
# Regex
@match_regex = /(([.,?"();\-!':—^\w]+ ){2})/
s = input.scan(@match_regex).to_a
puts s.inspect
# Input
Within weeks they planned a meeting. She sent him poetry along with her itinerary,
having worked in a business meeting to excuse the opportunity. He prepared flowers
and a banner of welcome on his hearth.
# Output - seems to be grabbing last word again for some reason
[["Within weeks ", "weeks "], ["they planned ", "planned "], ["a meeting. ", "meeting. "],
["She sent ", "sent "], ["him poetry ", "poetry "], ["along with ", "with "],
["her itinerary, ", "itinerary, "], ["having worked ", "worked "], ["in a ", "a "],
["business meeting ", "meeting "], ["to excuse ", "excuse "],
["the opportunity. ", "opportunity. "], ["He prepared ", "prepared "], ["flowers and ", "and "],
["a banner ", "banner "], ["of welcome ", "welcome "], ["on his ", "his "]]
# Desired output. I'm not picky if it has trailing spaces or not as I can always trim that
["Within weeks", "they planned", "a meeting.", "She sent", "him poetry", "along with",
"her itinerary,", "having worked", "in a", "business meeting", "to excuse", "the opportunity.",
"He prepared", "flowers and", "a banner", "of welcome", "on his"]
Any help would be greatly appreciated. Thanks!
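For reference, here is a minimal sketch of one way to get the desired output. It assumes the repeated last word comes from the capture groups: when a pattern contains groups, String#scan returns one array of captures per match, and a repeated group keeps only its final iteration. The helper name word_chunks is hypothetical and not part of the original code; it swaps the capturing groups for non-capturing ones so that scan returns plain strings.

```ruby
# Hypothetical helper (not from the original post): returns chunks of
# `depth` whitespace-separated words taken in order from `text`.
def word_chunks(text, depth)
  # One "word": letters, digits, underscore, plus the allowed punctuation
  # from the character class used in the question.
  word = /[.,?"();\-!':—^\w]+/

  # (?:...) groups without capturing, so String#scan yields whole matches
  # as strings instead of one array of captures per match.
  # The pattern is (depth - 1) words each followed by whitespace, then a
  # final word, so matches carry no trailing space.
  pattern = /(?:#{word}\s+){#{depth - 1}}#{word}/

  text.scan(pattern)
end

sample = "Within weeks they planned a meeting. She sent him poetry"
p word_chunks(sample, 2)
# => ["Within weeks", "they planned", "a meeting.", "She sent", "him poetry"]
```

Leftover words that do not fill a complete chunk are dropped, which agrees with the desired output above (the final "hearth." is not included). Whitespace between the words of a chunk is kept as found in the input, so a chunk that spans a line break will contain that line break.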
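Another way to deal with '-' is to make sure it is the first or last character inside the square brackets; that way it denotes a literal dash rather than a range, even without escaping. – Amadan 2014-09-22 00:48:24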
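The point about dash placement can be checked with a small sketch. The method name dash_patterns and the sample character classes are made up for illustration; they only contrast an escaped dash, a leading or trailing dash, and a dash that forms a range.

```ruby
# Hypothetical illustration of dash placement inside a character class.
def dash_patterns
  {
    escaped:  /\A[a\-z]+\z/, # escaped dash: matches 'a', '-', 'z'
    leading:  /\A[-az]+\z/,  # dash first: literal dash, no escape needed
    trailing: /\A[az-]+\z/,  # dash last: literal dash, no escape needed
    range:    /\A[a-z]+\z/   # dash in the middle: the range a through z
  }
end

dash_patterns.each do |name, pattern|
  # "a-z" contains a literal dash; "m" lies inside the range a..z.
  puts "#{name}: a-z=#{pattern.match?('a-z')} m=#{pattern.match?('m')}"
end
# escaped: a-z=true m=false
# leading: a-z=true m=false
# trailing: a-z=true m=false
# range: a-z=false m=true
```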