2011-10-11 37 views

Answers

2

You can use String#split with a regular expression pattern as the argument. Like this:

"Hello_World I am Learning,Ruby".split /[ _,.!?]/ 
=> ["Hello", "World", "I", "am", "Learning", "Ruby"] 
1
ruby-1.9.2-p290 :022 > str = "Hello_World I am Learning,Ruby" 
ruby-1.9.2-p290 :023 > str.split(/\s|,|_/) 
=> ["Hello", "World", "I", "am", "Learning", "Ruby"] 
0

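Although the examples above work, I think that when splitting a string into words it may be better to split on any character that is not considered part of a word. To do that, I did this: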

str = "Hello_World I am Learning,Ruby" 
str.split(/[^a-zA-Z]/).reject(&:empty?).compact 

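This statement does the following:

  1. Splits the string on every character that is not a letter
  2. Then rejects anything that is an empty string
  3. And removes any nil values from the array

This handles most combinations of words. The examples above require you to list every character you want to split on. It is far easier to specify which characters do not count as part of a word (here, anything outside a-z and A-Z).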
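The three steps can be traced one at a time. This is a minimal sketch, not part of the original answer; the input string (with a comma followed by a space) and the variable names are assumptions chosen so that an empty string actually shows up.

```ruby
# Hypothetical input: ", " puts two delimiter characters next to each other.
str = "Hello_World I am Learning, Ruby"

# Step 1: split on every single non-letter character.
# The comma and the space are separate delimiters, so an empty
# string appears between them.
pieces = str.split(/[^a-zA-Z]/)

# Step 2: drop the empty strings.
words = pieces.reject(&:empty?)

# Step 3: compact removes nil entries. String#split does not produce
# nil here, so this step leaves the array unchanged.
compacted = words.compact

p pieces
p compacted
```

In this trace step 3 changes nothing, so `reject(&:empty?)` is the step doing the work.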

1

String#scan seems to be a suitable method for this task:

irb(main):018:0> "Hello_World I am Learning,Ruby".scan(/[a-z]+/i) 
=> ["Hello", "World", "I", "am", "Learning", "Ruby"] 

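Or you can use the built-in \w matcher (note that \w also matches the underscore, so "Hello_World" stays in one piece):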

irb(main):020:0> "Hello_World I am Learning,Ruby".scan(/\w+/) 
=> ["Hello_World", "I", "am", "Learning", "Ruby"] 
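The difference between the two patterns comes from what \w covers: letters, digits, and the underscore. A small sketch, using a made-up sample string that is not from the original answer:

```ruby
# Hypothetical sample containing an underscore, digits, and a dot.
sample = "Ruby_1.9 rocks"

# \w+ keeps letters, digits, and underscores together.
word_chars = sample.scan(/\w+/)

# [a-z]+ with the i flag keeps only runs of ASCII letters.
letters_only = sample.scan(/[a-z]+/i)

p word_chars    # => ["Ruby_1", "9", "rocks"]
p letters_only  # => ["Ruby", "rocks"]
```

Which one is right depends on whether digits and underscores should count as part of a word in your data.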
4

You can use \W for any non-word character. Since \W does not match the underscore, add it to the character class:

"Hello_World I am Learning,Ruby".split /[\W_]/ 
=> ["Hello", "World", "I", "am", "Learning", "Ruby"] 

"Hello_World I am Learning, Ruby".split /[\W_]+/ 
=> ["Hello", "World", "I", "am", "Learning", "Ruby"] 
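The `+` in the second pattern matters once two delimiters sit next to each other. The sketch below compares both patterns on the same input; the variable names and the second input string are illustrative assumptions.

```ruby
spaced = "Hello_World I am Learning, Ruby"

# One delimiter character per split: ", " yields an empty string.
single = spaced.split(/[\W_]/)

# A run of delimiter characters is treated as a single split point.
grouped = spaced.split(/[\W_]+/)

# A delimiter at the very start still produces a leading empty string,
# even with the + quantifier.
leading = "_Hello World".split(/[\W_]+/)

p single   # => ["Hello", "World", "I", "am", "Learning", "", "Ruby"]
p grouped  # => ["Hello", "World", "I", "am", "Learning", "Ruby"]
p leading  # => ["", "Hello", "World"]
```

If the input may start with a delimiter, filtering empty strings afterwards (or using String#scan instead) avoids the leading empty element.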
0

Just for fun, a Unicode-aware version for 1.9 (or 1.8 with Oniguruma):

>> "This_µstring has words.and thing's".split(/[^\p{Word}']|\p{Connector_Punctuation}/) 
=> ["This", "µstring", "has", "words", "and", "thing's"] 

Or perhaps:

>> "This_µstring has words.and thing's".split(/[^\p{Word}']|_/) 
=> ["This", "µstring", "has", "words", "and", "thing's"] 

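The real problem is deciding which sequences of characters constitute a "word" in this context. You may want to check the Oniguruma docs for the supported character properties. Wikipedia has some notes on the properties.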
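To make the point about word definitions concrete, here is a hedged sketch that runs three different definitions over the same string with String#scan. The patterns and variable names are assumptions for illustration, not from the answer; the micro sign is written as "\u00B5" so the snippet does not depend on the source file encoding.

```ruby
# Same text as above; "\u00B5" is the micro sign (µ).
text = "This_\u00B5string has words.and thing's"

# Definition 1: a word is a run of ASCII letters.
ascii_letters = text.scan(/[a-zA-Z]+/)

# Definition 2: a word is a run of Unicode letters.
unicode_letters = text.scan(/\p{L}+/)

# Definition 3: Unicode letters plus the apostrophe.
with_apostrophe = text.scan(/[\p{L}']+/)

p ascii_letters    # => ["This", "string", "has", "words", "and", "thing", "s"]
p unicode_letters  # => ["This", "µstring", "has", "words", "and", "thing", "s"]
p with_apostrophe  # => ["This", "µstring", "has", "words", "and", "thing's"]
```

Each result is reasonable under its own definition, which is why the pattern has to be chosen to fit the text being processed.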