2014-01-21 80 views
1

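Total programming newbie here. In Ruby, how would I strip the following string of non-alphabetic and non-numeric characters and then split it into an array on spaces? That is, I need a regex for splitting on non-alphanumeric characters.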

string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers." 

into this:

tokenized_string = ["Honey", "a", "sweet", "sticky", "yellow", "fluid", "made", "by", "bees", "and", "other", "insects", "from", "nectar", "collected", "from", "flowers"] 

Any help would be appreciated!

+1

You should show us what you tried instead of asking us for the code. –

Answers

2

I'd use:

string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers." 
string.delete('^A-Za-z0-9 ').split 
# => ["Honey", 
#  "a", 
#  "sweet", 
#  "sticky", 
#  "yellow", 
#  "fluid", 
#  "made", 
#  "by", 
#  "bees", 
#  "and", 
#  "other", 
#  "insects", 
#  "from", 
#  "nectar", 
#  "collected", 
#  "from", 
#  "flowers"] 

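If you want to strip everything but alphanumerics, you can't use the \w character class, because it is defined as [A-Za-z0-9_], which lets _ leak in or slip past. Here's an example: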

'foo_BAR12'[/\w+/] # => "foo_BAR12" 

That matches the whole string, including the _.

'foo_BAR12'[/[A-Za-z0-9]+/] # => "foo" 

That stops at the _, because the class [A-Za-z0-9] doesn't include it.

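\w should be treated as a pattern for matching variable names, not alphanumerics. If you want a character class for alphanumerics, look at the POSIX [[:alnum:]] class: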

'foo_BAR12'[/[[:alnum:]]+/] # => "foo" 
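A side-by-side sketch of the three classes discussed above, using the answer's 'foo_BAR12' plus a hypothetical non-ASCII word 'café'. One difference worth knowing (assuming a UTF-8 source file, Ruby's default since 2.0): in Ruby, POSIX bracket classes such as [[:alnum:]] also match non-ASCII letters, while \w and [A-Za-z0-9] are ASCII-only.

```ruby
sample = 'foo_BAR12'

# \w is the ASCII class [a-zA-Z0-9_], so the underscore is part of the match.
p sample[/\w+/]          # => "foo_BAR12"
# The explicit ASCII class excludes "_" and stops there.
p sample[/[A-Za-z0-9]+/] # => "foo"
# The POSIX bracket class also excludes "_".
p sample[/[[:alnum:]]+/] # => "foo"

# Where they differ: the explicit class is ASCII-only, while [[:alnum:]]
# also matches non-ASCII letters such as "é" (source assumed to be UTF-8).
p 'café'[/[A-Za-z0-9]+/] # => "caf"
p 'café'[/[[:alnum:]]+/] # => "café"
```

So [A-Za-z0-9] is the stricter choice if only ASCII letters and digits should survive; [[:alnum:]] is the better fit for text in other languages.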
+0

Why not just '#scan'? Just curious. –

+2

'scan' works too, except when non-alphanumerics are embedded: 'foo_BAR12'.scan(/[A-Za-z0-9]+/) # => ["foo", "BAR12"] versus 'foo_BAR12'.delete('^A-Za-z0-9').split # => ["fooBAR12"]. Only the OP knows which is right. –
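To make the trade-off in that comment concrete, here is a sketch with a hypothetical input where an apostrophe sits inside a word, so the two approaches disagree:

```ruby
# Hypothetical input with an apostrophe embedded inside a word.
text = "don't stop"

# scan keeps runs of ASCII alphanumerics, so the apostrophe splits the word.
p text.scan(/[A-Za-z0-9]+/)        # => ["don", "t", "stop"]

# delete removes the apostrophe first, so the pieces are glued back together.
p text.delete('^A-Za-z0-9 ').split # => ["dont", "stop"]
```

Neither result is wrong as such; it depends on whether an embedded symbol should separate words or simply vanish.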

+0

Thanks for the explanation... –

1

Do it as below, using String#scan:

string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers." 
string.scan(/[a-zA-Z0-9]+/) 
# => ["Honey", 
#  "a", 
#  "sweet", 
#  "sticky", 
#  "yellow", 
#  "fluid", 
#  "made", 
#  "by", 
#  "bees", 
#  "and", 
#  "other", 
#  "insects", 
#  "from", 
#  "nectar", 
#  "collected", 
#  "from", 
#  "flowers"] 
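Since the question asks about a regex for splitting on non-alphanumeric characters, here is a sketch (hypothetical input) of why scan is often simpler than split: when the text begins with a separator, split returns a leading empty string, while scan only returns the matches themselves.

```ruby
# Hypothetical input that starts and ends with punctuation.
text = "- Honey, bees."

# Splitting on runs of non-alphanumerics leaves an empty string at the front,
# because the text begins with a separator (trailing empties are dropped).
p text.split(/[^A-Za-z0-9]+/) # => ["", "Honey", "bees"]

# scan collects only the matching runs, so no empty strings appear.
p text.scan(/[A-Za-z0-9]+/)   # => ["Honey", "bees"]
```

On the question's own string, which starts with a letter, string.split(/[^A-Za-z0-9]+/) happens to give the same array as the scan call above.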
+0

I'd suggest adding '.map(&:strip)' – mudasobwa

+0

Plus, a typo like 'string = "Honey,a sweet..."' or a semicolon as in 'string = "Honey; a sweet..."' would ruin your regex. – mudasobwa

+0

Thanks to both of you! :) –

1

There are many possibilities, for example:

string.gsub(/\W/) { |m| m if m == ' ' }.split 

or, more clearly:

string.gsub(/\W/) { |m| m if m.strip.empty? }.split 
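A sketch of what the block form of gsub is doing in both variants, on a shortened sample string (the shortening is only for readability). The block's return value replaces each \W match; for anything other than whitespace the block returns nil, which gsub turns into an empty string.

```ruby
string = "Honey - a sweet, sticky, yellow fluid."

# gsub with a block substitutes each \W match with the block's return value.
# `m if m == ' '` returns the space itself, or nil for any other non-word
# character; gsub converts nil to "", so punctuation is dropped.
cleaned = string.gsub(/\W/) { |m| m if m == ' ' }
p cleaned       # => "Honey  a sweet sticky yellow fluid"
p cleaned.split # => ["Honey", "a", "sweet", "sticky", "yellow", "fluid"]

# The two variants differ on other whitespace: m == ' ' keeps only spaces,
# while m.strip.empty? keeps any whitespace, such as a newline.
p "bees\nand".gsub(/\W/) { |m| m if m == ' ' }.split       # => ["beesand"]
p "bees\nand".gsub(/\W/) { |m| m if m.strip.empty? }.split # => ["bees", "and"]

# \W does not match "_", so underscores survive either way.
p "snake_case, words".gsub(/\W/) { |m| m if m == ' ' }.split # => ["snake_case", "words"]
```

The second variant is the safer default when the input may contain line breaks or tabs.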
1

Very simple. The following gives you the array you want, and you don't need split:

string.scan(/\w+/) 

Play with it on Rubular.com.
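A quick check of \w+ with String#scan, first on the question's string and then on a hypothetical input containing an underscore (the point raised in the comments on this answer):

```ruby
string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."

# The sample has no underscores, so \w+ yields exactly the words.
p string.scan(/\w+/)

# \w also matches "_", so an underscore keeps two words together.
p 'foo123_BAR'.scan(/\w+/)          # => ["foo123_BAR"]
# A class without "_" splits them instead.
p 'foo123_BAR'.scan(/[[:alnum:]]+/) # => ["foo123", "BAR"]
```

So /\w+/ is fine for this particular input, but it is not a strict "letters and digits only" pattern.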

+0

Hi @ArupRakshit, did you try running it, bro? Please explain. – mjnissim

+0

What's the problem? It worked when I tried it :) –

+1

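The '\w' character class isn't exactly alphanumeric. It is defined as '[A-Za-z0-9_]'. The '_' means '\w+' will match 'foo123_BAR', which isn't what the OP asked for, since '_' is neither a letter nor a digit. –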
