Elegant way to extract information with a regex in Ruby

My question is simple. Here is the line:

<title><* page.title *></title>

I want to get the "page.title" part. I can do it like this:

replacement = line.match(/\<\* .* \*\>/) 
replacement_contain = replacement.to_s.match(/ .* /).to_s.strip

Is there a shortcut or a better way to do this?

Source

2015-01-10 user2543457

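If you're not familiar with 'nokogiri', you should take the time to learn it. I'm told it's pretty easy. –

@Cary, thanks for reminding me about nokogiri; I don't know why I wasn't using it. – user2543457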

" <title><* page.title *></title> "[/(?<=\*).*(?=\*)/].strip #=> "page.title"

Source

2015-01-10 18:57:54 sawa

Nice answer, thanks. – user2543457

One way is to use a capture group:

str = "<title><* page.title *></title>" 

str[/\*\s+(.*)\s+\*/,1] 
    #=> "page.title"

The regex reads as follows; it matches:

\* : one asterisk, followed by 
\s+ : one or more spaces, followed by capture group #1 
(.*) : which matches all characters until it reaches the last 
\s+ : string of one or more spaces in the line that is followed by 
\* : an asterisk

The 1 passed as the second argument selects capture group #1, whose contents are extracted and returned by String#[].
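As a variation on the capture-group idea above, here is a minimal sketch that uses a named group and a lazy quantifier. The helper name `extract_placeholder` and the group name `name` are made up for illustration and are not part of the original answer. It also assumes the placeholder is always delimited by `<*` and `*>`.

```ruby
# Hypothetical helper, not from the original answer.
# Returns the text between "<*" and "*>", or nil when there is no placeholder.
def extract_placeholder(line)
  # \s*          tolerates any amount of whitespace inside the delimiters
  # (?<name>.*?) is lazy, so it stops at the first closing "*>"
  match = line.match(/<\*\s*(?<name>.*?)\s*\*>/)
  match && match[:name]
end

extract_placeholder("<title><* page.title *></title>")  #=> "page.title"
extract_placeholder("<title>no placeholder</title>")    #=> nil
```

Anchoring on both delimiters and matching lazily keeps the capture from running on to a later asterisk in the same line.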

Source

2015-01-10 19:00:27

require 'nokogiri' 
require 'open-uri' 

# URI.open is required on Ruby 3.0+, where Kernel#open no longer opens URLs
html = Nokogiri::HTML(URI.open('https://stackoverflow.com/questions/27879967/elegant-way-to-extarct-information-ruby-regex'))

puts html.css('title').text 
# => "Elegant way to extarct information ruby regex - Stack Overflow"

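The answer to "how do I parse HTML with regex" is "don't, unless you know it will conform to strict XML rules."

For example, @sawa's and @Cary's solutions, while fine if you know what kind of content your HTML will contain, fail if you have *> anywhere else in your page, which is perfectly valid HTML. Use an HTML parser like Nokogiri instead (as shown above).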
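To make that failure mode concrete, here is a small sketch that runs both regexes from the earlier answers against a line containing a second `*>`. The sample line is invented for illustration. The last pattern, which anchors on `<*` and `*>` and matches lazily, is an assumption of mine rather than something proposed in the thread.

```ruby
# Invented sample line: a second "*>" appears later on the same line.
line = "<title><* page.title *></title><p>a *> b</p>"

# Capture-group pattern from the thread: the greedy (.*) runs to the last " *".
greedy_capture = line[/\*\s+(.*)\s+\*/, 1]
#=> "page.title *></title><p>a"

# Lookaround pattern from the thread: the same over-match after strip.
greedy_lookaround = line[/(?<=\*).*(?=\*)/].strip
#=> "page.title *></title><p>a"

# Assumed alternative: anchor on both delimiters and match lazily.
lazy_capture = line[/<\*\s*(.*?)\s*\*>/, 1]
#=> "page.title"
```

The lazy pattern copes with this particular input, but it still knows nothing about HTML structure, so the advice to use a parser for real pages stands.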

Source

2015-01-10 19:06:24 Doorknob

Good advice, and advice I have seen given many times. –
