If the string is plain text with embedded commas (i.e., no tags), these work nicely:
data = 'Main Idea, key term, key term, key term'
# example #1
/^(.+?,\s*)(.+)/.match(data).captures.each_slice(2).map { |a,b| a << %Q{<span class="smaller_font">#{ b }</span>}}.first
# => "Main Idea, <span class=\"smaller_font\">key term, key term, key term</span>"
# example #2
data =~ /^(.+?,\s*)(.+)/
$1 << %Q{<span class="smaller_font">#{ $2 }</span>}
# => "Main Idea, <span class=\"smaller_font\">key term, key term, key term</span>"
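If you need this in more than one place, the same idea can be wrapped in a small method. This is only a sketch: `wrap_key_terms` and its `css_class` parameter are names I made up for illustration, and it uses `String#split` with a limit of 2 instead of the regex captures above. It also returns the input untouched when there is no comma or nothing after it, which the two examples above don't handle.

```ruby
# Hypothetical helper, not part of the original examples.
# Wraps everything after the first comma in a <span>.
def wrap_key_terms(text, css_class = 'smaller_font')
  # Limit of 2: split only at the first comma, keep the rest intact.
  main_idea, key_terms = text.split(/,\s*/, 2)

  # No comma, or nothing after it: hand the input back unchanged.
  return text if key_terms.nil? || key_terms.empty?

  %Q{#{ main_idea }, <span class="#{ css_class }">#{ key_terms }</span>}
end

puts wrap_key_terms('Main Idea, key term, key term, key term')
# => Main Idea, <span class="smaller_font">key term, key term, key term</span>
puts wrap_key_terms('Main Idea')
# => Main Idea
```

Like the examples above, this assumes the text contains no markup and needs no HTML escaping.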
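If the string has tags, then using a regex to process the HTML or XML is discouraged because it breaks so easily. Extremely trivial uses against HTML you control are pretty safe, but if the content or format changes, the regex can fall apart and break your code.
HTML parsers are the usual recommended solution because they will continue working if the content or format changes. This is what I'd do using Nokogiri. I was deliberately verbose to explain what is going on: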
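To make the fragility concrete, here is a small sketch that runs a first-comma regex over a string that already contains a tag. The `wrap_with_regex` method, the `PATTERN` constant, and the sample markup are assumptions made up for this illustration.

```ruby
# Hypothetical demonstration of a first-comma regex applied to tagged input.
PATTERN = /^(.+?,\s*)(.+)/

def wrap_with_regex(text)
  m = PATTERN.match(text)
  return text unless m
  m[1] + %Q{<span class="smaller_font">#{ m[2] }</span>}
end

# The closing </a> is swallowed by the second capture, so the new
# </span> lands outside the link and the tags are mis-nested.
tagged = '<a href="x">Main Idea, key term, key term</a>'
puts wrap_with_regex(tagged)
# => <a href="x">Main Idea, <span class="smaller_font">key term, key term</a></span>

# A comma inside an attribute value is matched first, so the split
# happens in the middle of the opening tag.
with_title = '<a href="x" title="a, b">Main Idea, key term</a>'
puts wrap_with_regex(with_title)
```

The regex has no idea where tags begin or end, so any change in the surrounding markup changes what it captures.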
require 'nokogiri'
# build a sample document
html = '<a href="stupidreqexquestion">Main Idea, key term, key term, key term</a>'
doc = Nokogiri::HTML(html)
puts doc.to_s, ''
# find the link
a_tag = doc.at_css('a[href=stupidreqexquestion]')
# break down the tag content
a_text = a_tag.content
main_idea, key_terms = a_text.split(/,\s+/, 2) # => ["Main Idea", "key term, key term, key term"]
a_tag.content = main_idea
# create a new node
span = Nokogiri::XML::Node.new('span', doc)
span['class'] = 'smaller_font'
span.content = key_terms
puts span.to_s, ''
# add it to the old node
a_tag.add_child(span)
puts doc.to_s
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><a href="stupidreqexquestion">Main Idea, key term, key term, key term</a></body></html>
# >>
# >> <span class="smaller_font">key term, key term, key term</span>
# >>
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><a href="stupidreqexquestion">Main Idea<span class="smaller_font">key term, key term, key term</span></a></body></html>
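In the output above you can see the sample document as Nokogiri built it, the span being added, and the resulting document. Note that because the text is split on the comma and following whitespace, the comma after "Main Idea" does not appear in the result.
It can be simplified to: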
require 'nokogiri'
doc = Nokogiri::HTML('<a href="stupidreqexquestion">Main Idea, key term, key term, key term</a>')
a_tag = doc.at_css('a[href=stupidreqexquestion]')
main_idea, key_terms = a_tag.content.split(/,\s+/, 2)
a_tag.content = main_idea
a_tag.add_child("<span class='smaller_font'>#{ key_terms }</span>")
puts doc.to_s
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><a href="stupidreqexquestion">Main Idea<span class="smaller_font">key term, key term, key term</span></a></body></html>
Is there a reason you're not including the last key term in the span? – Skilldrick 2010-10-16 22:12:11
Nah, that was a typo – s84 2010-10-16 22:13:43
Is the main idea always listed first? – tinifni 2010-10-16 22:39:39