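Simple regular expression question

I have a title on a blog that looks like this: Main Idea, key term, key term, key term

I want the main idea and the key terms to have different font sizes. My first thought was to search for the first comma and the end of the string, and replace that chunk with the same text wrapped in a span tag with a class that makes the font smaller.

Here's the plan:

HTML (before)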

<a href="stupidreqexquestion">Main Idea, key term, key term, key term</a>

HTML (after)

<a href="stupidreqexquestion">Main Idea <span class="smaller_font">, key term, key term, key term</span></a>

I'm using Rails, so I plan to add this as a helper function, for example:

Helper

def make_key_words_in_title_smaller(title)
  # replace the keywords in the title with key words surrounded by span tags
end

View

<% @posts.each do |post| %>
  <%= make_key_words_in_title_smaller(post.title) %>
<% end -%>

Source

2010-10-16 s84

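Is there a reason you're not including the last key term in the span? – Skilldrick 2010-10-16 22:12:11

Nah, that was a typo – s84 2010-10-16 22:13:43

Is the main idea always listed first? – tinifni 2010-10-16 22:39:39

If you don't need to handle a Main Idea part such as "Welcome home, Roxy Carmichael", i.e. a title in double quotes that itself contains a comma: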

>> t = "Main Idea, key term, key term, key term" 
=> "Main Idea, key term, key term, key term" 

>> t.gsub(/(.*?)(,.*)/, '\1 <span class="smaller_font">\2</span>') 
=> "Main Idea <span class=\"smaller_font\">, key term, key term, key term</span>"
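To connect the one-liner above with the helper stub from the question, here is a minimal sketch of how the helper body might look in plain Ruby. The HTML escaping via ERB::Util.html_escape and the use of sub instead of gsub are assumptions added for illustration; they are not part of the answer.

```ruby
require 'erb'

# Hypothetical plain-Ruby body for the helper stub in the question.
# Assumes the key terms start at the first comma in the title.
def make_key_words_in_title_smaller(title)
  # Escape first so characters such as & or < in a title cannot inject markup.
  escaped = ERB::Util.html_escape(title)
  # sub replaces only the first match and returns the string unchanged
  # when the title contains no comma.
  escaped.sub(/(.*?)(,.*)/m, '\1 <span class="smaller_font">\2</span>')
end

puts make_key_words_in_title_smaller('Main Idea, key term, key term, key term')
puts make_key_words_in_title_smaller('Tom & Jerry, cartoons')
puts make_key_words_in_title_smaller('No key terms here')
```

In a Rails view the returned string would presumably still need to be marked as safe (for example with html_safe) so the span is not escaped a second time; whether that is needed depends on the Rails version.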

Source

2010-10-16 22:56:59

Works great and is very simple, thanks! – s84 2010-10-17 02:51:20

If the string is plain, comma-separated text (i.e., no tags), these work well:

data = 'Main Idea, key term, key term, key term'

# example #1
# The first group also captures the whitespace after the comma,
# so the span starts directly at the first key term.
/^(.+?,\s*)(.+)/.match(data).captures.each_slice(2).map { |a, b| a << %Q{<span class="smaller_font">#{ b }</span>} }.first
# => "Main Idea, <span class=\"smaller_font\">key term, key term, key term</span>"

# example #2
data =~ /^(.+?,\s*)(.+)/
$1 << %Q{<span class="smaller_font">#{ $2 }</span>}
# => "Main Idea, <span class=\"smaller_font\">key term, key term, key term</span>"
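Both examples above assume the title contains a comma; without one, the match is nil and the chained call raises NoMethodError. As a rough sketch of a variant that tolerates such titles, String#partition can do the same split. The method name wrap_key_terms is hypothetical and not from the answer.

```ruby
# Hypothetical helper name, used only for illustration.
def wrap_key_terms(title)
  # partition returns [before, separator, after]; when nothing matches,
  # the separator and the remainder are both empty strings.
  main_idea, comma, key_terms = title.partition(/,\s*/)
  return title if comma.empty? # no comma: nothing to wrap

  %Q{#{main_idea}#{comma}<span class="smaller_font">#{key_terms}</span>}
end

puts wrap_key_terms('Main Idea, key term, key term, key term')
# prints: Main Idea, <span class="smaller_font">key term, key term, key term</span>
puts wrap_key_terms('Main Idea')
# prints: Main Idea
```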

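If the string does contain tags, then using regular expressions to process HTML or XML is discouraged, because it breaks easily. Extremely trivial uses against HTML you control are quite safe, but if the content or format changes, the regex can break your code.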
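As a small illustration of that fragility, the sketch below applies a first-comma regex to a whole tag rather than to the bare title. The href value with a comma in it is an invented example, not something from the question.

```ruby
# Invented input: the href attribute happens to contain a comma.
html = '<a href="/posts?tags=a,b">Main Idea, key term</a>'

# The first comma is now inside the attribute, so the regex splits the tag
# in the wrong place and the span ends up wrapped around the closing </a>.
broken = html.sub(/(.*?)(,.*)/, '\1 <span class="smaller_font">\2</span>')

puts broken
# prints: <a href="/posts?tags=a <span class="smaller_font">,b">Main Idea, key term</a></span>
```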

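HTML parsers are the generally recommended solution, because they keep working if the content or format changes. Here is what I would do with Nokogiri. I'm being deliberately verbose to explain what is happening: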

require 'nokogiri' 

# build a sample document 
html = '<a href="stupidreqexquestion">Main Idea, key term, key term, key term</a>' 
doc = Nokogiri::HTML(html) 

puts doc.to_s, '' 

# find the link 
a_tag = doc.at_css('a[href=stupidreqexquestion]') 

# break down the tag content 
a_text = a_tag.content 
main_idea, key_terms = a_text.split(/,\s+/, 2) # => ["Main Idea", "key term, key term, key term"] 
a_tag.content = main_idea 

# create a new node 
span = Nokogiri::XML::Node.new('span', doc) 
span['class'] = 'smaller_font' 
span.content = key_terms 

puts span.to_s, '' 

# add it to the old node 
a_tag.add_child(span) 

puts doc.to_s 
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> 
# >> <html><body><a href="stupidreqexquestion">Main Idea, key term, key term, key term</a></body></html> 
# >> 
# >> <span class="smaller_font">key term, key term, key term</span> 
# >> 
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> 
# >> <html><body><a href="stupidreqexquestion">Main Idea<span class="smaller_font">key term, key term, key term</span></a></body></html>

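In the output above you can see how Nokogiri built the sample document, the span that was added, and the resulting document.

It can be simplified to: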

require 'nokogiri' 

doc = Nokogiri::HTML('<a href="stupidreqexquestion">Main Idea, key term, key term, key term</a>') 

a_tag = doc.at_css('a[href=stupidreqexquestion]') 
main_idea, key_terms = a_tag.content.split(/,\s+/, 2) 
a_tag.content = main_idea 

a_tag.add_child("<span class='smaller_font'>#{ key_terms }</span>") 

puts doc.to_s 
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> 
# >> <html><body><a href="stupidreqexquestion">Main Idea<span class="smaller_font">key term, key term, key term</span></a></body></html>

Source

2010-10-16 23:49:45

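Your write-up is great! I wish I had been clearer. What I meant is that the span with the CSS class is how I want the markup to look after applying the regex or Nokogiri, so you can't use it to find the key terms; you have to use the first comma and the end of the string as the markers. Very nice post, thanks a lot! – s84 2010-10-17 02:53:38

I'm not sure what you mean. It is possible to find parts of a document without using XPath or CSS, but the search will be less accurate. Usually we look for some kind of common "landmark" to navigate by, even if that means finding it and then moving up, down, or sideways to reach the target. If you only need to tweak a simple string and add a '<span>' tag, then that is a very simple problem, one I would expect a Rails developer to be able to solve. – 2010-10-17 21:42:33

The landmark would be the first comma and the end of the string, so I don't see how Nokogiri would find it. I have used Nokogiri for screen scraping, for example to build news feeds, but AFAIK it needs some kind of XML or HTML class to parse by. – s84 2010-10-18 17:46:17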
