2014-03-26 70 views

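Replace img src attribute value with a string in Rails

I am retrieving content from an XHTML file. The content contains img tags with src="/tmp/folder_name/file_name". I want to replace the src value "/tmp/folder_name/file_name" with "file_name". The code below shows how I get the content from the XHTML file. I tried Nokogiri::HTML(section_content), but the resulting content is no longer XHTML. How can I convert it back to XHTML, or how can I replace the src values in the content without using Nokogiri::HTML?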

    section_content = section.export_xhtml_content file_path
    doc = Nokogiri::HTML(section_content)
    unless doc.css('div.image_content').blank?
      doc.css('div.image_content img').each do |img|
        newsrc = File.basename img[:src]
        img.set_attribute('src', newsrc)
      end
    end
    section_content = doc.to_s

Content:

<?xml version="1.0" encoding="utf-8"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml"> 
    <head> 
    <title>File 1: Chapter1</title> 
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> 
    <link href="stylesheet.css" type="text/css" rel="stylesheet"/> 
    <link href="page_styles.css" type="text/css" rel="stylesheet"/> 
    </head> 
    <body class="publitory"> 
    <h1 id="File_1_1">Chapter1</h1> 
    <h2 id="File_1_2">Content1</h2> 
    <h3 id="File_1_3">Content1.1</h3> 
    <p/> 
    <div style="width:25%; margin: 0 auto;" data-align="Middle" class="image_content"> 
     <img width="100%" src="/tmp/fog/development_publitory_bucket/uploads/user/b57030de-89ac-11e3-9cf2-bdfa8a998e1e/book/053bab68-b4b2-11e3-8ed6-996ec04a57ef/oeb_image/angel7eef59eb838ac763a43b936763dd184ec3324318.jpeg"/> 
     <div class="caption" style="clear:both;">Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content<br/></div> 
    </div> 
    <br/> 
    <p/> 
    <h3 id="File_1_4">Content1.2</h3> 
    <h2 id="File_1_5">Content2</h2> 
    <h2 id="File_1_6">Content3</h2> 
    <h2 id="File_1_7">Content4</h2> 
    </body> 
</html> 

After replacing the src value using Nokogiri, the resulting content is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> 
<?xml version="1.0" encoding="utf-8"?> 
<html xmlns="http://www.w3.org/1999/xhtml"> 
    <head> 
    <title>File 1: Chapter1</title> 
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 
    <link href="stylesheet.css" type="text/css" rel="stylesheet"> 
    <link href="page_styles.css" type="text/css" rel="stylesheet"> 
    </head> 
    <body class="publitory"> 
    <h1 id="File_1_1">Chapter1</h1> 
    <h2 id="File_1_2">Content1</h2> 
    <h3 id="File_1_3">Content1.1</h3> 
    <p></p> 
    <div style="width:25%; margin: 0 auto;" data-align="Middle" class="image_content"> 
     <img width="100%" src="angel7eef59eb838ac763a43b936763dd184ec3324318.jpeg"> 
     <div class="caption" style="clear:both;">Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content<br> 
     </div> 
    </div> 
    <br> 
    <p></p> 
    <h3 id="File_1_4">Content1.2</h3> 
    <h2 id="File_1_5">Content2</h2> 
    <h2 id="File_1_6">Content3</h2> 
    <h2 id="File_1_7">Content4</h2> 
    </body> 
</html> 

The resulting content should be well-formed XHTML. Please help me solve this problem. Thanks in advance.

By the way, the proper way to improve a question is to edit it, rather than posting a new question that expands on an [earlier version](http://stackoverflow.com/q/22637833/1016716).

Answer


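The basic steps you need to follow:

  1. Construct the document, for example with Nokogiri::XML
  2. Find the target nodes with an .xpath or .css query
  3. Perform whatever operations you need on the nodes, using the interface provided by Nokogiri::XML::Node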

Yes, I did it like this:

    doc = Nokogiri::HTML(section_content)
    unless doc.css('div.image_content').blank?
      doc.css('div.image_content img').each do |img|
        newsrc = File.basename img[:src]
        img.set_attribute('src', newsrc)
      end
    end
    section_content = doc.to_s

– user2138489

So, what is the problem?

I have edited the question. Please check it. The resulting content should be in XHTML. – user2138489