2011-06-03 57 views

Answers

5

This code gives you the plain, unformatted text of the whole document:

require 'mechanize'
require 'nokogiri'

# Create a Mechanize agent that identifies itself as a desktop browser.
rational = Mechanize.new do |agent|
  agent.user_agent_alias = 'Windows Mozilla'
end

# Fetch the URL given on the command line and parse the response body.
document = Nokogiri::HTML(rational.get(ARGV[0]).content)

# This gives a very noisy result:
# results = document.inner_text

# My suggestion is to extract the text from a specific element instead:
results = document.css("#content .my-element-with-some-contents").inner_text
+0

Works nicely. Thank you. I thought I could use Nokogiri methods on the Mechanize object.... – Radek 2011-06-03 07:11:01

+0

Mechanize is built on Nokogiri, so I think you're right! – 2011-06-03 09:12:26

+2

There's no need to parse the response yourself; you can simply write: rational.get(link); rational.page.at('/html/body/h1').text – taro 2011-06-03 14:38:13