2010-07-05 69 views
3

Encoding problem with Hpricot: when trying to scrape a web page with Hpricot on Ruby 1.9, I get the following encoding error:

Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8 

I can reproduce the error as follows:

ska:~ sam$ rvm [email protected] 
ska:~ sam$ ruby -v 
ruby 1.9.2dev (2010-05-31 revision 28117) [x86_64-darwin10.4.0] 
ska:~ sam$ gem list 

*** LOCAL GEMS *** 

hpricot (0.8.2) 
rake (0.8.7) 
rdoc (2.5.8) 
ska:~ sam$ irb 
ruby-1.9.2-preview3 > require 'rubygems' 
=> false 
ruby-1.9.2-preview3 > require 'hpricot' 
=> true 
ruby-1.9.2-preview3 > require 'open-uri' 
=> true 

ruby-1.9.2-preview3 > page = Hpricot(open('http://www.imdb.com/title/tt0435761/')) 
=> #<Hpricot::Doc "\n" {doctype "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">"} "\n" {elem <html xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml"> "\n" {elem <head> "\n" __TRUNCATED__ 


ruby-1.9.2-preview3 > page.search("//div[@class = 'info-content").collect { |f| f.inner_text }.join(', ') 

Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `join' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `block in inner_text' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `map' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `block in inner_text' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `map' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `block in inner_text' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `map' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `block in inner_text' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `map' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `block in inner_text' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `map' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `block in inner_text' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `map' 
     from /Users/sam/.rvm/gems/rub[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `block in inner_text' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `map' 
     from /Users/sam/.rvm/gems/[email protected]/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text' 
     from (irb):5:in `block in irb_binding' 
     from (irb):5:in `collect' 
     from (irb):5 
     from /Users/sam/.rvm/rubies/ruby-1.9.2-preview3/bin/irb:17:in `<main>' 
ruby-1.9.2-preview3 > 

I got it working with Nokogiri. – Sam 2010-07-06 10:01:47


Personally, I recommend using Nokogiri over Hpricot, because I have had far fewer problems with it. – 2010-07-06 16:23:14


Nokogiri is a 'drop-in' replacement for Hpricot, and I would recommend using it rather than Hpricot, which was maintained by _why. – 2010-08-29 22:18:35

Answer

0

Try changing the XPath from:

 
    page.search("//div[@class = 'info-content") 

to:

 
    page.search('//div[@class=info-content]') 

Running a sample in IRB gives me:

 
ruby-1.9.1-p378 > page.search("//div[@class=info-content]").map{ |i| i.inner_text }[0] 
=> "Down 66% in popularity this week. See why on IMDbPro."

You're right, that is a bug, but I still get the encoding error. Maybe I should try 1.9.1. – Sam 2010-07-06 09:40:42


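1.9.1 changed how it handles encodings. I haven't seen a case where 1.9.1 handles text better than 1.8.7, but that may be because I haven't done any conversions lately. I think 1.9.1 is stable enough, and enough modules work with it, that I use it as my default. – 2010-07-06 16:26:42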
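
For anyone wondering what the error itself means, here is a minimal sketch that reproduces it in plain Ruby 1.9+ without Hpricot. It assumes the cause is what the message suggests: `join` is asked to combine a string tagged ASCII-8BIT (binary) with a string tagged UTF-8, and both contain non-ASCII bytes. The sample strings and the `join_or_error` helper are made up for illustration and are not taken from the IMDb page or from Hpricot's source; whether this is exactly what happens inside `inner_text` in hpricot 0.8.2 is an assumption.

```ruby
# Hypothetical sample data: the same kind of text, tagged two different ways.
# "caf\xC3\xA9" holds the UTF-8 bytes for "café", but is re-tagged as binary.
binary = "caf\xC3\xA9".dup.force_encoding(Encoding::ASCII_8BIT)
utf8   = "na\u00EFve" # a genuine UTF-8 string containing a non-ASCII character

# Hypothetical helper: join the parts, or return the exception class
# if Ruby refuses to mix the encodings.
def join_or_error(parts)
  parts.join(', ')
rescue Encoding::CompatibilityError => e
  e.class
end

# Mixing the two tags fails once both strings contain non-ASCII bytes.
result_mixed = join_or_error([binary, utf8])

# Re-tagging the binary string as UTF-8 does not change its bytes, only the label.
# This is only safe if the bytes really are UTF-8 (assumed here).
retagged     = binary.dup.force_encoding(Encoding::UTF_8)
result_fixed = join_or_error([retagged, utf8])

puts result_mixed.inspect
puts result_fixed.inspect
```

If the page bytes are not actually UTF-8, `force_encoding` would just mislabel them; in that case converting with `String#encode` from the real source encoding would be the safer route.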