2012-01-14 28 views
4

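I found some solutions that use post_connect_hook and pre_connect_hook, but they don't seem to work. I'm using the latest version of Mechanize (2.1). The new version has no [:response] fields, and I don't know where to get them now. How can I make Mechanize convert the body to UTF-8 automatically?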

Is it possible to make Mechanize return a UTF-8 encoded version without having to convert it manually with iconv?

+0

iconv is being deprecated in Ruby 1.9. Take a look at String#force_encoding. – phoet 2012-01-17 16:42:30

+0

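Mechanize changes the encoding back to ASCII-8BIT even though I set the encoding everywhere: force_encoding, and encoding the string inside the Mechanize hooks. The result is always the same. I think I need to hack the headers and the HTML meta charset. – 2012-01-17 21:03:41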

+0

Thanks for the hint. In the end I solved it with the new "encode" method. – 2012-01-17 21:04:44

Answers

3

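The arguments passed to pre_connect_hooks() and post_connect_hooks() changed in Mechanize 2.0.

See the Mechanize documentation:

pre_connect_hooks()

A list of hooks to call before retrieving a response. Hooks are called with the agent, the URI, the response, and the response body.

post_connect_hooks()

A list of hooks to call after retrieving a response. Hooks are called with the agent, the URI, the response, and the response body.

You can no longer change the internal response body, because the argument is not an array. The next best option is to replace the internal parser with your own: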

require 'mechanize'
require 'nkf'

class MyParser
  def self.parse(thing, url = nil, encoding = nil, options = Nokogiri::XML::ParseOptions::DEFAULT_HTML, &block)
    # Insert your conversion code here. For example:
    # thing = NKF.nkf("-wm0X", thing).sub(/Shift_JIS/, "utf-8") # you need to rewrite the content charset if it exists.
    Nokogiri::HTML::Document.parse(thing, url, encoding, options, &block)
  end
end

agent = Mechanize.new
agent.html_parser = MyParser
page = agent.get('http://somewhere.com/')
... 
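
The "insert your conversion code here" step can also be written with Ruby's built-in String#encode instead of NKF. The following is only a minimal sketch: to_utf8 is a hypothetical helper name, not part of Mechanize, and Windows-1251 is an assumed source encoding chosen because it decodes the bytes quoted elsewhere in this thread.

```ruby
# Hypothetical helper (not part of Mechanize): re-encode a raw response body as UTF-8.
def to_utf8(body, source_encoding)
  # dup leaves the caller's string untouched; force_encoding only relabels the bytes.
  labelled = body.dup.force_encoding(source_encoding)
  # encode does the actual transcoding; bytes that cannot be converted become "?".
  labelled.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
end

raw  = "\r\n\r\n\xCD\xE5\xE4".b        # binary (ASCII-8BIT) bytes, as Mechanize hands them over
utf8 = to_utf8(raw, 'Windows-1251')    # assumption: the page is really Windows-1251

puts utf8.encoding                     # the result is tagged UTF-8
puts utf8.strip                        # the Cyrillic text, now readable
```

Inside MyParser.parse, one would presumably call something like thing = to_utf8(thing, 'Windows-1251') before handing thing to Nokogiri and pass 'UTF-8' as the encoding argument. The source encoding has to be known or detected separately.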
0

How about something like this:

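require 'mechanize'

class Mechanize
  alias_method :original_get, :get

  # Wrap the original get and tag the returned page as UTF-8.
  def get(*args, &block)
    doc = original_get(*args, &block)
    doc.encoding = 'utf-8' if doc.respond_to?(:encoding=)
    doc
  end
end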
+0

Doesn't work. Still '\r\n\r\n\xCD\xE5\xE4' – 2012-01-21 19:59:23

+0

https://gist.github.com/1653836 works, but I don't like the idea. – 2012-01-21 20:13:43

+0

I agree, it would be better to write a get method that calls @agent.get and does the conversion. Your question seemed to ask for a monkey patch, though. – pguardiario 2012-01-22 01:01:39

0

In your script, simply enter: page.encoding = 'utf-8'

However, depending on your situation, you may instead need to enter the reverse (the encoding of the website that Mechanize is working with). To find it, open Firefox, open the website you want Mechanize to use, select Tools in the menu bar, and open Page Info. From there, determine which encoding the page uses.

With that information, you can enter the page's own encoding (such as page.encoding = 'windows-1252').

Source: https://stackoverflow.com/q/17582268 – CodeBiker 2013-07-10 22:49:02

1

I found a solution that works very well:

class HtmlParser
  def self.parse(body, url, encoding)
    body.encode!('UTF-8', encoding, invalid: :replace, undef: :replace, replace: '')
    Nokogiri::HTML::Document.parse(body, url, 'UTF-8')
  end
end

Mechanize.new.tap do |web|
  web.html_parser = HtmlParser
end

I haven't found any problems with it yet.

Source: https://stackoverflow.com/q/20666246 – 2013-12-18 18:59:44
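
The encode! call in the answer above passes invalid: :replace, undef: :replace and replace: '', which means bytes that cannot be converted are dropped silently rather than reported. The sketch below uses only core Ruby to illustrate that trade-off; the sample bytes and the deliberately wrong US-ASCII source encoding are assumptions made up for the demonstration.

```ruby
body = "caf\xE9".b   # "café" in ISO-8859-1, held as raw bytes

# Correct source encoding: every byte maps to a UTF-8 character.
good = body.encode('UTF-8', 'ISO-8859-1', invalid: :replace, undef: :replace, replace: '')

# Wrong source encoding: 0xE9 is not valid US-ASCII, so it is silently dropped.
lossy = body.encode('UTF-8', 'US-ASCII', invalid: :replace, undef: :replace, replace: '')

# Without the options, the same conversion raises instead of losing data.
strict_error =
  begin
    body.encode('UTF-8', 'US-ASCII')
    nil
  rescue EncodingError => e
    e.class
  end

puts good          # café
puts lossy         # caf
puts strict_error  # an EncodingError subclass
```

If silent data loss is a concern, a visible replacement string such as '?' makes dropped bytes easier to spot in the parsed page.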