2012-09-07

Possible Duplicate:
simple parsing in ruby
Parsing curl output in Ruby without Nokogiri?

I want to check the title of a website, and after some trial and error I found that this can be done in Ruby using Nokogiri and rest-client:

require 'nokogiri'
require 'rest-client'

# user.username comes from the surrounding application (e.g. a Rails model)
page = Nokogiri::HTML(RestClient.get("http://#{user.username}.domain.com/"))
simian = page.at_css("title").text
if simian == "Welcome to"
  puts "default monkey"
else
  puts "website updated"
end

Unfortunately, this doesn't always work when checking a large number of sites, because it raises RestClient::InternalServerError at /admin/users/list: 500 Internal Server Error.
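The 500 is the remote server's response, not a Nokogiri problem. One option is to keep rest-client and rescue RestClient::InternalServerError around the call. Another, sketched below under the assumption that swapping rest-client for the standard library's Net::HTTP is acceptable, is to read the status code and only look at the title for 2xx responses. The method names `classify` and `check_site` and the "HTTP …" return string are hypothetical; the "Welcome to" check mirrors the question. Note that Net::HTTP.get_response does not follow redirects, so a 301 is reported as "HTTP 301".

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: decide what a page looks like from its status code
# and HTML body. Pure function, so it can be exercised without a network.
def classify(code, body)
  # Net::HTTP exposes the status as a String such as "200" or "500"
  return "HTTP #{code}" unless code.to_s.start_with?('2')

  # First <title> element, non-greedy and across newlines (regex, not a parser)
  title = body.to_s[/<title[^>]*>(.*?)<\/title>/im, 1].to_s.strip
  title == 'Welcome to' ? 'default monkey' : 'website updated'
end

# Net::HTTP.get_response returns the response object for 4xx/5xx statuses
# instead of raising, so a 500 becomes an ordinary return value.
# Connection-level failures (e.g. SocketError) can still raise.
def check_site(url)
  response = Net::HTTP.get_response(URI(url))
  classify(response.code, response.body)
end
```

With the question's setup this would be called as `check_site("http://#{user.username}.domain.com/")`.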

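I'd like to know whether there is any way to achieve the same thing by simply using mycurl = %x(curl http://.......). Would parsing the title out of that output, without using any gems, be an efficient approach, or can the curl approach be combined directly with Nokogiri? Thanks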
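A gem-free sketch of the curl route, assuming the curl binary is on PATH and the page is UTF-8. Passing the arguments separately to Open3.capture3 avoids the shell quoting problems that %x(curl #{url}) has when the URL is built from user data, and curl's `--fail` flag makes it exit non-zero on HTTP errors such as the 500 above instead of returning the error page. The method names `title_of` and `curl_title` are hypothetical, and the regex is only suitable for simple pages.

```ruby
require 'open3'

# Pull the first <title> out of raw HTML with a regex (no gems).
# A regex is not a full HTML parser; it is only meant for simple pages.
def title_of(html)
  html[/<title[^>]*>(.*?)<\/title>/im, 1]&.strip
end

# Run the curl binary and return the page title, or nil if curl fails.
#   -s  hide the progress meter     -S  still print errors to stderr
#   -L  follow redirects            --fail  exit non-zero on HTTP status >= 400
def curl_title(url)
  out, _err, status = Open3.capture3('curl', '-sSL', '--fail', url)
  status.success? ? title_of(out) : nil
end
```

The captured `out` string can equally be handed to Nokogiri with `Nokogiri::HTML(out)` if the gem is installed.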

Answers


Having read your question, in case you aren't sure whether both gems are set up, here is another, possibly simpler, approach:

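require 'open-uri'

url = "http://google.com"
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs
source = URI.open(url).read
# Non-greedy and multiline, so the capture stops at the first </title>
puts source[/<title>(.*?)<\/title>/im, 1]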

Syntax error: source[/<title>(.*)<\/title>, 1] – devnull

There are two parts to this: fetching the page and parsing it. For fetching, you don't need the rest-client gem when open-uri from the standard library will do. Nokogiri handles the parsing, and that is unlikely to be your problem. Try this:

require 'open-uri'
require 'nokogiri'

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs
page = Nokogiri::HTML(URI.open('http://example.com/'))
puts page.at('title').text

Source: https://stackoverflow.com/q/12325158 (2012-09-07 20:47:02)

Hi, this works, many thanks. The only problem is when, for example, I try to open a page that has no index and the server shows an internal server error: the script dies at /admin/users/ with OpenURI::HTTPError 500 Internal Server Error /ruby-1.9.3-p125/lib/ruby/1.9.1/open-uri.rb:in open_http raise OpenURI::HTTPError.new(io.status.join(''),io)... Any tips on how to skip these pages would be appreciated! – devnull

Doesn't work: https://gist.github.com/3670329 URI can't handle a simple _ in a subdomain, so I have to use curl. – devnull

@devnull In your first comment, the server is responding with an error; it isn't coming from the open-uri code. As for your second comment, '_' is an invalid character in a domain name.
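To skip pages whose server answers with an error, as asked in the comments, one approach is to rescue OpenURI::HTTPError (the exception open-uri raises for 4xx/5xx responses) and return nil. A minimal sketch: the method name `title_or_skip` and the `fetch:` keyword are hypothetical, the latter added only so the method can be tested without a network. If Nokogiri is available, the regex line can be swapped for `Nokogiri::HTML(html).at('title')&.text`.

```ruby
require 'open-uri'

# Return the page title, or nil when the server answers with an HTTP error
# status. open-uri signals that by raising OpenURI::HTTPError (e.g. the
# "500 Internal Server Error" above), which is rescued so the page is skipped.
# `fetch` is a hypothetical injection point; by default it downloads the
# page with URI.open and returns the body as a String.
def title_or_skip(url, fetch: ->(u) { URI.open(u, &:read) })
  html = fetch.call(url)
  html[/<title[^>]*>(.*?)<\/title>/im, 1]&.strip
rescue OpenURI::HTTPError => e
  warn "skipping #{url}: #{e.message}"
  nil
end
```

In a loop over many sites, `next unless title` after calling `title_or_skip` moves on to the next URL when a server returns an error.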