Mechanize won't connect to a website

Hello, I have a problem: the mechanize gem won't connect to a website. The gem is installed. Code:

require 'mechanize' 

agent = Mechanize.new 
main_page = agent.get 'https://imbd.com' 
main_page.link_with(text: "Top 250").click 
rows = list_page.root.css(".lister-list tr") 

puts rows.size

And this is the error:

C:/Ruby/lib/ruby/2.2.0/net/http.rb:879:in `initialize': A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. - connect(2) for "imbd.com" port 80 (Errno::ETIMEDOUT) 
    from C:/Ruby/lib/ruby/2.2.0/net/http.rb:879:in `open' 
    from C:/Ruby/lib/ruby/2.2.0/net/http.rb:879:in `block in connect' 
    from C:/Ruby/lib/ruby/2.2.0/timeout.rb:73:in `timeout' 
    from C:/Ruby/lib/ruby/2.2.0/net/http.rb:878:in `connect' 
    from C:/Ruby/lib/ruby/2.2.0/net/http.rb:863:in `do_start' 
    from C:/Ruby/lib/ruby/2.2.0/net/http.rb:858:in `start' 
    from C:/Ruby/lib/ruby/gems/2.2.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:700:in `start' 
    from C:/Ruby/lib/ruby/gems/2.2.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:631:in `connection_for' 
    from C:/Ruby/lib/ruby/gems/2.2.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:994:in `request' 
    from C:/Ruby/lib/ruby/gems/2.2.0/gems/mechanize-2.7.4/lib/mechanize/http/agent.rb:267:in `fetch' 
    from C:/Ruby/lib/ruby/gems/2.2.0/gems/mechanize-2.7.4/lib/mechanize.rb:464:in `get' 
    from C:/Ruby/Workspace/imbd.rb:4:in `<main>'

Does anyone have any idea what's wrong? Thanks!


2016-05-21 Ioo

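While it's true that Mechanize doesn't support JavaScript, your problem is that you're trying to access a site that doesn't exist. You're requesting www.imbd.com instead of www.imdb.com. So the error message is accurate: the connection to that host times out.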
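One way to catch this kind of typo before any request goes out is to compare the URL's host against the hosts you actually intend to reach. This is only a minimal sketch using Ruby's standard `uri` library; the `EXPECTED_HOSTS` list and the `expected_host?` helper are hypothetical names made up for illustration, not part of Mechanize.

```ruby
require 'uri'

# Hypothetical allow-list of hosts this script is meant to talk to.
EXPECTED_HOSTS = %w[imdb.com www.imdb.com].freeze

# Returns true when the URL's host is on the allow-list, false otherwise.
def expected_host?(url)
  host = URI.parse(url).host
  !host.nil? && EXPECTED_HOSTS.include?(host.downcase)
end

puts expected_host?('https://imbd.com') # the misspelled host from the question
puts expected_host?('https://imdb.com') # the intended host
```

A check like this only guards against hosts you did not mean to contact; it says nothing about whether the intended host is reachable.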

And FWIW, IMDb doesn't want you scraping their site:

Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.


2016-05-21 21:21:15 orde

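I didn't know. I'm only pulling 10 titles from their database, purely for educational purposes, not for commercial use or spam :) Thanks for catching the typo; now I just feel stupid. I'm looking for errors in the other parts of the code. – Ioo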

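After looking at imdb, I see they're running a lot of JavaScript, which is what trips up Mechanize: it can't parse JS and so can't make sense of the incoming response. If you're looking to scrape content or automate browsing, I'd suggest using Capybara instead of Mechanize. Combining Capybara with Poltergeist (you'll need to install PhantomJS for this approach) will work much better than Mechanize, since it is built for automated interaction with pages that load a lot of JS.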

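I've added a way to rescue the error for you. If this works, it's because Mechanize was trying to fetch the page before the JS scripts had finished, and so didn't get valid data.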

Edit:

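require 'mechanize'

agent = Mechanize.new
agent.read_timeout = 3 # read timeout for the agent, in seconds
begin
  main_page = agent.get 'https://imdb.com' # imdb.com, not imbd.com
  # Keep the page returned by the click so it can be queried below.
  list_page = main_page.link_with(text: "Top 250").click
  rows = list_page.root.css(".lister-list tr")
rescue Timeout::Error, Errno::ETIMEDOUT # Errno::ETIMEDOUT is not a Timeout::Error subclass
  puts "Timeout!"
  puts "read_timeout attribute is set to #{agent.read_timeout}s" unless agent.read_timeout.nil?
end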
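A note on the rescue above: the exception in the question's trace is `Errno::ETIMEDOUT`, which is an operating-system error and not a subclass of `Timeout::Error`, so which clause you rescue matters. The following is a rough sketch, using only the standard library, of how these classes relate. `classify_timeout` is a hypothetical helper name chosen for illustration.

```ruby
require 'net/http' # defines Net::OpenTimeout and Net::ReadTimeout
require 'timeout'

# Maps an exception to a rough timeout category.
# Subclasses are listed before Timeout::Error so they match first.
def classify_timeout(error)
  case error
  when Net::OpenTimeout then :open_timeout        # Ruby-level connect timeout
  when Net::ReadTimeout then :read_timeout        # Ruby-level read timeout
  when Timeout::Error   then :other_ruby_timeout
  when Errno::ETIMEDOUT then :os_connect_timeout  # raised by the OS, as in the question
  else :not_a_timeout
  end
end

puts classify_timeout(Errno::ETIMEDOUT.new)
puts classify_timeout(Net::ReadTimeout.new)
```

Because `Errno::ETIMEDOUT` sits outside the `Timeout::Error` hierarchy, a bare `rescue Timeout::Error` would let the error from the question propagate.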


2016-05-21 16:54:22 bkunzi01

I watched a tutorial in which Mechanize handled the imbd page in exactly this way, and here I am struggling with it... damn. – Ioo

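Link me the tutorial; it may be outdated. I used to rely on Mechanize for scraping, but with the rise of JS-heavy sites I've had to start using Capybara and Poltergeist, and I couldn't be happier. – bkunzi01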

Here's a screenshot; it's from a video I have on disk - http://prntscr.com/b6r2j7 – Ioo
