2016-01-12

Ruby Mechanize 404 => Net::HTTPNotFound

I have a URL that I can't access with Mechanize, and I don't know why:

# Use ruby 2.1.6 
require 'mechanize' 
require 'axlsx' # 2.0.1 
require 'roo' # 1.13.2 

mechanize = Mechanize.new 
mechanize.request_headers = { "Accept-Encoding" => "" } 
mechanize.ignore_bad_chunking = true 
mechanize.follow_meta_refresh = true 

xlsx = Roo::Excelx.new("./base_list.xlsx") 

xlsx.each_with_pagename do |name, sheet| 
  sheet.each do |row| 
    response = mechanize.get(row[0]) # row[0] holds the profile URL
  end
end

When I iterate over my list, I get URLs such as https://angel.co/_helencousins. I can open them in my browser, but not with Mechanize. I get this error:

/.rvm/gems/ruby-2.1.6/gems/mechanize-2.7.4/lib/mechanize/http/agent.rb:316:in `fetch': 404 => Net::HTTPNotFound for https://angel.co/_helencousins -- unhandled response (Mechanize::ResponseCodeError) 
    from /Users/xxx/.rvm/gems/ruby-2.1.6/gems/mechanize-2.7.4/lib/mechanize.rb:464:in `get' 
    from scraper.rb:15:in `block (2 levels) in <main>' 
    from /Users/xxx/.rvm/gems/ruby-2.1.6/gems/roo-1.13.2/lib/roo/base.rb:428:in `block in each' 
    from /Users/xxx/.rvm/gems/ruby-2.1.6/gems/roo-1.13.2/lib/roo/base.rb:427:in `upto' 
    from /Users/xxx/.rvm/gems/ruby-2.1.6/gems/roo-1.13.2/lib/roo/base.rb:427:in `each' 
    from scraper.rb:14:in `block in <main>' 
    from /Users/xxx/.rvm/gems/ruby-2.1.6/gems/roo-1.13.2/lib/roo/base.rb:398:in `block in each_with_pagename' 
    from /Users/xxx/.rvm/gems/ruby-2.1.6/gems/roo-1.13.2/lib/roo/base.rb:397:in `each' 
    from /Users/xxx/.rvm/gems/ruby-2.1.6/gems/roo-1.13.2/lib/roo/base.rb:397:in `each_with_pagename' 
    from scraper.rb:13:in `<main>' 

Possibly related: http://stackoverflow.com/questions/8567973/why-does-accessing-a-ssl-site-with-mechanize-on-windows-fail ("...fails on Windows but works on Mac") – 7stud


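They may well be sniffing the request's user agent and rejecting anything that isn't a standard browser. First, check whether they have an API, and if they do, use it. If not, look at their terms of service to see whether they allow scraping with Mechanize. If they do, try changing your user-agent string, or contact their webmaster and ask for help. –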

Answer


OK,

the problem was that the site blocks Mechanize's user agent.

I simply changed it to: mechanize.user_agent_alias = 'Windows Chrome'