2015-05-14 50 views

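Ruby Mechanize form input field text

Solved - the line "abc = list.scan(/\[([^\)]+)\]/).last.first" was correct, but the captured text also included the quotation marks, which the site's search form does not accept. I corrected it to abc = list.scan(/\"([^\)]+)\"/).join.

Thanks for your help.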
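To illustrate the fix described above, here is a minimal sketch comparing the two patterns. It assumes each raw entry looks like a stringified array, '["ruby mechanize"]'; that sample value, the constant RAW_ENTRY, and the two helper names are hypothetical and not taken from the original script.

```ruby
# Hypothetical raw entry: assumes each CSV row was turned into a string
# that looks like an array literal.
RAW_ENTRY = '["ruby mechanize"]'

# Original pattern: captures everything between the square brackets,
# which includes the literal quote characters.
def extract_with_brackets(entry)
  entry.scan(/\[([^\)]+)\]/).last.first
end

# Corrected pattern: captures only what sits between the quotes.
def extract_with_quotes(entry)
  entry.scan(/\"([^\)]+)\"/).join
end

# p shows the inspected string, so stray quote characters become visible.
p extract_with_brackets(RAW_ENTRY) # prints "\"ruby mechanize\""
p extract_with_quotes(RAW_ENTRY)   # prints "ruby mechanize"
```

Printing with p instead of puts makes it easier to tell when the quotes are part of the string's contents rather than just delimiters.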


I have to automate searches using a list of 100 keywords from a CSV file.

With Mechanize, I can submit a search using this example (http://mechanize.rubyforge.org/GUIDE_rdoc.html):

agent = Mechanize.new 
page = agent.get('http://google.com/') 
google_form = page.form('f') 
google_form.q = 'ruby mechanize' 
page = agent.submit(google_form) 
pp page 

However, when I do it by looping through the CSV file, it returns an error (in this example, the first CSV entry would be "ruby mechanize"):

#i have already imported the csv list, now it is looping through the array "raw_list" 

raw_list.each do |list| 
abc = list.scan(/\[([^\)]+)\]/).last.first 

# i tested a "puts abc" which returned "ruby mechanize", so I don't understand why the rest of this doesn't work 


agent = Mechanize.new 
page = agent.get('http://google.com/') 
google_form = page.form('f') 
google_form.q = abc 

#even though abc = "ruby mechanize", an error occurs. 


page = agent.submit(google_form) 
pp page 
end

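It doesn't seem to accept the variable "abc", but it works if you type in "ruby mechanize" manually, even though both are the same.

The error that occurs is: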

C:filename: in `block (2 levels) in <top (required)>': undefined method `text' for nil:NilClass (NoMethodError) 
from C:/RailsInstaller/Ruby2.0.0/lib/ruby/gems/2.0.0/gems/mechanize-2.7.3/lib/mechanize.rb:442:in `get' 
from C:/Users/victor/RubymineProjects/untitled/scraper.rb:23:in `block in <top (required)>' 
from C:/Users/victor/RubymineProjects/untitled/scraper.rb:19:in `each' 
from C:/Users/victor/RubymineProjects/untitled/scraper.rb:19:in `<top (required)>' 
from -e:1:in `load' 
from -e:1:in `<main>' 

Any help would be greatly appreciated.

Posting the error might help :) – pguardiario

Without seeing the line numbers I can't tell much. My guess is that you're getting a page without a form (one of those 'we think you're a robot' pages) – pguardiario

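Answer

Your error is telling you that something in the block starting on line 19 of your code (the call on line 23) is causing a problem at line 442 of Mechanize.

I tried your sample in IRB, and it seems to work fine: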

2.2.2 :001 > require 'mechanize' 
=> true 
2.2.2 :002 > agent = Mechanize.new 
=> #<Mechanize:... 
2.2.2 :003 > page = agent.get('http://google.com/') 
=> #<Mechanize::Page 
    ... 
2.2.2 :004 > google_form = page.form('f') 
=> #<Mechanize::Form 
... 
2.2.2 :005 > google_form.q 
=> "" 
2.2.2 :006 > abc = "ruby mechanize" 
=> "ruby mechanize" 
2.2.2 :007 > google_form.q = abc 
=> "ruby mechanize" 
2.2.2 :008 > page = agent.submit(google_form) 
=> #<Mechanize::Page 
... 

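When nothing matches, scan returns an empty array, so .last returns nil and calling .first on it fails. Your error is likely happening here: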

abc = list.scan(/\[([^\)]+)\]/).last.first 

http://ruby-doc.org/stdlib-2.2.0/libdoc/strscan/rdoc/StringScanner.html

You could use this instead:

abc = list.scan(/\[([^\)]+)\]/).join 

Although the result may just be "", you will always get a string.

http://ruby-doc.org/core-2.2.0/Array.html#method-i-join
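As a rough sketch of the difference between the two forms, the snippet below runs both against an entry that contains no brackets. The sample entry and the names BRACKETED, keyword_strict, and keyword_lenient are made up for illustration.

```ruby
# Same pattern as in the question.
BRACKETED = /\[([^\)]+)\]/

# Chained form: breaks when nothing matches, because [].last is nil.
def keyword_strict(entry)
  entry.scan(BRACKETED).last.first
end

# join form: an empty match list simply becomes an empty string.
def keyword_lenient(entry)
  entry.scan(BRACKETED).join
end

unmatched = 'ruby mechanize' # hypothetical entry with no brackets

p unmatched.scan(BRACKETED)  # prints []
p keyword_lenient(unmatched) # prints ""

begin
  keyword_strict(unmatched)
rescue NoMethodError => e
  puts "strict version failed: #{e.class}"
end
```

Since the join form can hand back an empty string, it may be worth skipping empty keywords before submitting the form.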