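I want to download a page, fill in a form, and submit it. I like Python and came across mechanize. I can download the page successfully and have verified that it contains two forms. However, even though I can confirm that the page data mechanize downloaded clearly contains the second form (method POST), mechanize does not recognize it. As a result, I cannot modify the values and submit the form I am interested in. I am on Python 2.6.1 on OS X 10.6.8. Any suggestions are much appreciated.
My code: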
import mechanize
br = mechanize.Browser()
br.set_handle_robots(False) # no robots
br.set_handle_refresh(False) # can sometimes hang without this
br.addheaders = [('User-agent', 'Mozilla/6.0 (X11; U; i686; en-US; rv:1.9.0.1) Gecko/2008071615 OS X 10.2 Firefox/3.0.1')]
url = 'http://www.abcd.com/test.html'
response = br.open(url)
Using response.read() or response.get_data(), I can verify that the page contains two forms, shown below:
<form id="lookupFormX" action="/lookup/" onSubmit="return submitLookupForm('lookupForm', 'download');" method="GET">
<label style="font-weight:normal; font-size:85%; margin-right:5px;">View a Site Report </label>
<input type="hidden" name="facet" style="margin-right:2px; font-weight:normal; font-size:85%;" value="sitereport" readonly/>
<input style="margin-right:2px; font-weight:normal; font-size:85%;" name="q" type="text" id="railtext_v11pt" value="e.g. yahoo.com"
onfocus="clearDefaultNote(this,'e.g. yahoo.com');"
onblur="addDefaultNote(this,'e.g. yahoo.com');" />
<a style="margin-right:10px;" href="#" onclick="submitLookupForm('lookupFormX');"><img src="/images/nav_right.gif" /></a>
</form>
<br>
<FORM action="userfeedbackpost.html" id="friendForm" name="friendForm" method="post">
<TABLE id="userfeedbacktable" BORDER=0 style="padding:left:0px; margin-left:0px;">
<TR>
<TD style="width:200px;padding-left:10px">Your Name:</TD>
<TD style="width:200px" ><input name="your_name" type="text" SIZE=35/></TD>
<TD style="width:250px;text-align:right;padding-right:10px">Your E-mail:</TD>
<TD style="width:140px" ><input name="your_email" type="text" SIZE=35/></TD>
</TR>
<!-- <TR></TR> -->
<TR>
<TD style="width:200px;padding-left:10px">Subject:</TD>
<TD colspan="3" ><input name="subject" type="text" style="width:648px" SIZE=106/></TD>
</TR>
<!-- <TR></TR> -->
<TR>
<TD style="width:200px;padding-left:10px">URL this concerns:</TD>
<TD colspan="3" ><input name="url" type="text" style="width:648px" SIZE=106/></TD>
</TR>
<!-- <TR></TR> -->
<TR>
<TD style="width:200px;padding-left:10px">User ID:</TD>
<TD style="width:200px" ><input name="test_id" type="text" SIZE=35/></TD>
<TD style="width:250px;text-align:right;padding-right:10px">Type of inquiry:</TD>
<TD style="width:140px" >
<SELECT name="type" id="type" style="width:262px" onchange="makeSelection()">
<OPTION value="Choose">Choose One</OPTION>
<OPTION value="Bug report">Report an error</OPTION>
<OPTION value="Helpful Information">Send us a suggestion</OPTION>
<OPTION value="Other">Other</OPTION>
</SELECT>
</TD>
</TR>
<!-- <TR></TR> -->
<TR id="infoPanel" style="display:none">
<TD style="width:200px;padding-left:10px">Facet in question:</TD>
<TD style="width:200px" >
<SELECT name="facet" style="width:263px" id="facet">
<OPTION selected value="Choose">Choose One</OPTION>
<OPTION value="Annoyances">Annoyances</OPTION>
<OPTION value="Downloads">Downloads</OPTION>
<OPTION value="Links">Links</OPTION>
</SELECT>
</TD>
<TD style="width:250px;text-align:right;padding-right:10px">Are you the site owner?:</TD>
<TD style="width:140px" >
<input type="radio" id="siteowner_yes" name="siteowner" value="Yes"> Yes
<input type="radio" id="siteowner_no" name="siteowner" value="No" checked> No
</TD>
</TR>
<!-- <TR></TR> -->
<TR>
<TD style="width:200px;padding-left:10px" >Your Message:</TD>
<TD colspan=3><textarea class=userfeedbackTA NAME=message ROWS=12 COLS=80 style="width:646px;"></textarea></TD>
</TR>
<!-- <TR></TR> -->
</TABLE>
<br/><br/> <a href="javascript:document.getElementById('friendForm').submit();" class="btnOrangeLrg"><span>Send Your Feedback or Question.</span></a><br/>
<br/><br/> P.S. We will use the information above only to help provide you feedback. This information will not be used for any other purpose.
</FORM>
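One way to double-check the raw HTML independently of mechanize is to count the form tags with the standard-library HTML parser. The following is only a minimal sketch: the names `FormLister` and `list_forms` are hypothetical helpers, not part of mechanize, and the code only shows which forms a lenient parser finds in the same text.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2


class FormLister(HTMLParser):
    """Collect (id, method) for every <form> start tag in the document."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.forms = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <FORM> matches too.
        if tag == "form":
            attr_map = dict(attrs)
            method = (attr_map.get("method") or "get").upper()
            self.forms.append((attr_map.get("id"), method))


def list_forms(html):
    """Return a list of (form id, HTTP method) pairs found in the HTML text."""
    parser = FormLister()
    parser.feed(html)
    parser.close()
    return parser.forms
```

Passing the page text to `list_forms` should list both `lookupFormX` and `friendForm` if the markup is as shown above; on Python 3 the bytes from the response would likely need decoding first.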
However, mechanize shows only the following:
Form name: None
<GET http://www.test.com/lookup/ application/x-www-form-urlencoded
<HiddenControl(facet=sitereport) (readonly)>
<TextControl(q=e.g. yahoo.com)>>
when I run this code:
for form in br.forms():
    print "Form name:", form.name
    print form
My question: how can I access the second form? (Using nr=1 gave me an error.)
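One possible fallback, if mechanize keeps skipping the form, would be to read the field names out of the raw HTML and build the POST body by hand. This is only a sketch under assumptions: `FormFieldCollector` and `build_post` are hypothetical helpers, the form is located by its `id`, and the caller supplies every value. Nothing here confirms that the server accepts a request built this way.

```python
try:
    from html.parser import HTMLParser            # Python 3
    from urllib.parse import urlencode, urljoin
except ImportError:
    from HTMLParser import HTMLParser             # Python 2
    from urllib import urlencode
    from urlparse import urljoin


class FormFieldCollector(HTMLParser):
    """Record the action and the field names of one form, chosen by id."""

    def __init__(self, form_id):
        HTMLParser.__init__(self)
        self.form_id = form_id
        self.inside = False
        self.action = None
        self.field_names = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "form":
            # Only the form with the requested id is tracked.
            self.inside = attr_map.get("id") == self.form_id
            if self.inside:
                self.action = attr_map.get("action") or ""
        elif self.inside and tag in ("input", "select", "textarea"):
            name = attr_map.get("name")
            if name and name not in self.field_names:
                self.field_names.append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self.inside = False


def build_post(html, page_url, form_id, values):
    """Return (target URL, url-encoded body) for the chosen form."""
    collector = FormFieldCollector(form_id)
    collector.feed(html)
    collector.close()
    if collector.action is None:
        raise ValueError("no form with id %r found" % form_id)
    unknown = sorted(name for name in values if name not in collector.field_names)
    if unknown:
        raise KeyError("fields not in form: %s" % ", ".join(unknown))
    # Keep the order in which the fields appear on the page.
    pairs = [(n, values[n]) for n in collector.field_names if n in values]
    return urljoin(page_url, collector.action), urlencode(pairs)
```

The returned pair could then be handed to something like `br.open(target, body)`, which sends a POST when data is given. Whether the site also expects cookies or values set by its JavaScript is not something this sketch addresses.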
Edit:
I also tried this version, with the same result; the second form does not show up:
request = mechanize.Request(url)
request.add_header("User-agent", "Mozilla/6.0 (X11; U; i686; en-US; rv:1.9.0.1) Gecko/2008071615 OS X 10.2 Firefox/3.0.1")
response = mechanize.urlopen(request)
forms = mechanize.ParseResponse(response, backwards_compat=False)
response.close()
for form in forms:
    print form
Edit 2:
I also tried changing my code to look like this:
import cookielib  # needed for LWPCookieJar below

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follows refresh 0 but not hangs on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [
    ('Cookie','mbox=PC#1327356910232-537677#1410633293|check#true#1347561353|session#1347561287712-498080#1347563153; s_cc=true; s_sq=%5B%5BB%5D%5D; s_nr=1347561671754-Repeat'),
    ('Accept-Charset','ISO-8859-1,utf-8;q=0.7,*;q=0.3'),
    ('Accept-Encoding','gzip,deflate,sdch'),
    ('Accept-Language','en-US,en'),
    ('Cache-Control','max-age=0'),
    ('Connection','keep-alive'),
    ('Referer','http://www.siteadvisor.com'),
    ('User-Agent','Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1')
]
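I copied the header values from my browser and tried to set them on the mechanize browser instance. However, I still see only the one form.
Yes, you are right. The URL is http://www.siteadvisor.com/userfeedback.html – JPK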