2011-07-04

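Mechanize form submission causes 'AssertionError' when trying to .read() the response

I am writing a web crawler in Python, but I cannot log in using mechanize. The form on the site looks like this: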

<form method="post" action="PATLogon"> 
    <h2 align="center"><img src="/myaladin/images/aladin_logo_rd.gif"></h2> 
    <!-- ALADIN Request parameters --> 
    <input type=hidden name=req value="db"> 
    <input type=hidden name=key value="PROXYAUTH"> 
    <input type=hidden name=url value="http://eebo.chadwyck.com/search"> 
    <input type=hidden name=lib value="8">  
<table> 
<tr><td><b>Last Name:</b></td> 
    <td><input name=LN size=20 maxlength=26></td> 
<tr><td><b>University ID or Library Barcode:</b></td> 
    <td><input type=password name=BC size=20 maxlength=21></td> 
<tr><td><b>Institution:</b></td> 
    <td><select name="INST"> 
     <option value="??">Select University ----</option> 
     <option value="AU">American</option> 
     <option value="CU">Catholic</option> 
     <option value="DC">District of Columbia</option> 
     <option value="GA">Gallaudet</option> 
     <option value="GM">George Mason</option> 
     <option value="GW">George Washington</option> 
     <option value="GT">Georgetown</option> 
     <option value="MU">Marymount</option> 
     <option value="TR">Trinity</option> 
     </select> 
     &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
     <input type="submit" value="GO"> 
    </td></tr></table></form> 

I can set everything up properly, but when I submit the form and try to print the response, I get an error. My code is as follows:

import mechanize 
import time 
br = mechanize.Browser() 
br.set_handle_robots(False) 

def connect(): 
    # connection information              
    url = "https://www.aladin.wrlc.org/Z-WEB/Aladin?req=db&key=PROXYAUTH&lib=8&url=http://eebo.chadwyck.com/search" 
    br.open(url) 
    time.sleep(0.5) 
    br.select_form(nr=0) 
    br["LN"] = "Reese" 
    br["BC"] = "myPassword" 
    br["INST"] = ["AU"] 
    response = br.submit() 
    print response.getheaders() 

The error I get is:

>>> eebolib.connect() 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "eebolib.py", line 28, in connect 
    print response.read() 
    File "build/bdist.macosx-10.5-fat3/egg/mechanize/_response.py", line 190, in read 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 349, in read 
    data = self._sock.recv(rbufsize) 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 553, in read 
    if self.length is not None: 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1282, in read 
    if amt is None or amt > self._line_left: 
AssertionError 

If anyone could offer some help with this, I would greatly appreciate it.

This code worked for me on Ubuntu 11.04 with Python 2.6.6 and mechanize 0.2.5 (I replaced response.getheaders() with response.read()). – infrared

Answer

This is the solution I found:

import mechanize
import urllib

# Keep session cookies across requests.
cookies = mechanize.CookieJar()
opener = mechanize.build_opener(mechanize.HTTPCookieProcessor(cookies))
# Send browser-like request headers.
headers = [("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
           ("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7"),
           ("Accept-Encoding", "gzip, deflate"),
           ("Accept-Language", "en-us,en;q=0.5"),
           ("Connection", "keep-alive"),
           ("Host", "www.aladin.wrlc.org"),
           ("Referer", "https://www.aladin.wrlc.org/Z-WEB/Aladin?req=db&key=PROXYAUTH&lib=8&url=http://eebo.chadwyck.com/search"),
           ("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:5.0) Gecko/20100101 Firefox/5.0")]
opener.addheaders = headers
mechanize.install_opener(opener)
# POST the visible and hidden form fields directly to the form's action URL.
params = urllib.urlencode({'LN': 'myLN', 'BC': 'myBC', 'INST': 'myINST',
                           'req': 'db', 'key': 'PROXYAUTH', 'lib': '8',
                           'url': 'http://eebo.chadwyck.com/search'})
mechanize.urlopen("https://www.aladin.wrlc.org/Z-WEB/PATLogon", params)
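
To make the field list easier to check against the form in the question, here is a minimal sketch that builds the same POST body from the four hidden inputs plus the three visible fields. The names `HIDDEN_FIELDS` and `build_login_body` are made up for illustration; the field names and values come from the form itself. It does no network I/O, and the import fallback is only there so it runs on Python 2 or Python 3.

```python
try:
    from urllib import urlencode                  # Python 2
    from urlparse import parse_qs
except ImportError:
    from urllib.parse import urlencode, parse_qs  # Python 3

# Hidden inputs copied from the login form shown in the question.
HIDDEN_FIELDS = {
    "req": "db",
    "key": "PROXYAUTH",
    "url": "http://eebo.chadwyck.com/search",
    "lib": "8",
}


def build_login_body(last_name, barcode, institution):
    """Return the URL-encoded POST body for the PATLogon form.

    Hypothetical helper: it merges the hidden fields with the three
    user-supplied fields (LN, BC, INST) and URL-encodes the result.
    """
    fields = dict(HIDDEN_FIELDS)
    fields.update({"LN": last_name, "BC": barcode, "INST": institution})
    return urlencode(fields)


# Placeholder credentials; "AU" is one of the option values in the form.
body = build_login_body("myLN", "myBC", "AU")
print(body)
```

The resulting string can be passed as the data argument to mechanize.urlopen, in place of params above.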

Hope this helps someone someday :)
