2012-07-25 42 views
5

Python mechanize select_form() - ParseError: OPTION outside of SELECT

I am using Python 2.7 and Mechanize 2.5. I am trying to use the select_form() method, but I get the following error:

  File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 499, in select_form
    global_form = self._factory.global_form
  File "C:\Python27\lib\site-packages\mechanize\_html.py", line 544, in __getattr__
    self.forms()
  File "C:\Python27\lib\site-packages\mechanize\_html.py", line 557, in forms
    self._forms_factory.forms())
  File "C:\Python27\lib\site-packages\mechanize\_html.py", line 237, in forms
    _urlunparse=_rfc3986.urlunsplit,
  File "C:\Python27\lib\site-packages\mechanize\_form.py", line 845, in ParseResponseEx
    _urlunparse=_urlunparse,
  File "C:\Python27\lib\site-packages\mechanize\_form.py", line 982, in _ParseFileEx
    fp.feed(data)
  File "C:\Python27\lib\site-packages\mechanize\_form.py", line 759, in feed
    _sgmllib_copy.SGMLParser.feed(self, data)
  File "C:\Python27\lib\site-packages\mechanize\_sgmllib_copy.py", line 110, in feed
    self.goahead(0)
  File "C:\Python27\lib\site-packages\mechanize\_sgmllib_copy.py", line 144, in goahead
    k = self.parse_starttag(i)
  File "C:\Python27\lib\site-packages\mechanize\_sgmllib_copy.py", line 302, in parse_starttag
    self.finish_starttag(tag, attrs)
  File "C:\Python27\lib\site-packages\mechanize\_sgmllib_copy.py", line 347, in finish_starttag
    self.handle_starttag(tag, method, attrs)
  File "C:\Python27\lib\site-packages\mechanize\_sgmllib_copy.py", line 387, in handle_starttag
    method(attrs)
  File "C:\Python27\lib\site-packages\mechanize\_form.py", line 736, in do_option
    _AbstractFormParser._start_option(self, attrs)
  File "C:\Python27\lib\site-packages\mechanize\_form.py", line 481, in _start_option
    raise ParseError("OPTION outside of SELECT")
ParseError: OPTION outside of SELECT

Here is my code:

import cookielib
import mechanize

cj = cookielib.LWPCookieJar()
br = mechanize.Browser() 
br.set_cookiejar(cj) 
br.set_handle_equiv(True) 
br.set_handle_gzip(True) 
br.set_handle_redirect(True) 
br.set_handle_referer(True) 
br.set_handle_robots(False) 
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1) 
br.addheaders = [('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')] 
br.open("website_url_which_i_will_not_share") 
br.select_form(nr=0) 

Below is the form portion of the HTML of the page I am opening:

<html lang="en-us" xml:lang="en-us" xmlns="http://www.w3.org/1999/xhtml"> 
    <head> I omitted this section </head> 
    <body class="login"> 
     <div id="container"> 
      <div id="header" style="background-color: #13397A;"> 
      <div id="content" class="colM"> 
       <div id="content-main"> 
        <form id="login-form" method="post" action="/admin/"> 
         <div style="display:none"> 
          <input type="hidden" value="8a689f2e3d215a3465f1bb66e037d1a5" name="csrfmiddlewaretoken"> 
         </div> 
         <div class="form-row"> 
          <label class="required" for="id_username">Username:</label> 
          <input id="id_username" type="text" maxlength="30" name="username"> 
         </div> 
         <div class="form-row"> 
          <label class="required" for="id_password">Password:</label> 
          <input id="id_password" type="password" name="password"> 
          <input type="hidden" value="1" name="this_is_the_login_form"> 
          <input type="hidden" value="/admin/" name="next"> 
         </div> 
         <div class="submit-row"> 
          <label>&nbsp;</label> 
          <input type="submit" value="Log in"> 
         </div> 
        </form> 
        <script type="text/javascript"> 
       </div> 
       <br class="clear"> 
      </div> 
      <div id="footer"></div> 
     </div> 
     <script type="text/javascript"> 
    </body> 
</html> 

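I searched Stack Overflow and Google, but I could not find a similar question or even a description of this error.

I would appreciate it if someone could tell me what this error means and help me find what is going wrong here.

Thanks

EDIT: I have done many form submissions with this code, and it works fine on every website except this one. The site is a database API that I am trying to scrape data from.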

To help, people need to be able to reproduce your problem, and that is impossible without the full page source. – 2012-07-25 22:40:40

Thanks for letting me know. I have edited my question to include all of the HTML (excluding the head tag). – camelCase 2012-07-26 16:35:51

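The source you get with a web browser and the source that mechanize receives may differ. For debugging purposes, try 'print br.forms' before calling select_form. Maybe the default parser cannot handle invalid HTML ('' should be ''?). – 2012-07-27 01:38:11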
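The traceback shows that the exception is raised from do_option, that is, when the form parser meets an <option> start tag while no <select> is open. One way to check whether the page mechanize actually receives contains such a tag is to scan the raw HTML for it. Below is a minimal sketch of that check. It uses only the standard library, not mechanize's own parser, and it is written for Python 3 (on Python 2.7 the module is named HTMLParser). The names StrayOptionFinder and find_stray_options are made up for illustration.

```python
from html.parser import HTMLParser


class StrayOptionFinder(HTMLParser):
    """Record the position of every <option> start tag seen outside a <select>."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.select_depth = 0  # number of <select> elements currently open
        self.stray = []        # (line, column) of each stray <option>

    def handle_starttag(self, tag, attrs):
        if tag == "select":
            self.select_depth += 1
        elif tag == "option" and self.select_depth == 0:
            # getpos() reports where the current tag starts:
            # 1-based line number, 0-based column.
            self.stray.append(self.getpos())

    def handle_endtag(self, tag):
        if tag == "select" and self.select_depth > 0:
            self.select_depth -= 1


def find_stray_options(html):
    """Return a list of (line, column) positions of <option> tags outside <select>."""
    finder = StrayOptionFinder()
    finder.feed(html)
    finder.close()
    return finder.stray


# Hypothetical sample page: the second <option> has no enclosing <select>.
sample = (
    "<form>\n"
    "<select name='a'><option>ok</option></select>\n"
    "<option>stray</option>\n"
    "</form>"
)
print(find_stray_options(sample))
```

Feeding it the body that mechanize downloaded (for example, the string returned by the response's read() after br.open()) rather than the source copied from a browser should show whether the two really differ.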

Answers

2

I have the same problem (unfortunately I have not solved it yet), and I found this interesting piece of code that might help:

http://comments.gmane.org/gmane.comp.python.wwwsearch.general/1991

import mechanize 
from BeautifulSoup import BeautifulSoup 

class SanitizeHandler(mechanize.BaseHandler):
    def http_response(self, request, response):
        # Make sure the response body can be re-read and replaced
        if not hasattr(response, "seek"):
            response = mechanize.response_seek_wrapper(response)

        # If the response is HTML, pass it through a robust parser like BeautifulSoup
        headers = response.info().dict
        if 'html' in headers.get('content-type', ''):
            soup = BeautifulSoup(response.get_data())
            response.set_data(soup.prettify())
        return response

br = mechanize.Browser() 
br.add_handler(SanitizeHandler()) 

# Now you get good HTML 

This should override the http_response method and "clean" your HTML before mechanize parses it.
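As for why cleaning can help: according to the traceback, the form parser raises as soon as it sees an <option> start tag with no open <select>, so any step that repairs or removes those tags before parsing should let select_form() get past that point. If BeautifulSoup is not available, the cleanup step could in principle be done with the standard library. The following is only a sketch under that assumption: it re-emits the page and drops <option> start and end tags that sit outside a <select>, keeping their text. It is written for Python 3, the names StrayOptionStripper and strip_stray_options are hypothetical, and it has not been tried against the site in the question.

```python
from html.parser import HTMLParser


class StrayOptionStripper(HTMLParser):
    """Re-emit HTML, dropping <option> tags that are not inside a <select>."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &nbsp; as written.
        HTMLParser.__init__(self, convert_charrefs=False)
        self.out = []
        self.select_depth = 0  # number of <select> elements currently open

    def _is_stray_option(self, tag):
        return tag == "option" and self.select_depth == 0

    def handle_starttag(self, tag, attrs):
        if self._is_stray_option(tag):
            return  # drop the tag, keep whatever text follows it
        if tag == "select":
            self.select_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, as written.
        if not self._is_stray_option(tag):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._is_stray_option(tag):
            return
        if tag == "select" and self.select_depth > 0:
            self.select_depth -= 1
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def strip_stray_options(html):
    """Return html with <option> tags outside <select> removed."""
    stripper = StrayOptionStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.out)


# Hypothetical input: one valid <select> and one stray <option>.
dirty = ("<form><select name='a'><option value='1'>one</option></select>"
         "<option>stray</option><p>A&nbsp;B</p></form>")
print(strip_stray_options(dirty))
```

In a handler like the one above, the result of such a function would take the place of soup.prettify() in the call to response.set_data(). BeautifulSoup remains the more general choice, since it repairs many other kinds of broken markup as well.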

The linked source is dead. Can you give more information on why this is a solution? – 2016-08-24 18:15:21