2009-10-05 33 views
1

從機械化導入瀏覽器 BR =瀏覽器() 頁= br.open( 'http://wow.interzet.ru/news.php?readmore=23') br.form = br.forms()的next() 打印br.form 給了我以下錯誤:如何解決這種ClientForm錯誤?

Traceback (most recent call last): 
    File "C:\Users\roddik\Desktop\mech.py", line 6, in <module> 
    br.form = br.forms().next() 
    File "build\bdist.win32\egg\mechanize\_mechanize.py", line 426, in forms 
    File "D:\py26\lib\site-package\mechanize-0.1.11-py2.6.egg\mechanize\_html.py", line 559, in forms 
    File "D:\py26\lib\site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_html.py", line 225, in forms 
    File "D:\py26\lib\site-packages\clientform-0.2.10-py2.6.egg\ClientForm.py", line 967, in ParseResponseEx 
    File "D:\py26\lib\site-packages\clientform-0.2.10-py2.6.egg\ClientForm.py", line 1100, in _ParseFileEx 
    File "D:\py26\lib\site-packages\clientform-0.2.10-py2.6.egg\ClientForm.py", line 870, in feed 
    File "D:\py26\lib\sgmllib.py", line 104, in feed 
    self.goahead(0) 
    File "D:\py26\lib\sgmllib.py", line 138, in goahead 
    k = self.parse_starttag(i) 
    File "D:\py26\lib\sgmllib.py", line 290, in parse_starttag 
    self._convert_ref, attrvalue) 
    File "D:\py26\lib\sgmllib.py", line 302, in _convert_ref 
    return self.convert_charref(match.group(2)) or \ 
    File "D:\py26\lib\site-packages\clientform-0.2.10-py2.6.egg\ClientForm.py", line 850, in convert_charref 
    File "D:\py26\lib\site-packages\clientform-0.2.10-py2.6.egg\ClientForm.py", line 244, in unescape_charref 

ValueError: invalid literal for int() with base 10: 'e' 

我該如何解決?

編輯:

我已經用這種方式修復了它。可以嗎?如果不是,反而如何?

import ClientForm 
from mechanize import Browser 

def myunescape_charref(data, encoding): 
    if not str(data).isdigit(): return 0 
    name, base = data, 10 
    if name.startswith("x"): 
     name, base= name[1:], 16 
    uc = unichr(int(name, base)) 
    if encoding is None: 
     return uc 
    else: 
     try: 
      repl = uc.encode(encoding) 
     except UnicodeError: 
      repl = "&#%s;" % data 
     return repl 

ClientForm.unescape_charref = myunescape_charref 

回答

1

問題是由網址,這樣

http://wow.zet/forum/index.php?showtopic=1197&pid=30419&st=0&#entry30419 

ClientForm引起&#後正在尋找一個整數

它是確定在url中#,但應在HTML中逃脫 爲&#意味着字符編碼

+0

好,我想我不需要URI的片段組成部分都沒有。現在如何在發送到clientform之前更改機械化以在html中替換它們? – Fluffy 2009-10-06 10:29:22