2013-05-26 79 views
Parsing a web page with BeautifulSoup4, chardet and Python 3.3

I get the following error on Windows when I try to call BeautifulSoup(page):

Traceback (most recent call last):
  File "error.py", line 10, in <module>
    soup = BeautifulSoup(page)
  File "C:\Python33\lib\site-packages\bs4\__init__.py", line 169, in __init__
    self.builder.prepare_markup(markup, from_encoding))
  File "C:\Python33\lib\site-packages\bs4\builder\_htmlparser.py", line 136, in prepare_markup
    dammit = UnicodeDammit(markup, try_encodings, is_html=True)
  File "C:\Python33\lib\site-packages\bs4\dammit.py", line 223, in __init__
    u = self._convert_from(chardet_dammit(self.markup))
  File "C:\Python33\lib\site-packages\bs4\dammit.py", line 30, in chardet_dammit
    return chardet.detect(s)['encoding']
  File "C:\Python33\lib\site-packages\chardet\__init__.py", line 21, in detect
    import universaldetector
ImportError: No module named 'universaldetector'

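I am running Python 3.3 on Windows 7. I installed bs4 by downloading the .tar.gz and running setup.py. I then installed pip and used it to install chardet by running pip.exe install chardet. My chardet version is 2.2.1. bs4 works for other URLs.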

Here is the code:

import sys 
from urllib.request import urlopen 
from bs4 import BeautifulSoup 
import re 
import chardet 

url = "http://www.edgar-online.com/brand/yahoo/search/?cik=1400810" 
page = urlopen(url).read() 
#print(page) 
soup = BeautifulSoup(page) 

I look forward to your answers.

Answer

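I just ran into the same situation.
Don't import chardet; I also uninstalled chardet.
After that the script ran without the error.
The code below is part of BeautifulSoup's dammit.py.
You have probably installed a chardet that is not compatible with Python 3.3, which is why the error occurs.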

try:
    # First try the fast C implementation.
    # PyPI package: cchardet
    import cchardet
    def chardet_dammit(s):
        return cchardet.detect(s)['encoding']
except ImportError:
    try:
        # Fall back to the pure Python implementation
        # Debian package: python-chardet
        # PyPI package: chardet
        import chardet
        def chardet_dammit(s):
            return chardet.detect(s)['encoding']
        #import chardet.constants
        #chardet.constants._debug = 1
    except ImportError:
        # No chardet available.
        def chardet_dammit(s):
            return None
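
The traceback suggests why this fallback did not help in the question: `import chardet` itself succeeds, and the ImportError is only raised later, inside chardet.detect(), where the installed chardet runs `import universaldetector` (a Python 2 style implicit relative import that Python 3 does not support). The try/except above only guards the import statement, not the call. If uninstalling chardet is not an option, one possible workaround is to decode the page yourself and tolerate a broken detector. The following is only a minimal sketch under that assumption; `detect_encoding_safely`, `decode_markup` and their parameters are hypothetical names, not part of bs4 or chardet.

```python
import importlib


def detect_encoding_safely(data):
    """Return chardet's guess for `data`, or None if chardet is missing or broken.

    Hypothetical helper, not part of bs4 or chardet.
    """
    try:
        chardet = importlib.import_module("chardet")
        # A Python 2-only chardet can import fine and still raise ImportError
        # here, because it performs further imports inside detect().
        return chardet.detect(data)["encoding"]
    except ImportError:
        return None


def _try_decode(data, encoding):
    """Decode `data` with `encoding`; return None if the codec is unknown or wrong."""
    if not encoding:
        return None
    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        return None


def decode_markup(data, declared=None, fallback="utf-8"):
    """Decode raw page bytes: declared charset first, then a guess, then fallback."""
    text = _try_decode(data, declared)
    if text is None:
        text = _try_decode(data, detect_encoding_safely(data))
    if text is None:
        # Last resort: never fail, replace undecodable bytes instead.
        text = data.decode(fallback, errors="replace")
    return text
```

With the question's code this could be used as `soup = BeautifulSoup(decode_markup(page))`. As far as I can tell, bs4 skips encoding detection when it is handed a str rather than bytes, so chardet would not be called at all, but treat that as an assumption and check it against your bs4 version. The `declared` argument is meant for a charset taken from the HTTP response, for example `urlopen(url).headers.get_content_charset()`.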