Parsing web pages with BeautifulSoup4, chardet and Python 3.3
On Windows, I get the following error when I call BeautifulSoup(page):
Traceback (most recent call last):
  File "error.py", line 10, in <module>
    soup = BeautifulSoup(page)
  File "C:\Python33\lib\site-packages\bs4\__init__.py", line 169, in __init__
    self.builder.prepare_markup(markup, from_encoding))
  File "C:\Python33\lib\site-packages\bs4\builder\_htmlparser.py", line 136, in prepare_markup
    dammit = UnicodeDammit(markup, try_encodings, is_html=True)
  File "C:\Python33\lib\site-packages\bs4\dammit.py", line 223, in __init__
    u = self._convert_from(chardet_dammit(self.markup))
  File "C:\Python33\lib\site-packages\bs4\dammit.py", line 30, in chardet_dammit
    return chardet.detect(s)['encoding']
  File "C:\Python33\lib\site-packages\chardet\__init__.py", line 21, in detect
    import universaldetector
ImportError: No module named 'universaldetector'
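One plausible reading of the last two frames, offered as a hypothesis rather than a diagnosis: the chardet/__init__.py being executed does a bare `import universaldetector` inside detect(). Python 2 resolved that name against the enclosing chardet package (an implicit relative import), but Python 3 removed implicit relative imports and looks for a top-level module called `universaldetector`, which does not exist. The sketch below reproduces that behaviour with a throwaway package; the package name `fakechardet` and the function `demo_implicit_relative_import` are hypothetical and exist only for this demonstration.

```python
import importlib
import os
import shutil
import sys
import tempfile


def demo_implicit_relative_import():
    """Build a throwaway package that mimics a Python 2 style chardet and
    return the ImportError message its detect() raises (None if no error)."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "fakechardet")  # hypothetical stand-in for chardet
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "universaldetector.py"), "w") as f:
        f.write("class UniversalDetector(object):\n    pass\n")
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        # Python 2 would find the sibling module; Python 3 treats this as an
        # absolute import and searches sys.path for a top-level module instead.
        f.write(
            "def detect(buf):\n"
            "    import universaldetector\n"
            "    return universaldetector.UniversalDetector()\n"
        )
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        pkg = importlib.import_module("fakechardet")
        pkg.detect(b"<html></html>")
    except ImportError as exc:
        return str(exc)
    finally:
        # Undo every side effect so the demo leaves no trace behind.
        sys.path.remove(root)
        sys.modules.pop("fakechardet", None)
        shutil.rmtree(root, ignore_errors=True)
    return None


print(demo_implicit_relative_import())
```

Under Python 3 this should print a "No module named" message naming 'universaldetector', mirroring the last line of the traceback, even though universaldetector.py sits right next to __init__.py.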
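I'm running Python 3.3 on Windows 7. I installed bs4 by downloading the .tar.gz and running setup.py. I then installed pip and used it to install chardet by running pip.exe install chardet. My chardet version is 2.2.1. bs4 works fine with other URLs.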
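Because the traceback shows a bare `import universaldetector` at line 21 of C:\Python33\lib\site-packages\chardet\__init__.py, which looks like Python 2 style code, it may be worth confirming which chardet files Python 3.3 actually loads and what version they report. That mismatch is an assumption, not something the traceback proves. The hypothetical helper `describe_module` below uses only `importlib.import_module` and the conventional `__file__` and `__version__` attributes, so it should also run on Python 3.3.

```python
import importlib


def describe_module(name):
    """Return (file, version) for the module that `import name` would load,
    or None if the module cannot be imported at all."""
    try:
        mod = importlib.import_module(name)
    except ImportError:
        return None
    # __version__ is a convention, not a guarantee, so fall back to None.
    return getattr(mod, "__file__", None), getattr(mod, "__version__", None)


# On the machine in question, this shows which chardet copy is picked up.
print(describe_module("chardet"))
```

If the reported version is not 2.2.1, or the path is not the one pip installed into, the files in site-packages are probably left over from an older, Python 2 only chardet.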
Here is the code:
import sys
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import chardet
url = "http://www.edgar-online.com/brand/yahoo/search/?cik=1400810"
page = urlopen(url).read()
#print(page)
soup = BeautifulSoup(page)
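A possible workaround, assuming the immediate goal is just to get this page parsed: decode the bytes yourself and pass BeautifulSoup a str. As far as I know, bs4 only runs its encoding detection (UnicodeDammit, which is what calls chardet in the traceback) when it receives bytes, so a str should bypass chardet entirely. This sidesteps the broken chardet install rather than fixing it. The helper `decode_html` is hypothetical: it tries the charset from the HTTP header, then a `<meta ... charset=...>` declaration near the top of the page, then UTF-8, and never consults chardet.

```python
import re


def decode_html(raw, header_charset=None, fallback="utf-8"):
    """Decode raw HTML bytes to str without using chardet.

    Tries, in order: the charset from the HTTP Content-Type header, a charset
    declared in a <meta> tag in the first 2 KB, then `fallback`. If all of
    those fail, undecodable bytes are replaced instead of raising.
    """
    candidates = []
    if header_charset:
        candidates.append(header_charset)
    match = re.search(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)', raw[:2048], re.I)
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.append(fallback)
    for enc in candidates:
        try:
            return raw.decode(enc)
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name or bytes that don't fit: try the next one.
            continue
    return raw.decode(fallback, errors="replace")


# Usage with the script above (not executed here, it needs network access):
#   response = urlopen(url)
#   text = decode_html(response.read(), response.headers.get_content_charset())
#   soup = BeautifulSoup(text)
```

The fallback order is a design choice: the HTTP header is usually the most authoritative source, a meta declaration is the next best hint, and UTF-8 with replacement characters keeps parsing going instead of failing.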
I look forward to your answers.