2016-04-25 90 views
-3

I get the following error message while web scraping with Python and BeautifulSoup:

Traceback (most recent call last):
  File "ex1.py", line 9, in <module>
    print(soup.prettify())
  File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 35013: character maps to <undefined>

My source code is as follows:

import requests 
from bs4 import BeautifulSoup 

url = 'http://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA' 
response = requests.get(url) 
html = response.content 

soup = BeautifulSoup(html, "html.parser") 
print(soup.prettify()) 

Answers

0

Changing the code like this worked for me:

html = response.text 
soup = BeautifulSoup(html, "html.parser") 
print(soup.prettify()) 
+1

No, this still doesn't work for me. I want to get all of the HTML content, not just the text.

0

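Are you running this on Windows? The problem comes from the encoding of your HTML content: the page contains characters such as '\u2013' that the console's cp437 code page (the codec named in your traceback) cannot represent.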
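To illustrate the cause without fetching the page, here is a minimal sketch that reproduces the failure on any platform. The sample string is a made-up stand-in for the scraped HTML, and the explicit encode to cp437 imitates what print() has to do on a console using that code page.

```python
# '\u2013' is an en dash, the character named in the traceback.
# The sample text is hypothetical; it stands in for soup.prettify().
text = "Coffee \u2013 Los Angeles"

try:
    # Roughly what print() must do when the console encoding is cp437.
    text.encode("cp437")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print("cannot encode:", exc.reason)

# UTF-8 can represent every Unicode character, so this always succeeds.
utf8_bytes = text.encode("utf-8")
```

The error therefore has nothing to do with requests or the parser; it only appears at the moment the text is written to a console whose encoding lacks the character.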

I think the following might work:

import requests 
from bs4 import BeautifulSoup 

url = 'http://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA' 
response = requests.get(url) 
html = response.content 

soup = BeautifulSoup(html, "html.parser") 
print(soup.prettify().encode('UTF-8')) 

Passing the encoding argument to prettify() should also work, like this:

soup.prettify(encoding='utf-8') 
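Note that on Python 3 both variants above hand print() a bytes object, so the output appears as a b'...' literal rather than as formatted HTML. One possible alternative, sketched here under assumptions, is to keep the text as a str and replace only the characters the console cannot show. The helper name console_safe and the sample string are hypothetical; in the real script the string would be the result of soup.prettify().

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent escaped.

    Hypothetical helper: defaults to the console encoding reported by
    sys.stdout, falling back to UTF-8 if none is reported.
    """
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # backslashreplace turns an unencodable character into its escape
    # sequence, e.g. an en dash becomes the six characters \u2013.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


sample = "Coffee \u2013 Los Angeles"  # stand-in for soup.prettify()
print(console_safe(sample, "cp437"))
```

If the full, unmodified HTML is what matters, writing it to a file opened with encoding='utf-8' avoids the console's limits altogether.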