2014-04-28

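Beautiful Soup 4 does not print text from a web page

I am using Python 3.4 with Beautiful Soup 4 and requests. I want to fetch a web page and print its text using Beautiful Soup. It can fetch the page and print the title, and it even reports the encoding as UTF-8, but when I try to print the text from the page I get an encoding error.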

from bs4 import BeautifulSoup
import requests

sparknotesSearch = requests.get("http://www.sparknotes.com/search?q=Sonnet")
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(sparknotesSearch.text, "html.parser")

print(soup.title)
# Can't print this?
print(soup.get_text())

The error/output I get is:

<title>SparkNotes Search Results: sONNET</title> 
Traceback (most recent call last): 
    File "C:\Users\Cayle J. Elsey\Dropbox\Programming\Salient_Point\networking.py", line 10, in <module> 
    print(soup.get_text()) 
    File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode 
    return codecs.charmap_encode(input,self.errors,encoding_table)[0] 
UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' in position 6238: character maps to <undefined> 
[Finished in 0.5s] 
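
The failure in the traceback can be reproduced without Beautiful Soup or a network request. This is a minimal sketch, assuming that standard output uses cp1252 (as the `cp1252.py` frame in the traceback suggests); the sample string is hypothetical and only needs to contain U+2192.

```python
# Reproduce the encoding step that print() performs when stdout uses cp1252.
text = "Sonnet \u2192 Summary"  # hypothetical sample containing U+2192 (rightwards arrow)

try:
    text.encode("cp1252")  # same codec that appears in the traceback
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Report the exception type and the offending character, if any.
if error is not None:
    print(type(error).__name__, repr(error.object[error.start]))
```

The page text itself is fine; the error comes from converting it to an encoding that has no slot for that character.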

Solved. My mistake, not sure how I missed that. Thanks! – Cayle

Answer


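Just encode the string to UTF-8 and your problem will be solved. The traceback shows print() encoding the text with cp1252, the encoding Python selected for standard output, which has no mapping for '\u2192'. A string that is already encoded to UTF-8 bytes skips that step.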

html = soup.prettify()
# encode() returns bytes, so print() no longer has to encode the text as cp1252
html = html.encode('UTF-8')
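
Note that in Python 3, printing a bytes object shows its `b'...'` representation rather than readable text. If readable console output is the goal, one alternative is to replace only the characters the output encoding cannot represent. This is a minimal sketch, not part of the original answer; the helper name `console_safe` and the sample string are assumptions made for illustration.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'.

    console_safe is a hypothetical helper; encoding defaults to the one
    Python chose for standard output, falling back to UTF-8.
    """
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # Round-trip through the target encoding, substituting '?' for unencodable characters.
    return text.encode(encoding, errors="replace").decode(encoding)


sample = "Sonnet 18 \u2192 Summary"  # hypothetical text containing U+2192
print(console_safe(sample, "cp1252"))
```

With the code from the question, this would be used as `print(console_safe(soup.get_text()))`. Another option that needs no code change is to set the `PYTHONIOENCODING` environment variable to `utf-8` before running the script, provided the terminal or editor displaying the output can render UTF-8.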