How to get a Unicode result with BeautifulSoup

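I am trying to write a program that uses BeautifulSoup to submit any word to http://upodn.com/phon.php and then print the result. For example, when I submit the word "hello" on the site itself, the result shown is: həlo. But when I submit "hello" with my script, the result is: h&#x0259;lo

How can I print the result as it appears on the website, i.e. həlo?

Script: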

# -*- coding: utf-8 -*- 

import mechanize 
import cookielib 
from BeautifulSoup import BeautifulSoup 
import html2text 

br = mechanize.Browser() 
cj = cookielib.LWPCookieJar() 
br.set_cookiejar(cj) 
br.set_handle_equiv(True) 
br.set_handle_redirect(True) 
br.set_handle_referer(True) 
br.set_handle_robots(False) 
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1) 
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1'), ('Content-type', 'text/html; charset=utf-8')] 
br.open('http://upodn.com/phon.php') 
br.select_form(nr=0) 
br.form['intext'] = 'hello' 
br.submit() 
data = br.response().read() 
soup = BeautifulSoup(data) 
# print soup 
table = soup.find('table', {'rules': 'cols'}) 
result = [] 
for row in table.findAll("font"): 
    d = row.text 
    result.append(d) 
print result[1]

Output:

h&#x0259;lo 
[Finished in 2.7s]

Source

2016-03-21 Magic Coding

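First of all, you are using an outdated version of BeautifulSoup; the current version lives in the package and module 'bs4'.

You are using BeautifulSoup 3, a completely outdated version of BeautifulSoup. The current version, BeautifulSoup 4, is called beautifulsoup4 on PyPI and has the top-level package bs4. BeautifulSoup 4 decodes these HTML entities: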

Python 2.7.10 (default, Oct 14 2015, 16:09:02) 
[GCC 5.2.1 20151010] on linux2 
Type "help", "copyright", "credits" or "license" for more information. 
>>> from bs4 import BeautifulSoup 
>>> print(BeautifulSoup('<b>h&#x0259;lo</b>').find('b').text) 
həlo

There is no point in writing new code with BeautifulSoup 3, so you should switch now.
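
As a rough illustration of the switch, the extraction loop from the question could be ported to bs4 along these lines. This is only a sketch: it assumes Python 3 with beautifulsoup4 installed, SAMPLE_HTML is a hypothetical stand-in for the page returned by br.response().read() (the real markup on upodn.com may differ), and extract_font_texts is a helper name made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the response body; the real page may differ.
SAMPLE_HTML = """
<table rules="cols">
  <tr>
    <td><font>hello</font></td>
    <td><font>h&#x0259;lo</font></td>
  </tr>
</table>
"""


def extract_font_texts(html):
    """Return the text of every <font> tag inside the rules="cols" table."""
    # Naming the parser explicitly avoids the "no parser specified" warning.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"rules": "cols"})
    if table is None:
        return []
    # find_all replaces BeautifulSoup 3's findAll; get_text() returns
    # the tag's text with character references already decoded.
    return [font.get_text() for font in table.find_all("font")]


result = extract_font_texts(SAMPLE_HTML)
print(result[1])
```

With the sample markup above, result[1] is the decoded string həlo rather than h&#x0259;lo.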

Source

2016-03-21 10:45:05

Works perfectly, thank you :)
