How to get Unicode strings when extracting data in Python?

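I want to extract text from a Vietnamese website whose charset is UTF-8. However, the text I get always seems to be in ASCII, and I can't find a way to convert it to Unicode or to get the text as it appears on the site. As a result, I can't save it to a file as expected.
I know Unicode is a very common problem in Python, but I still hope someone can help me figure it out. Thanks.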
My code:

import requests, re, io 
import simplejson as json 
from lxml import html, etree 

base = "http://www.amthuc365.vn/cong-thuc/" 
page = requests.get(base + "trang-" + str(1) + ".html") 
pageTree = html.fromstring(page.text) 

links = pageTree.xpath('//ul[contains(@class, "mt30")]/li/a/@href') 
names = pageTree.xpath('//h3[@class="title"]/a/text()') 
for name in names[:1]: 
    print name 
    # prints: LÃ m bÃ¡nh oreo nhÃ¢n bÆ¡ Äáºu phá»ng thÆ¡m bÃ¹i

but what I need is "Làm bánh oreo nhân bơ đậu phộng thơm bùi".
Thanks.
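The garbled output in the question has the typical shape of UTF-8 bytes that were decoded as ISO-8859-1 (Latin-1): each accented Vietnamese letter turns into two or three Latin-1 characters such as "Ã¡". Below is a minimal Python 3 sketch that reproduces the effect and reverses it. It assumes that Latin-1 was the wrong decoding; the helper names `make_mojibake` and `fix_mojibake` are illustrative only.

```python
# Python 3 sketch: reproduce the mojibake from the question and undo it.
# Assumption: the server sent UTF-8 bytes that were decoded as ISO-8859-1 (Latin-1).

EXPECTED = "Làm bánh oreo nhân bơ đậu phộng thơm bùi"


def make_mojibake(text: str) -> str:
    """Encode as UTF-8, then wrongly decode as Latin-1 (what the scraper saw)."""
    return text.encode("utf-8").decode("iso-8859-1")


def fix_mojibake(garbled: str) -> str:
    """Reverse the mistake: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


garbled = make_mojibake(EXPECTED)
print(repr(garbled))          # shows the Latin-1 look-alike characters
print(fix_mojibake(garbled))  # the original Vietnamese title
```

The reverse step only works on the string exactly as the program received it. A copy pasted from a terminal usually loses some of the invisible characters (for example `\x91` and `\xad`), so the cleaner fix is to avoid the wrong decode in the first place.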


2015-09-20 Huy Do

Simply switching from page.text to page.content should make it work.

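Explanation here. In short, requests decodes page.text with the encoding it takes from the HTTP Content-Type header, and for text/* responses that declare no charset it falls back to ISO-8859-1, which turns UTF-8 bytes into strings like the one above. page.content is the raw bytes, so lxml can decode them itself using the charset the page declares.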
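The sketch below illustrates that explanation. It is a simplified model written for this example, not requests' actual code: `guess_text_encoding` is a hypothetical helper that imitates how requests picks the encoding used for `Response.text` from the Content-Type header. The `raw` bytes stand in for `page.content`.

```python
# Python 3 sketch: a *simplified model* (an assumption, not requests' actual code)
# of how requests chooses the encoding used for Response.text.


def guess_text_encoding(content_type: str):
    """Return the charset from a Content-Type header value, or the
    ISO-8859-1 fallback that requests applies to text/* types without one."""
    mime, _, params = content_type.partition(";")
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key.strip().lower() == "charset" and value.strip():
            return value.strip().strip("'\"")
    if mime.strip().lower().startswith("text/"):
        return "ISO-8859-1"
    return None


raw = "Làm bánh oreo".encode("utf-8")  # stands in for page.content (bytes)

# No charset in the header: .text-style decoding garbles the Vietnamese text.
as_text = raw.decode(guess_text_encoding("text/html"))
# Decoding the raw bytes with the charset the page really uses gives the right text.
as_utf8 = raw.decode("utf-8")

print(as_text)
print(as_utf8)
```

In the real scraper this corresponds to either `html.fromstring(page.content)` (the fix suggested above) or setting `page.encoding = "utf-8"` before reading `page.text`. Passing the bytes lets the parser honour the page's own charset declaration. Setting `page.encoding` is more explicit, but it hard-codes the charset.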



2015-09-20 02:15:01 alecxe

Thank you very much, @alecxe!


