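Parsing XML and HTML pages with lxml and the requests package in Python
I have been trying to parse XML and HTML pages in Python using the lxml and requests packages. I use the following code for this purpose.
In Python: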
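import requests
from lxml import html  # html.fromstring lives in lxml.html, not lxml.etree

url = ""  # page to fetch (left blank in the question)
req = requests.get(url)
# Pass the raw bytes (req.content) rather than decoded text
tree = html.fromstring(req.content)
root = tree.xpath('')  # XPath expression (left blank in the question)
for item in root:
    print(item.text)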
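This code works fine, but for some pages the content is not displayed correctly and the encoding needs to be set to UTF-8. I don't know how to set the encoding in this code.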
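One possible way to force the encoding is to hand lxml the raw bytes together with a parser that has an explicit `encoding`. The sketch below is only an illustration: the helper name `parse_html_bytes` and the sample markup are made up, and it assumes the page really is UTF-8.

```python
from lxml import html


def parse_html_bytes(content: bytes, encoding: str = "utf-8"):
    """Parse raw HTML bytes, telling lxml which encoding to use."""
    # HTMLParser(encoding=...) overrides whatever lxml would otherwise guess.
    parser = html.HTMLParser(encoding=encoding)
    return html.fromstring(content, parser=parser)


# Hypothetical sample standing in for req.content
sample = "<html><body><p>سلام</p></body></html>".encode("utf-8")
tree = parse_html_bytes(sample)
texts = [p.text for p in tree.xpath("//p")]
print(texts)
```

With requests, the call would presumably look like `parse_html_bytes(requests.get(url).content)`, so the bytes reach lxml without being decoded first.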
I tried tree = html.fromstring(req.text), but lxml does not support Unicode strings that carry an encoding declaration, so it does not work. – Snaicher
Works for me. I tried it with ISO-8859-1 and UTF-8 pages. Which lxml version are you using? Can you provide a link to the page you are requesting? –
url = "http://asretebar.com/rss/feed/?c=1&m=6" req = requests.get(url) #req.encoding = "utf-8" #req.content.decode(req.encoding) tree = html.fromstring(req.text) root = tree.xpath('channel/item/title') for item in root: print(item.text) – Snaicher
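Since that URL is an RSS feed (XML rather than HTML), a hedged alternative is to parse the response bytes with `lxml.etree` instead of `lxml.html`, which also avoids passing a decoded string that contains an encoding declaration. This is a sketch under assumptions: `feed_titles` is a hypothetical helper, and the inline feed is invented sample data, not the real content of the linked page.

```python
from lxml import etree


def feed_titles(content: bytes) -> list:
    """Return the text of every channel/item/title in an RSS document."""
    # Bytes input lets the XML parser honour the declared encoding itself.
    root = etree.fromstring(content)  # root is the <rss> element
    return [title.text for title in root.xpath("channel/item/title")]


# Hypothetical feed standing in for requests.get(url).content
sample_feed = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<rss version="2.0"><channel><title>Demo</title>'
    '<item><title>خبر اول</title></item>'
    '<item><title>Second item</title></item>'
    '</channel></rss>'
).encode("utf-8")

titles = feed_titles(sample_feed)
for title in titles:
    print(title)
```

Against the live feed this would be called as `feed_titles(requests.get(url).content)`. Whether it fixes the display problem depends on what the server actually sends.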