Parsing HTML in Python 3.5 returns a strange type

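I'm running Python 3.5 and trying to extract the BINGO data from this web page, and I've run into some problems. When I split the HTML response, I keep getting a list of strings that each have the letter b in front of them, which makes them impossible to check. I looked at the type of the HTML output and it is class bytes, which I'm not familiar with. First, why is this b in front of all my strings, and second, how can I parse an HTML page more cleanly?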

import urllib.request 
with urllib.request.urlopen('http://www.executiveadministrator.com/cgi-local/inoutPROhosted4/inoutPRO.pl?refresh=1&ID=AFTCO') as response: 
    html = response.read() 

htmllist = html.split() 

print(htmllist) 
for i in htmllist: 
    #if i == 'BINGO': 
    print(i)

Sample output: b'class="colorlinkbody">Renew' b'Board' b'Contract' b'Copyright' b'1996-2013' b''


2017-02-24 M4dW0r1d

Because response.read() returns 'bytes', no longer 'str'. Use 'decode()'.

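As mentioned in the comment, response.read() returns bytes rather than str. If you need a string value from a bytes object, you have to call the bytes object's decode(encoding) method. Change your print loop to: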

for i in htmllist: 
    print(i.decode('utf-8'))
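It may be simpler to decode the whole response once before splitting, rather than decoding each piece inside the loop. The sketch below uses a hypothetical bytes literal in place of the real response.read() result, and assumes the page is UTF-8 encoded (the actual charset can often be read from response.headers.get_content_charset()). It also shows why a comparison such as i == 'BINGO' never matches while the items are still bytes.

```python
# Hypothetical stand-in for the bytes returned by response.read()
raw = b'<a class="colorlinkbody">Renew</a> Board Contract BINGO'

# In Python 3, bytes and str never compare equal, so this check always fails
print(raw.split()[-1] == 'BINGO')

# Decode once; 'utf-8' is an assumption about the page's encoding,
# and errors='replace' avoids an exception on stray invalid bytes
text = raw.decode('utf-8', errors='replace')

# Splitting the decoded text yields ordinary str objects with no b prefix
words = text.split()
print(words)
print('BINGO' in words)
```

After decoding, every item in the list is a plain str, so string comparisons and printing behave as expected.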


2017-02-24 15:04:33 metame

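Thanks. This seems like a clunky way to get a list of strings out of HTML. Is there a better way, meaning something other than urllib.request? In case it matters, I'm on Windows. – M4dW0r1d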

It depends on what you want to do with them, but you should take a closer look at HTML parsing libraries such as 'lxml' or 'BeautifulSoup' (a.k.a. 'bs4'). – metame
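To illustrate what a parser gives you over splitting on whitespace, here is a minimal sketch that uses the standard library's html.parser as a stand-in for the third-party libraries named above (it is not lxml or BeautifulSoup, just the same idea with no install step). The TextCollector class and the sample markup are hypothetical; the real page's structure is not shown in this thread.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the non-empty text chunks of a document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for the text between tags; skip whitespace-only runs
        stripped = data.strip()
        if stripped:
            self.chunks.append(stripped)


# Hypothetical markup standing in for the decoded page
sample = '<table><tr><td class="colorlinkbody">Renew</td><td>BINGO</td></tr></table>'

collector = TextCollector()
collector.feed(sample)
collector.close()

print(collector.chunks)
print('BINGO' in collector.chunks)
```

Because the parser separates markup from text, attribute fragments such as class="colorlinkbody"> no longer end up mixed into the list of words.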
