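Get text from a web page as an iterable object in Python 3.3

I am trying to fetch the text of a web page in Python 3.3 and then search that text for specific strings. When I find a matching string, I need to save the text that follows it. For example, take this page: http://gatherer.wizards.com/Pages/Card/Details.aspx?name=Dark%20Prophecy I need to save the text that comes after each category in the card information (card text, rarity, and so on). I am currently using Beautiful Soup, but get_text raises a UnicodeEncodeError and does not return an iterable object. Here is the relevant code: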
    import urllib.request

    from bs4 import BeautifulSoup

    cardName = 'Dark%20Prophecy'  # URL-encoded name from the example page
    message = ''

    urlStr = urllib.request.urlopen(
        'http://gatherer.wizards.com/Pages/Card/Details.aspx?name=' + cardName
    ).read()
    htmlRaw = BeautifulSoup(urlStr, 'html.parser')
    htmlText = htmlRaw.get_text  # this is what does not give me an iterable
    for line in htmlText:
        line = line.strip()
        # In each branch, line.next() is meant to be the line after the label.
        if "Converted Mana Cost:" in line:
            cmc = line.next()
            message += "*Converted Mana Cost: " + cmc + "* \n\n"
        elif "Types:" in line:
            card_type = line.next()
            message += "*Type: " + card_type + "* \n\n"
        elif "Card Text:" in line:
            rulesText = line.next()
            message += "*Rules Text: " + rulesText + "* \n\n"
        elif "Flavor Text:" in line:
            flavor = line.next()
            message += "*Flavor Text: " + flavor + "* \n\n"
        elif "Rarity:" in line:
            rarity = line.next()
            message += "*Rarity: " + rarity + "* \n\n"
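
For reference, here is a minimal sketch of how the loop above could be made to work. It is not a definitive fix. It assumes that calling get_text() returns a string in which each label is followed by its value on the next non-empty line, which has not been checked against the live page. The names LABELS, extract_fields, build_message and fetch_card_text are hypothetical helpers introduced only for this illustration.

```python
import urllib.parse
import urllib.request

# Labels taken from the question's loop.
LABELS = ("Converted Mana Cost:", "Types:", "Card Text:", "Flavor Text:", "Rarity:")


def extract_fields(text, labels=LABELS):
    """Map each label (without its colon) to the next non-empty line after it."""
    # Drop blank lines so the line after a label is real content.
    lines = iter([ln.strip() for ln in text.splitlines() if ln.strip()])
    fields = {}
    for line in lines:
        for label in labels:
            if label in line:
                # Call next() on the iterator, not on the str, to advance one line.
                value = next(lines, None)
                if value is not None:
                    fields[label.rstrip(":")] = value
                break
    return fields


def build_message(fields, labels=LABELS):
    """Format the found fields in label order, in the style used in the question."""
    parts = []
    for label in labels:
        name = label.rstrip(":")
        if name in fields:
            parts.append("*{}: {}* \n\n".format(name, fields[name]))
    return "".join(parts)


def fetch_card_text(card_name):
    """Download the card page and return its text (needs network access and bs4)."""
    from bs4 import BeautifulSoup  # third-party package: beautifulsoup4

    url = ("http://gatherer.wizards.com/Pages/Card/Details.aspx?name="
           + urllib.parse.quote(card_name))
    with urllib.request.urlopen(url) as response:
        html = response.read()
    # get_text() must be called; the bare attribute is a method, not a string.
    return BeautifulSoup(html, "html.parser").get_text()
```

Usage would look like `message = build_message(extract_fields(fetch_card_text("Dark Prophecy")))`. If the UnicodeEncodeError comes from printing that text to a console that cannot represent some characters (only a guess without the traceback), printing `message.encode("ascii", "replace").decode("ascii")` is one way to sidestep it.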
Please include the full traceback you get from the error.
There are much better tools than this for HTML parsing and scraping.
@Guy So why not name some?