2014-01-27 35 views

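Parsing text from HTML in Python 3

I'm trying to get the text from a web page with Python 3.3 and then search that text for certain strings. When I find a matching string, I need to save the text that follows it. For example, take this page: http://gatherer.wizards.com/Pages/Card/Details.aspx?name=Dark%20Prophecy I need to save the text after each category in the card info (card text, rarity, etc.). Currently I'm using Beautiful Soup, but get_text causes a UnicodeEncodeError and doesn't return an iterable object. Here is the relevant code: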

    urlStr = urllib.request.urlopen('http://gatherer.wizards.com/Pages/Card/Details.aspx?name=' + cardName).read()

    htmlRaw = BeautifulSoup(urlStr)

    htmlText = htmlRaw.get_text

    for line in htmlText:
        line = line.strip()
        if "Converted Mana Cost:" in line:
            cmc = line.next()
            message += "*Converted Mana Cost: " + cmc +"* \n\n"
        elif "Types:" in line:
            type = line.next()
            message += "*Type: " + type +"* \n\n"
        elif "Card Text:" in line:
            rulesText = line.next()
            message += "*Rules Text: " + rulesText +"* \n\n"
        elif "Flavor Text:" in line:
            flavor = line.next()
            message += "*Flavor Text: " + flavor +"* \n\n"
        elif "Rarity:" in line:
            rarity == line.next()
            message += "*Rarity: " + rarity +"* \n\n"

Answers


Consider using lxml and XPath instead; then you can do something like this:

>>> from lxml import html 
>>> root = html.parse("http://gatherer.wizards.com/Pages/Card/Details.aspx?name=Dark%20Prophecy") 
>>> root.xpath('//div[contains(text(), "Flavor Text")]/following-sibling::div/div/i/text()') 
['When the bog ran short on small animals, Ekri turned to the surrounding farmlands.'] 
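
If you would rather keep the line-by-line approach from the question, two things in that loop would need to change: get_text is a method, so it has to be called as get_text(), and a str has no next() method, so the "following line" has to come from the iterator itself (note also that `rarity == line.next()` compares rather than assigns). Below is a minimal sketch of that idea using only the standard library. The helper name extract_fields and the sample text are made up for illustration, and it assumes each label and its value land on separate lines of the extracted text, which may not hold for the real page.

```python
def extract_fields(page_text, labels):
    """Map each label found in page_text to the next non-empty line."""
    # Keep only stripped, non-empty lines and wrap them in one iterator,
    # so that next() below consumes from the same sequence as the loop.
    lines = iter([ln.strip() for ln in page_text.splitlines() if ln.strip()])

    fields = {}
    for line in lines:
        for label in labels:
            if label in line:
                # Take the line after the label; the default value
                # avoids StopIteration if the label is the last line.
                fields[label] = next(lines, "")
                break
    return fields


# Made-up sample standing in for BeautifulSoup(...).get_text();
# only the flavor text is taken from the page quoted above.
sample_text = """
Converted Mana Cost:
3
Types:
Enchantment
Flavor Text:
When the bog ran short on small animals, Ekri turned to the surrounding farmlands.
Rarity:
Rare
"""

labels = ["Converted Mana Cost:", "Types:", "Card Text:", "Flavor Text:", "Rarity:"]
fields = extract_fields(sample_text, labels)

# Build the message in label order, skipping labels that were not found.
message = "".join(
    "*{} {}* \n\n".format(label, fields[label])
    for label in labels
    if label in fields
)
```

With the real page you would pass htmlRaw.get_text() (with parentheses) as page_text. As for the UnicodeEncodeError, it often comes from printing the text to a console whose encoding cannot represent some character rather than from get_text() itself, but that is a guess about the asker's setup.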

How do I install this on Windows? The instructions on the website seem to just say to download it, but that doesn't work. – CrazyBurrito