Python HTMLParser

I'm parsing an HTML file with HTMLParser, and I want to print the contents between the start and the end of a p tag.

Here is my code snippet:

from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            print "TODO: print the contents"

Any help would be much appreciated.

Ruth


2011-08-26 Ruth

I extended the example from the docs:

from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print "Encountered the beginning of a %s tag" % tag

    def handle_endtag(self, tag):
        print "Encountered the end of a %s tag" % tag

    def handle_data(self, data):
        print "Encountered data %s" % data

p = MyHTMLParser()
p.feed('<p>test</p>')

Encountered the beginning of a p tag 
Encountered data test 
Encountered the end of a p tag
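The example above is Python 2. In Python 3 the module is named html.parser and print is a function, so a minimal port could look like the sketch below. The events list is an addition for this sketch and is not part of the docs example. It only keeps the messages so they can be inspected after parsing.

```python
from html.parser import HTMLParser


class MyHTMLParser(HTMLParser):
    """Python 3 port of the docs-style example: report each parser event."""

    def __init__(self):
        super().__init__()
        self.events = []  # messages recorded for later inspection (sketch addition)

    def _report(self, msg):
        self.events.append(msg)
        print(msg)

    def handle_starttag(self, tag, attrs):
        self._report("Encountered the beginning of a %s tag" % tag)

    def handle_endtag(self, tag):
        self._report("Encountered the end of a %s tag" % tag)

    def handle_data(self, data):
        self._report("Encountered data %s" % data)


p = MyHTMLParser()
p.feed('<p>test</p>')
p.close()
```

This prints the same three lines as the Python 2 version.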


2011-08-26 11:45:00 tauran

Nice use of http://docs.python.org/library/htmlparser.html#example-html-parser-application –

Building on what @tauran posted, you might want to do something like this:

from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    def print_p_contents(self, html):
        self.tag_stack = []
        self.feed(html)

    def handle_starttag(self, tag, attrs):
        self.tag_stack.append(tag.lower())

    def handle_endtag(self, tag):
        # guard: a stray end tag would otherwise pop an empty list
        if self.tag_stack:
            self.tag_stack.pop()

    def handle_data(self, data):
        # guard: text outside every element arrives while the stack is empty
        if self.tag_stack and self.tag_stack[-1] == 'p':
            print data

p = MyHTMLParser()
p.print_p_contents('<p>test</p>')

Next, you might want to push the contents of every <p> onto a list and return that as the result, or something along those lines.

TIL: when working with a library like this, you need to think in terms of a stack!
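One way to follow that suggestion is sketched below in Python 3. It collects the <p> contents into a list instead of printing them. ParagraphCollector and get_p_contents are hypothetical names, and the sketch makes two assumptions: text inside nested tags such as <b> counts as part of the paragraph, and an end tag pops the stack back to its matching open tag rather than doing a blind pop(). The second choice keeps unclosed or void elements such as <br> from leaving the stack out of sync.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> element into a list (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tag_stack = []   # currently open tags, innermost last
        self.paragraphs = []  # one string per closed <p> element
        self._current = []    # text fragments of the <p> being read

    def handle_starttag(self, tag, attrs):
        # html.parser already passes tag names in lower case
        if tag == 'p':
            self._current = []
        self.tag_stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.tag_stack:
            return  # stray end tag: nothing to close
        # pop up to and including the matching open tag, which also
        # discards void elements like <br> that never get an end tag
        while self.tag_stack.pop() != tag:
            pass
        if tag == 'p':
            self.paragraphs.append(''.join(self._current))

    def handle_data(self, data):
        # keep text found anywhere inside an open <p>, even in nested tags
        if 'p' in self.tag_stack:
            self._current.append(data)


def get_p_contents(html):
    """Return the text of each <p> in html, in document order."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

For example, get_p_contents('<div><p>one</p><p>two <b>bold</b></p></div>') returns ['one', 'two bold'].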


2011-08-26 11:51:19

You forgot to call 'feed', and lists have 'append', not 'push'. – tauran

Thanks, tauran, updated! –

On a large HTML file I get 'if self.tag_stack[-1] == 'p': IndexError: list index out of range' –
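The traceback points at self.tag_stack[-1], which fails whenever handle_data runs while the stack is empty. That happens for text outside every element, such as a newline before <html> or after </html>, which real files often contain. Void elements such as <br> or <img> never get an end tag either, so an unguarded pop() removes the wrong entry and the stack drifts. Below is a minimal guarded variant in Python 3. The class name SafePParser, the VOID_ELEMENTS set, and the found list are assumptions of this sketch.

```python
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they are not pushed at all.
VOID_ELEMENTS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
                 'link', 'meta', 'source', 'track', 'wbr'}


class SafePParser(HTMLParser):
    """Stack-based <p> printer with guards against an empty stack."""

    def __init__(self):
        super().__init__()
        self.tag_stack = []
        self.found = []  # text seen directly inside <p>, kept for inspection

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.tag_stack.append(tag)

    def handle_endtag(self, tag):
        # ignore end tags with no open counterpart, so pop() never runs on an
        # empty list; otherwise close everything up to the matching tag
        if tag in self.tag_stack:
            while self.tag_stack.pop() != tag:
                pass

    def handle_data(self, data):
        # text outside every element arrives with an empty stack
        if self.tag_stack and self.tag_stack[-1] == 'p':
            self.found.append(data)
            print(data)


parser = SafePParser()
# the trailing newline is the kind of input that broke the unguarded version
parser.feed('<html><body><p>one<br>two</p></body></html>\n')
parser.close()
```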

It didn't seem to work with my code, so I defined tag_stack = [] outside the class, as a kind of global variable.

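from html.parser import HTMLParser

tag_stack = []  # defined outside the class, as a kind of global variable

class MONanalyseur(HTMLParser):

    def handle_starttag(self, tag, attrs):
        tag_stack.append(tag.lower())

    def handle_endtag(self, tag):
        if tag_stack:  # avoid popping an empty list on a stray end tag
            tag_stack.pop()

    def handle_data(self, data):
        # print text whose innermost open tag is <head>
        if tag_stack and tag_stack[-1] == 'head':
            print(data)

html_doc = "<html><head></head><body></body></html>"  # placeholder: feed() needs the HTML text to parse
parser = MONanalyseur()
parser.feed(html_doc)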
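A module-level tag_stack works, but every parser instance then shares one list, so anything left over from one document leaks into the next. One plausible reason the earlier instance-attribute version seemed not to work is that tag_stack was only created inside print_p_contents, so calling feed() directly fails with an AttributeError. Creating the list in __init__ avoids both problems. In this sketch the target parameter and the results list are additions that appear in neither answer.

```python
from html.parser import HTMLParser


class MONanalyseur(HTMLParser):
    """Keeps its tag stack per instance instead of in a global variable."""

    def __init__(self, target):
        super().__init__()      # HTMLParser's own setup must still run
        self.target = target    # tag whose direct text we want (sketch parameter)
        self.tag_stack = []     # exists before the first feed() call
        self.results = []       # collected text, per instance

    def handle_starttag(self, tag, attrs):
        self.tag_stack.append(tag)

    def handle_endtag(self, tag):
        # close everything up to the matching open tag, if there is one
        if tag in self.tag_stack:
            while self.tag_stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.tag_stack and self.tag_stack[-1] == self.target:
            self.results.append(data)
            print(data)


a = MONanalyseur('title')
a.feed('<html><head><title>Hello</title></head></html>')
b = MONanalyseur('p')
b.feed('<p>world</p>')
```

Each parser keeps its own stack, so a and b do not interfere with each other.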


2015-07-08 17:08:32 nate
