2012-06-11 34 views

Python XMLParser: when is the data() method called?

I am learning Python and am having some difficulty understanding the behavior of the XML parser (ElementTree's XMLParser).

I modified the example from the documentation:

class MaxDepth:                      # The target object of the parser
    path = ""
    def start(self, tag, attrib):    # Called for each opening tag.
        self.path += "/" + tag
        print '>>> Entering - ' + self.path
    def end(self, tag):              # Called for each closing tag.
        print '<<< Leaving - ' + self.path
        if self.path.endswith('/' + tag):
            self.path = self.path[:-(len(tag) + 1)]
    def data(self, data):            # Called for each chunk of text data.
        if data:
            print '... data called ...'
            print data, 'length -', len(data)
    def close(self):                 # Called when all data has been parsed.
        return self

It produces the following output:

>>> Entering - /a 
... data called ... 

length - 1 
... data called ... 
    length - 2 
>>> Entering - /a/b 
... data called ... 

length - 1 
... data called ... 
    length - 2 
<<< Leaving - /a/b 
... data called ... 

length - 1 
... data called ... 
    length - 2 
>>> Entering - /a/b 
... data called ... 

length - 1 
... data called ... 
    length - 4 
>>> Entering - /a/b/c 
... data called ... 

length - 1 
... data called ... 
     length - 6 
>>> Entering - /a/b/c/d 
... data called ... 

length - 1 
... data called ... 
     length - 6 
<<< Leaving - /a/b/c/d 
... data called ... 

length - 1 
... data called ... 
    length - 4 
<<< Leaving - /a/b/c 
... data called ... 

length - 1 
... data called ... 
    length - 2 
<<< Leaving - /a/b 
... data called ... 

length - 1 
<<< Leaving - /a 
<__main__.MaxDepth instance at 0x10e7dd5a8> 

My questions are:

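  1. When is the data() method called?
  2. Why is it called twice before each start tag?
  3. I could not find API documentation with more details about the data method. Where can I find a javadoc-style API reference for classes like XMLParser?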

If your use case doesn't need event-based parsing, '.parse()' is easier to use: http://www.doughellmann.com/PyMOTW/xml/etree/ElementTree/parse.html. Otherwise, his events example may help: http://www.doughellmann.com/PyMOTW/xml/etree/ElementTree/parse.html#watching-events-while-parsing – ninMonkey

Answers


If you change the data method like this:

def data(self, data):
    if data:
        print '... data called ...'
        print repr(data), 'length -', len(data)

then you will see why the data method is called multiple times; it is called once for each line of the text between tags:

>>> Entering - /a
... data called ...
'\n' length - 1
... data called ...
'  ' length - 2
>>> Entering - /a/b
... data called ...
'\n' length - 1
... data called ...
'  ' length - 2
<<< Leaving - /a/b
... data called ...
'\n' length - 1
... data called ...
'  ' length - 2
>>> Entering - /a/b
... data called ...
'\n' length - 1
... data called ...
'    ' length - 4
# ... etc ... 

XMLParser is built on the Expat parser.

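In my experience, any streaming XML parser delivers text data as a series of chunks, and you must concatenate any and all data events until you hit the next start-tag or end-tag event. Parsers often split chunks at whitespace boundaries, but that is not guaranteed.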
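To illustrate that concatenation, here is a minimal sketch of a parser target that buffers every data() chunk and only joins them when the next start or end event arrives. It is written in Python 3 syntax (unlike the Python 2 code in the question), and the names TextCollector, collect_text and sample are made up for this example. It also assumes you want to drop whitespace-only runs, which may not suit every document.

```python
import xml.etree.ElementTree as ET


class TextCollector:
    """Hypothetical parser target that joins data() chunks between tag events."""

    def __init__(self):
        self._chunks = []  # text pieces received since the last tag event
        self.texts = []    # completed text runs, whitespace-only runs dropped

    def _flush(self):
        # Join everything data() delivered since the previous tag event.
        text = "".join(self._chunks)
        self._chunks = []
        if text.strip():
            self.texts.append(text.strip())

    def start(self, tag, attrib):
        # A start tag ends the current run of text.
        self._flush()

    def end(self, tag):
        # An end tag ends the current run of text as well.
        self._flush()

    def data(self, data):
        # May be called several times for a single run of text.
        self._chunks.append(data)

    def close(self):
        self._flush()
        return self.texts


def collect_text(xml_source):
    """Feed a string to XMLParser and return the collected text runs."""
    parser = ET.XMLParser(target=TextCollector())
    parser.feed(xml_source)
    # XMLParser.close() returns whatever the target's close() returns.
    return parser.close()


sample = "<a>\n  <b>first\nsecond</b>\n  <b>third</b>\n</a>"
result = collect_text(sample)
print(result)
```

However the parser happens to split the text inside the first b element, the two lines come back as one string, because nothing is emitted until a tag event is seen.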
