2017-06-06

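I have to monitor an XML file that is written by a tool running all day long. The XML file is only properly completed and closed at the end of the day, so I need to parse it while it is still being written (using Python).

The constraints are the same as for XML stream processing:

  1. Parse the incomplete XML file on the fly and trigger actions
  2. Keep track of the last position within the file, to avoid processing it again from the beginning

In the answers to "Need to read XML files as a stream using BeautifulSoup in Python", slezica suggests xml.sax, xml.etree.ElementTree and cElementTree. However, I tried xml.etree.ElementTree and cElementTree without success. There are also xml.dom, xml.parsers.expat and lxml, but I did not see any support for on-the-fly parsing.

I need more obvious examples...

I am currently using Python 2.7 on Linux, but I will migrate to Python 3.x => please also provide hints about new Python 3.x features. I also use watchdog to detect XML file modifications => the watchdog mechanism can be reused. Windows support is optional.

Please provide a solution that is easy to understand and maintain. If it is too complex, I may just use tell()/seek() to move within the file, do a stupid text search in the raw XML, and finally extract the values with a basic regular expression.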
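For comparison, the seek()-plus-regex fallback mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: the function name `read_new_filenames` and both patterns are hypothetical, the file is assumed to be UTF-8, and a regex like this ignores comments, CDATA and attributes on `<fileobject>`.

```python
import re

# Hypothetical patterns: one complete <fileobject> block, and the
# <filename> inside it. They work on bytes so offsets stay byte offsets.
FILEOBJECT_RE = re.compile(br'<fileobject>(.*?)</fileobject>', re.DOTALL)
FILENAME_RE = re.compile(br'<filename>(.*?)</filename>', re.DOTALL)

def read_new_filenames(path, offset=0):
    """Return (filenames, new_offset) for the complete blocks found after offset."""
    with open(path, 'rb') as f:
        f.seek(offset)          # skip what was already processed
        data = f.read()
    names = []
    consumed = 0
    for match in FILEOBJECT_RE.finditer(data):
        name = FILENAME_RE.search(match.group(1))
        if name:
            names.append(name.group(1).decode('utf-8'))
        consumed = match.end()  # only advance past complete blocks
    return names, offset + consumed
```

The returned offset stops right after the last complete `</fileobject>`, so a block that is only half written is read again on the next call.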


XML sample:

<dfxml xmloutputversion='1.0'> 
    <creator version='1.0'> 
    <program>TCPFLOW</program> 
    <version>1.4.6</version> 
    </creator> 
    <configuration> 
    <fileobject> 
     <filename>file1</filename> 
     <filesize>288</filesize> 
     <tcpflow packets='12' srcport='1111' dstport='2222' family='2' /> 
    </fileobject> 
    <fileobject> 
     <filename>file2</filename> 
     <filesize>352</filesize> 
     <tcpflow packets='12' srcport='3333' dstport='4444' family='2' /> 
    </fileobject> 
    <fileobject> 
     <filename>file3</filename> 
     <filesize>456</filesize> 
     ... 
     ... 

My first test, using SAX, fails:

import xml.sax

class StreamHandler(xml.sax.handler.ContentHandler):
    def startElement(self, name, attrs):
        print 'start: name=', name
    def endElement(self, name):
        print 'end: name=', name
        if name == 'root':
            raise StopIteration

if __name__ == '__main__':
    parser = xml.sax.make_parser()
    parser.setContentHandler(StreamHandler())
    with open('f.xml') as f:
        parser.parse(f)

Shell:

$ while read line; do echo $line; sleep 1; done <i.xml >f.xml & 
... 
$ ./test-using-sax.py 
start: name= dfxml 
start: name= creator 
start: name= program 
end: name= program 
start: name= version 
end: name= version 
Traceback (most recent call last): 
    File "./test-using-sax.py", line 17, in <module> 
    parser.parse(f) 
    File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 107, in parse 
    xmlreader.IncrementalParser.parse(self, source) 
    File "/usr/lib64/python2.7/xml/sax/xmlreader.py", line 125, in parse 
    self.close() 
    File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 220, in close 
    self.feed("", isFinal = 1) 
    File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 214, in feed 
    self._err_handler.fatalError(exc) 
    File "/usr/lib64/python2.7/xml/sax/handler.py", line 38, in fatalError 
    raise exception 
xml.sax._exceptions.SAXParseException: report.xml:15:0: no element found 

Answers


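Three hours after posting my question, I had not received any answer. But I have finally implemented the simple example I was looking for.

My inspiration comes from saaj's answer, and the example is based on xml.sax and watchdog.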

from __future__ import print_function, division
import time
import watchdog.events
import watchdog.observers
import xml.sax

class XmlStreamHandler(xml.sax.handler.ContentHandler):
    def startElement(self, tag, attributes):
        print(tag, 'attributes=', attributes.items())
        self.tag = tag  # remember the current tag for characters()
    def characters(self, content):
        print(self.tag, 'content=', content)

class XmlFileEventHandler(watchdog.events.PatternMatchingEventHandler):
    def __init__(self):
        watchdog.events.PatternMatchingEventHandler.__init__(self, patterns=['*.xml'])
        self.file = None
        self.parser = xml.sax.make_parser()
        self.parser.setContentHandler(XmlStreamHandler())
    def on_modified(self, event):
        # Open the file once; later reads resume where the previous one stopped.
        if not self.file:
            self.file = open(event.src_path)
        # feed() parses incrementally and, unlike parse(), does not close the parser.
        self.parser.feed(self.file.read())

if __name__ == '__main__':
    observer = watchdog.observers.Observer()
    event_handler = XmlFileEventHandler()
    observer.schedule(event_handler, path='.')
    try:
        observer.start()
        while True:
            time.sleep(10)
    finally:
        observer.stop()
        observer.join()

While the script is running, do not forget to touch an XML file, or simulate on-the-fly writing with the following command:

while read line; do echo $line; sleep 1; done <in.xml >out.xml & 
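
The incremental feed() idea can also be tried without watchdog or a real file. The sketch below is only an illustration: `CollectingHandler` is a hypothetical handler that records events instead of printing them, and the two string chunks stand in for successive reads of a growing file. As long as close() is never called, the unfinished document should not raise "no element found".

```python
import xml.sax
import xml.sax.handler

class CollectingHandler(xml.sax.handler.ContentHandler):
    """Hypothetical handler: records (event, name) pairs instead of printing."""
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(('start', name))

    def endElement(self, name):
        self.events.append(('end', name))

handler = CollectingHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# Two chunks of a document that is never completed; the first one even
# stops in the middle of the text of <program>.
parser.feed("<dfxml><creator><program>TCP")
parser.feed("FLOW</program></creator><fileobject>")

for event in handler.events:
    print(event)
```

The parser keeps its state between the two feed() calls, which is what removes the need to re-read the file from the beginning.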

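Since yesterday, I have found Peter Gibson's answer about the undocumented xml.etree.ElementTree.XMLTreeBuilder._parser.EndElementHandler.

This example is similar to the other one, but uses xml.etree.ElementTree (and watchdog).

It does not work when ElementTree is replaced by cElementTree :-/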

import time
import watchdog.events
import watchdog.observers
import xml.etree.ElementTree

class XmlFileEventHandler(watchdog.events.PatternMatchingEventHandler):
    def __init__(self):
        watchdog.events.PatternMatchingEventHandler.__init__(self, patterns=['*.xml'])
        self.xml_file = None
        self.parser = xml.etree.ElementTree.XMLTreeBuilder()
        def end_tag_event(tag):
            node = self.parser._end(tag)
            print 'tag=', tag, 'node=', node
        self.parser._parser.EndElementHandler = end_tag_event

    def on_modified(self, event):
        if not self.xml_file:
            self.xml_file = open(event.src_path)
        buffer = self.xml_file.read()
        if buffer:
            self.parser.feed(buffer)

if __name__ == '__main__':
    observer = watchdog.observers.Observer()
    event_handler = XmlFileEventHandler()
    observer.schedule(event_handler, path='.')
    try:
        observer.start()
        while True:
            time.sleep(10)
    finally:
        observer.stop()
        observer.join()

While the script is running, do not forget to touch an XML file, or simulate on-the-fly writing with this one-line script:

while read line; do echo $line; sleep 1; done <in.xml >out.xml & 
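
A possible way to get the same end-of-element callback without touching private attributes is the documented `target` argument of xml.etree.ElementTree.XMLParser. The sketch below is written for Python 3 and is an assumption-laden illustration, not a drop-in replacement: `NotifyingTarget` and `closed` are hypothetical names, and the target simply delegates to a regular TreeBuilder.

```python
import xml.etree.ElementTree as ET

class NotifyingTarget(object):
    """Hypothetical parser target: builds the tree with a regular
    TreeBuilder and calls on_end(element) each time an element closes."""
    def __init__(self, on_end):
        self._builder = ET.TreeBuilder()
        self._on_end = on_end

    def start(self, tag, attrib):
        return self._builder.start(tag, attrib)

    def end(self, tag):
        elem = self._builder.end(tag)  # the element that was just closed
        self._on_end(elem)
        return elem

    def data(self, data):
        return self._builder.data(data)

    def close(self):
        return self._builder.close()

closed = []  # (tag, text) of every element closed so far
parser = ET.XMLParser(target=NotifyingTarget(
    lambda elem: closed.append((elem.tag, elem.text))))

# Chunks standing in for successive reads of the growing file.
parser.feed("<dfxml><fileobject><filename>file1</filename>")
parser.feed("<filesize>288</filesize></fileobject><fileobject>")

print(closed)
```

The underlying TreeBuilder keeps every element in memory; for a file that grows all day, clearing elements once they have been processed may be worth considering.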

For information, xml.etree.ElementTree.iterparse does not seem to support a file that is still being written. My test code:

from __future__ import print_function, division
import xml.etree.ElementTree

if __name__ == '__main__':
    context = xml.etree.ElementTree.iterparse('f.xml', events=('end',))
    for action, elem in context:
        print(action, elem.tag)
My output:

end program 
end version 
end creator 
end filename 
end filesize 
end tcpflow 
end fileobject 
end filename 
end filesize 
end tcpflow 
end fileobject 
end filename 
end filesize 
Traceback (most recent call last): 
    File "./iter.py", line 9, in <module> 
    for action, elem in context: 
    File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1281, in next 
    self._root = self._parser.close() 
    File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1654, in close 
    self._raiseerror(v) 
    File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror 
    raise err 
xml.etree.ElementTree.ParseError: no element found: line 20, column 0
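
As a Python 3 hint: since Python 3.4 the standard library has xml.etree.ElementTree.XMLPullParser, a non-blocking counterpart of iterparse that is fed data explicitly and does not close the parser at end of file. The sketch below is only an illustration; the helper name `drain` is hypothetical and the string chunks stand in for successive reads of the growing file. With very small chunks, recent Python versions may hold events back until more data arrives (I believe a flush() method was added for that case), so this is not a guarantee about timing.

```python
import xml.etree.ElementTree as ET

def drain(parser):
    """Hypothetical helper: return (tag, text) for every element that
    was closed since the previous call."""
    return [(elem.tag, elem.text) for _event, elem in parser.read_events()]

parser = ET.XMLPullParser(events=('end',))

# First read of the growing file.
parser.feed("<dfxml><fileobject><filename>file1</filename>")
first = drain(parser)

# Second read; the document is still incomplete, but nothing is raised
# because close() is never called.
parser.feed("<filesize>288</filesize></fileobject><fileobject><filename>fi")
second = drain(parser)

print(first)
print(second)
```

In an on_modified() handler, the two feed() calls would become `parser.feed(xml_file.read())` followed by a loop over `parser.read_events()`.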