Pythonic way to fix broken XML

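I'm working with a broken XML-RPC server. I've already filed a support request to get it fixed, but there is a bug where the byte length of UTF-8 responses is reported as their number of characters, which truncates the XML I receive.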
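To illustrate why this truncates the response, here is a made-up payload whose character count and UTF-8 byte count differ. It assumes the wrong length ends up in the HTTP `Content-Length` header, so the client reads exactly that many bytes. Every non-ASCII character costs at least one extra byte, and those bytes fall off the end of the document, which is where the closing tags are.

```python
# Hypothetical payload: "ï" and "é" each take 2 bytes in UTF-8.
body = "<params><param><value><string>naïve café</string></value></param></params>"
encoded = body.encode("utf-8")

char_count = len(body)      # what the buggy server reports as the length
byte_count = len(encoded)   # what it actually sends

# A client that trusts the reported length reads only char_count bytes.
received = encoded[:char_count]
missing = encoded[char_count:]

print(char_count, byte_count)
print(missing)
```

The bytes lost equal `byte_count - char_count`, so with mostly ASCII content only the tail of the last closing tag or two disappears.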

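I expect this to be fixed soon, but I'm working on this tool right now and really need it to work. At the moment I've monkey-patched xmlrpclib to catch the parse exception and let me manually feed the parser a corrected version of the response. Given the nature of XML, though, there must be a way to do this programmatically, which would let me use the XML-RPC server as if it didn't have this bug.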
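A less invasive alternative to monkey-patching might be a custom transport. The sketch below targets Python 3's `xmlrpc.client` (the successor of `xmlrpclib`) and assumes you already have a `repair` function that turns the truncated body into well-formed XML; `RepairingTransport` and `repair` are hypothetical names. It overrides `Transport.parse_response` so the whole body is read before parsing, and it skips the gzip handling the stock method performs.

```python
import xmlrpc.client


class RepairingTransport(xmlrpc.client.Transport):
    """Transport that repairs a truncated response body before parsing.

    `repair` is a hypothetical callable taking and returning bytes, e.g. one
    that appends the missing end tags. Gzip-encoded responses are not handled.
    """

    def __init__(self, repair, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._repair = repair

    def parse_response(self, response):
        # Read everything the server sent (possibly truncated) in one go,
        # instead of feeding the parser chunk by chunk like the stock method.
        body = response.read()
        parser, unmarshaller = self.getparser()
        parser.feed(self._repair(body))
        parser.close()
        return unmarshaller.close()


# Usage (URL and my_repair are placeholders):
# proxy = xmlrpc.client.ServerProxy("http://example.com/RPC2",
#                                   transport=RepairingTransport(my_repair))
```

Keeping the fix in the transport means the rest of the code calls the server normally, and the hook can be dropped once the server is fixed.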

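Only part of the closing tags gets truncated, so if there were a built-in way to take a broken XML tree, close all of its open tags and then parse it, I could get on with my work. I'm currently rolling my own, but any help would be appreciated. I can't imagine I'm the first person who wants to do error correction on XML, but if I don't find an existing solution, I'll push my code to git and link it from here.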

2012-05-15 theheadofabroom

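Here is a quick snippet. The key point is that SAX parsers generate events as they go, so they let you handle the content up to the point where it breaks.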

#!/usr/bin/env python

import sys
from xml.sax import SAXParseException, handler, make_parser


class TagHandler(handler.ContentHandler):
    """Keep a stack of the elements that are currently open."""

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.stack = []

    def startElement(self, name, attrs):
        self.stack.append(name)

    def endElement(self, name):
        # TODO: might want to confirm that name matches self.stack[-1] here
        self.stack.pop()

    def finish_document(self):
        # End tags for every element still open, innermost first.
        return "\n".join("</%s>" % tag for tag in reversed(self.stack))


parser = make_parser()
tag_handler = TagHandler()
parser.setContentHandler(tag_handler)

try:
    parser.parse(sys.argv[1])
except SAXParseException:
    # TODO: something more intelligent than just printing out the
    # constructed end of the document, like appending it to the source
    # and repeating whatever you did to make this processing necessary.
    print(tag_handler.finish_document())

2012-05-15 16:04:09
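The TODO in the answer (append the constructed end tags to the source and parse again) could look roughly like the following. This is a sketch that assumes the whole truncated response is in memory as bytes and that the cut falls in the document's tail, not inside a comment, CDATA section or multi-byte character; `close_truncated_xml` is a hypothetical helper, not a library function. Because the cut can land inside a tag (e.g. `</val`), it first drops any unfinished tag after the last `<`.

```python
import xml.sax
from xml.sax import handler


class TagHandler(handler.ContentHandler):
    """Record which elements are still open when parsing stops."""

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.stack = []

    def startElement(self, name, attrs):
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()


def close_truncated_xml(data):
    """Hypothetical helper: return `data` with its missing end tags appended."""
    # If the cut fell inside a tag (e.g. b"...</val"), drop that fragment.
    if data.rfind(b"<") > data.rfind(b">"):
        data = data[: data.rfind(b"<")]

    tags = TagHandler()
    try:
        xml.sax.parseString(data, tags)
        return data  # already well-formed, nothing to repair
    except xml.sax.SAXParseException:
        pass  # expected for truncated input; tags.stack holds open elements

    closing = "".join("</%s>" % name for name in reversed(tags.stack))
    repaired = data + closing.encode("utf-8")
    # Parse once more so a repair that still fails raises SAXParseException.
    xml.sax.parseString(repaired, handler.ContentHandler())
    return repaired
```

The final re-parse is a deliberate choice: if the truncation was somewhere this heuristic cannot handle, the caller gets a parse error instead of silently wrong data.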