Preserving escape characters when parsing XML in Python

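I'm trying to write a Python script that takes one or two XML files as input and, based on their contents, writes out one or two new files. I tried writing it with the minidom module. However, the input files contain a number of escape characters inside node attributes. Unfortunately, in the output file these characters have been converted into a different character, which appears to be a newline.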

For example, a line in the input file such as:

<Entry text="For English For Hearing Impaired&#xa;Press 3 on Keypad"

is written to the output as:

<Entry text="For English For Hearing Impaired 
Press 3 on Keypad"

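I read that minidom is causing this because it doesn't allow escape characters in XML attributes (I think). Is that true? And if so, what is the best tool or method for parsing XML files into Python documents, manipulating nodes and exchanging them between documents, and writing the documents back out to new files?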

If it helps, I'm also parsing and saving these files with 'utf-8' encoding. I don't know whether that's part of the problem. Thanks for any help.

-Alex Kaiser

2010-10-28 Pyrobug

I haven't used Python's standard XML modules since discovering lxml. It can do everything you want. For example...

input.xml:

<?xml version="1.0" encoding='utf-8'?> 
<root> 
    <Button3 yposition="250" fontsize="16" language1="For English For Hearing Impaired&#xa;Press 3 on Keypad" /> 
</root>

and:

>>> from lxml import etree
>>> root = etree.parse('input.xml')   # pass the filename so lxml reads the raw bytes
>>> buttons = root.xpath('//Button3')
>>> buttons
[<Element Button3 at 0x101071f18>]
>>> buttons[0]
<Element Button3 at 0x101071f18>
>>> buttons[0].attrib
{'yposition': '250', 'fontsize': '16', 'language1': 'For English For Hearing Impaired\nPress 3 on Keypad'}
>>> buttons[0].attrib['foo'] = 'bar'
>>> s = etree.tostring(root, xml_declaration=True, encoding='utf-8', pretty_print=True)
>>> print(s.decode('utf-8'))   # tostring() returns bytes when an encoding is given
<?xml version='1.0' encoding='utf-8'?>
<root>
    <Button3 yposition="250" fontsize="16" language1="For English For Hearing Impaired&#10;Press 3 on Keypad" foo="bar"/>
</root>
>>> # write the same serialization straight to disk
>>> root.write('output.xml', xml_declaration=True, encoding='utf-8', pretty_print=True)
>>>
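For comparison, here is a minimal sketch of the same round trip using only the standard library's `xml.etree.ElementTree`, which also covers the "exchange nodes between documents" part of the question. The inputs are inlined as strings, and the target document (`<menu>`, `Entry`), the `foo` attribute, and the helper name `merge_buttons` are illustrative assumptions. As with lxml, the parsed attribute holds a real newline, and the serializer writes it back as `&#10;` rather than a raw line break.

```python
import copy
import xml.etree.ElementTree as ET

# Illustrative inputs; real code would use ET.parse("input.xml").getroot().
SOURCE_XML = (
    '<root><Button3 yposition="250" fontsize="16" '
    'language1="For English For Hearing Impaired&#xa;Press 3 on Keypad"/></root>'
)
TARGET_XML = '<menu><Entry text="Main menu"/></menu>'


def merge_buttons(source_xml: str, target_xml: str) -> bytes:
    """Copy each Button3 from the source document into the target document (hypothetical helper)."""
    source = ET.fromstring(source_xml)
    target = ET.fromstring(target_xml)

    for button in source.iter("Button3"):
        # After parsing, language1 holds a real "\n" where the file had &#xa;.
        clone = copy.deepcopy(button)  # keep the source tree unchanged
        clone.set("foo", "bar")        # manipulate the copied node
        target.append(clone)           # graft it into the other document

    # The serializer writes "\n" inside attribute values back out as &#10;.
    return ET.tostring(target, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(merge_buttons(SOURCE_XML, TARGET_XML).decode("utf-8"))
```

To write the result to a file instead, `ET.ElementTree(target).write("output.xml", encoding="utf-8", xml_declaration=True)` does the same serialization. The `xml_declaration` argument to `tostring()` needs Python 3.8 or later.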

2010-10-28 01:46:33 snapshoe


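&#xa; is the XML character reference for character 0x0A, i.e. a newline. The parser is parsing the XML correctly and giving you the character it indicates. If you want to suppress or otherwise handle newlines in attributes, you're free to do whatever you like with them once the parser has handed them to you.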
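A minimal sketch of that idea, using the standard library's `xml.etree.ElementTree` in place of minidom (both decode character references the same way). The attribute comes back with a real newline, and the hypothetical helper `flatten_attribute_newlines` shows one thing you might do with it afterwards: replace it with a space.

```python
import xml.etree.ElementTree as ET


def flatten_attribute_newlines(root: ET.Element, replacement: str = " ") -> None:
    """Replace newline characters in every attribute value under root, in place.

    Hypothetical helper: the policy (here, swap "\n" for a space) is up to you.
    """
    for elem in root.iter():
        # Iterate over a snapshot so updating values is clearly safe.
        for name, value in list(elem.attrib.items()):
            if "\n" in value:
                elem.set(name, value.replace("\n", replacement))


if __name__ == "__main__":
    doc = ET.fromstring(
        '<Entry text="For English For Hearing Impaired&#xa;Press 3 on Keypad"/>'
    )
    # The parser has already decoded &#xa; (character 0x0A) into a real newline.
    print(repr(doc.get("text")))
    flatten_attribute_newlines(doc)
    print(ET.tostring(doc, encoding="unicode"))
```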

2010-10-28 02:51:33

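Unfortunately, the standard xml module has no option to turn off unescaping. So the best option for me was to escape the text back with the same helper that ElementTree itself uses for attribute values (the escape() function from xml.sax.saxutils does not escape \n on its own):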

from xml.etree import ElementTree
text = ElementTree._escape_attrib(text)  # private helper; Python 2.7's version also took an encoding argument

Text in the source XML:

Here is a test message&#10;With newline &amp; ampersand

After "decoding":

Here is a test message 
With newline & ampersand

After "escaping back":

Here is a test message&#10;With newline &amp; ampersand
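Because `_escape_attrib` is private, its signature can change between Python versions (the dropped encoding argument is one example). A minimal sketch of the same "escape it back" step using only the public `xml.sax.saxutils.escape()`, passing extra entities so `\n` is escaped too; it assumes the result will sit inside a double-quoted attribute, and the name `escape_attr_value` is hypothetical.

```python
from xml.sax.saxutils import escape

# Extra replacements applied after escape() has handled &, < and >.
# Assumption: the result goes inside a double-quoted attribute value.
_ATTR_ENTITIES = {
    '"': "&quot;",
    "\n": "&#10;",
    "\r": "&#13;",
    "\t": "&#9;",
}


def escape_attr_value(text: str) -> str:
    """Re-escape a decoded attribute value so newlines survive as &#10; (hypothetical helper)."""
    return escape(text, _ATTR_ENTITIES)


if __name__ == "__main__":
    decoded = "Here is a test message\nWith newline & ampersand"
    print(escape_attr_value(decoded))
    # prints: Here is a test message&#10;With newline &amp; ampersand
```

If you also want the surrounding quotes, `xml.sax.saxutils.quoteattr()` escapes `\n`, `\r` and `\t` itself and picks a quote character for you.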

2016-05-18 07:30:22 Jimilian
