Parsing an XML file with an ordered dictionary

I have an XML file in the following format:

<NewDataSet> 
    <Root> 
     <Phonemic>and</Phonemic> 
     <Phonetic>nd</Phonetic> 
     <Description/> 
     <Start>0</Start> 
     <End>8262</End> 
    </Root> 
    <Root> 
     <Phonemic>comfortable</Phonemic> 
     <Phonetic>comfetebl</Phonetic> 
     <Description>adj</Description> 
     <Start>61404</Start> 
     <End>72624</End> 
    </Root> 
</NewDataSet>

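I need to process it so that, for example, when the user enters nd, the program matches it against the <Phonetic> tag and returns and from the <Phonemic> part. I thought that if I could convert the XML file into a dictionary, I would be able to iterate over the data and look up information when needed.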
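One way to get that lookup is to build a flat dictionary keyed by the <Phonetic> text, so "nd" maps straight to "and". The sketch below is an assumption, not the asker's code: it swaps in the standard-library xml.etree.ElementTree instead of xmltodict so it runs without extra packages, and build_lookup and XML_DATA are hypothetical names. For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input: the XML from the question, inlined as a string.
XML_DATA = """<NewDataSet>
    <Root>
     <Phonemic>and</Phonemic>
     <Phonetic>nd</Phonetic>
     <Description/>
     <Start>0</Start>
     <End>8262</End>
    </Root>
    <Root>
     <Phonemic>comfortable</Phonemic>
     <Phonetic>comfetebl</Phonetic>
     <Description>adj</Description>
     <Start>61404</Start>
     <End>72624</End>
    </Root>
</NewDataSet>"""


def build_lookup(xml_text):
    """Hypothetical helper: map each <Phonetic> value to the <Phonemic>
    value of the same <Root> element."""
    root = ET.fromstring(xml_text)
    lookup = {}
    # iter("Root") visits every <Root> element under <NewDataSet>
    for entry in root.iter("Root"):
        phonetic = entry.findtext("Phonetic")
        phonemic = entry.findtext("Phonemic")
        if phonetic is not None:
            lookup[phonetic] = phonemic
    return lookup


lookup = build_lookup(XML_DATA)
print(lookup.get("nd"))         # and
print(lookup.get("comfetebl"))  # comfortable
print(lookup.get("xyz"))        # None
```

Because the lookup is a plain dict, each query is a single key access instead of a walk through the nested structure. If two entries share the same phonetic spelling, the later one overwrites the earlier; a dict of lists would be needed to keep both.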

I searched and found xmltodict, which is meant for exactly this purpose:

import xmltodict 
with open(r'path\to\1.xml', encoding='utf-8', errors='ignore') as fd: 
    obj = xmltodict.parse(fd.read())

Running this gives me an OrderedDict:

>>> obj 
OrderedDict([('NewDataSet', OrderedDict([('Root', [OrderedDict([('Phonemic', 'and'), ('Phonetic', 'nd'), ('Description', None), ('Start', '0'), ('End', '8262')]), OrderedDict([('Phonemic', 'comfortable'), ('Phonetic', 'comfetebl'), ('Description', 'adj'), ('Start', '61404'), ('End', '72624')])])]))])

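Unfortunately, this has not made things any simpler, and I don't know how to write the program against this new data structure. For example, to access nd I have to write: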

obj['NewDataSet']['Root'][0]['Phonetic']

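This is ridiculously complicated. I tried turning it into a regular dictionary with dict(), but because the structure is nested, the inner layers are still OrderedDicts, and my data is very large.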
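Calling dict() only converts the outermost layer. A minimal sketch of a recursive conversion follows, assuming the structure printed above (copied in by hand so it runs without xmltodict); to_plain is a hypothetical helper name. Note that it does not reduce the number of key levels, and it builds a full copy of the data, which matters when the data is very large. Since OrderedDict is a subclass of dict, the original object already supports exactly the same indexing, and on Python 3.7+ plain dicts preserve insertion order as well.

```python
from collections import OrderedDict

# Hand-written copy of the structure xmltodict.parse() printed in the question
obj = OrderedDict([('NewDataSet', OrderedDict([('Root', [
    OrderedDict([('Phonemic', 'and'), ('Phonetic', 'nd'), ('Description', None),
                 ('Start', '0'), ('End', '8262')]),
    OrderedDict([('Phonemic', 'comfortable'), ('Phonetic', 'comfetebl'),
                 ('Description', 'adj'), ('Start', '61404'), ('End', '72624')]),
])]))])


def to_plain(value):
    """Hypothetical helper: recursively replace every dict-like layer
    (including OrderedDict) with a plain dict, descending into lists."""
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_plain(item) for item in value]
    return value


plain = to_plain(obj)
print(type(plain['NewDataSet']['Root'][0]))        # <class 'dict'>
# The nesting depth is unchanged: the same four-level access is still needed.
print(plain['NewDataSet']['Root'][0]['Phonetic'])  # nd
```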


2014-11-14 Omid

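What difference would converting to a regular dictionary make? You would still have just as many levels of keys. What *exactly* is the problem; do you just not like 'OrderedDict.__repr__'? – jonrsharpe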

IMO, if you are accessing the data as obj['NewDataSet']['Root'][0]['Phonetic'], you are not doing it right.

Instead, you can do the following:

obj = obj["NewDataSet"] 
root_elements = obj["Root"] if type(obj) == OrderedDict else [obj["Root"]] 
# Above step ensures that root_elements is always a list 
for element in root_elements: 
    print(element["Phonetic"])

Even though this code looks longer, the advantage is that it stays much more compact and modular once you start dealing with sufficiently large XML.
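The loop relies on root_elements always being a list. By default, xmltodict returns a list for <Root> only when that element repeats; a single <Root> comes back as one mapping. Below is a minimal sketch of that normalization as reusable helpers, using hand-written OrderedDicts that mimic xmltodict's output (an assumption, so it runs without xmltodict installed); as_list and phonetics are hypothetical names. The type test is applied to the value stored under "Root", not to the enclosing object, because only that value changes shape.

```python
from collections import OrderedDict


def as_list(value):
    """Hypothetical helper: return value unchanged if it is already a list,
    otherwise wrap the single item in a one-element list."""
    return value if isinstance(value, list) else [value]


def phonetics(data_set):
    """Hypothetical helper: collect every <Phonetic> value, whether the
    data set holds one <Root> or many."""
    # Check the type of data_set["Root"], not of data_set itself.
    return [element["Phonetic"] for element in as_list(data_set["Root"])]


# Stand-ins for obj["NewDataSet"] (assumed shapes, written by hand):
# two <Root> elements -> a list; one <Root> element -> a single mapping.
many = OrderedDict([('Root', [
    OrderedDict([('Phonemic', 'and'), ('Phonetic', 'nd')]),
    OrderedDict([('Phonemic', 'comfortable'), ('Phonetic', 'comfetebl')]),
])])
single = OrderedDict([('Root', OrderedDict([('Phonemic', 'and'), ('Phonetic', 'nd')]))])

print(phonetics(many))    # ['nd', 'comfetebl']
print(phonetics(single))  # ['nd']
```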

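P.S.: I had the same issues with xmltodict. But instead of using xml.etree.ElementTree to parse the XML files, I found xmltodict much easier to work with, because the code base is smaller and I don't have to deal with the other inconveniences of the xml module.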

EDIT

The following code works for me:

import xmltodict 
from collections import OrderedDict 

xmldata = """<NewDataSet> 
    <Root> 
     <Phonemic>and</Phonemic> 
     <Phonetic>nd</Phonetic> 
     <Description/> 
     <Start>0</Start> 
     <End>8262</End> 
    </Root> 
    <Root> 
     <Phonemic>comfortable</Phonemic> 
     <Phonetic>comfetebl</Phonetic> 
     <Description>adj</Description> 
     <Start>61404</Start> 
     <End>72624</End> 
    </Root> 
</NewDataSet>""" 

obj = xmltodict.parse(xmldata) 
obj = obj["NewDataSet"] 
root_elements = obj["Root"] if type(obj) == OrderedDict else [obj["Root"]] 
# Above step ensures that root_elements is always a list 
for element in root_elements: 
    print(element["Phonetic"])


2014-11-14 09:16:04

Thanks. I think the last line should be 'print element[0]['Phonemic']', otherwise it complains that indices must be integers, not 'str'. – Omid

@novice66 No, because the for loop I use takes care of the indexing. Did you run into any problems when you tried the code? –

I just ran it (in Python 3, after adding parentheses around 'print') and got an error: 'TypeError: list indices must be integers, not str' – Omid

慕's answer worked for me; the only thing I had to change was the tricky step that ensures root_elements is always a list:

import xmltodict 
from collections import OrderedDict 

xmldata = """<NewDataSet> 
    <Root> 
     <Phonemic>and</Phonemic> 
     <Phonetic>nd</Phonetic> 
     <Description/> 
     <Start>0</Start> 
     <End>8262</End> 
    </Root> 
    <Root> 
     <Phonemic>comfortable</Phonemic> 
     <Phonetic>comfetebl</Phonetic> 
     <Description>adj</Description> 
     <Start>61404</Start> 
     <End>72624</End> 
    </Root> 
</NewDataSet>""" 

obj = xmltodict.parse(xmldata) 
obj = obj["NewDataSet"] 
root_elements = obj["Root"] if isinstance(obj["Root"], list) else [obj["Root"]] 
# Above step ensures that root_elements is always a list 
# If obj["Root"] is already a list, use it; otherwise wrap the single element in a list. 
for element in root_elements: 
    print(element["Phonetic"])


2016-12-03 14:14:09
