Accessing non-tree-structured XML data in Python

I have several XML files that I want to parse in Python. I know about Python's ElementTree package, but my XML files are not stored in a tree-like structure. Here is an example:

<tag1 attribute1="at1" attribute2="at2">My files are text that I annotated with a tool 
to create these xml files.</tag1> 
Some parts of the text are enclosed in an xml tag, whereas others are not. 
<tag1 attribute1="at1" attribute2="at2"><tag2 attribute3="at3" attribute4="at4">Some 
are even enclosed in multiple tags.</tag1></tag2> 
And some have overlapping tags: 
<tag1 attribute1="at1" attribute2="at2">This is an example sentence 
<tag3 attribute5="at5">containing a nested example sentence</tag3></tag1>

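Whenever I parse a file with the functions of the ElementTree class, I can only access the first tag. I am looking for a way to parse all of the tags, and the result does not need to be a tree-like structure. Any help is greatly appreciated.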


2017-04-14 imc

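If your example is accurate, it is not valid XML. In the second case you open tag1, open tag2, and then close tag1! Some libraries try to guess at malformed XML, but please first confirm that your example is correct. – Javier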
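One lenient option, offered here as an assumption and not something the commenters named, is the standard library's event-based `html.parser.HTMLParser`. It reports start tags, end tags, and text as it meets them and does not check nesting, so no tree is required. The class name `AnnotationCollector` and the helper `collect_annotations` are hypothetical. Note that `HTMLParser` lowercases tag and attribute names.

```python
from html.parser import HTMLParser


class AnnotationCollector(HTMLParser):
    """Record each piece of text together with the tags open around it."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # (tag, attributes) pairs that are currently open
        self.spans = []      # (text, snapshot of open_tags) pairs

    def handle_starttag(self, tag, attrs):
        self.open_tags.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Close the most recently opened tag with this name, even when the
        # closing tags arrive in the "wrong" order.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse line breaks and extra spaces
        if text:
            self.spans.append((text, list(self.open_tags)))


def collect_annotations(markup):
    parser = AnnotationCollector()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end
    return parser.spans
```

Each entry in the returned list pairs a run of text with the annotations that enclose it; untagged text comes back with an empty list.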

Also, post how you are currently trying to access the elements. – Javier

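XML is well-formed by definition. This markup cannot be used with a compliant XML library such as etree. Now, if all of this is wrapped in a root tag that you did not post, it might be valid. – Parfait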
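The root-tag idea can be sketched as follows, assuming every tag in the file is properly nested, the file carries no XML declaration, and the text has no unescaped `&` or `<`. The wrapper name `root` and the helpers `parse_fragments` and `iter_annotated` are hypothetical. A misnested pair such as the second case in the question would still raise `ParseError`.

```python
import xml.etree.ElementTree as ET


def parse_fragments(text):
    """Wrap the text in a synthetic <root> element so it has a single root."""
    return ET.fromstring("<root>" + text + "</root>")


def iter_annotated(root):
    """Yield (tag, attributes, text) for every element below the wrapper."""
    for elem in root.iter():  # document order, includes nested elements
        if elem is root:
            continue
        yield elem.tag, dict(elem.attrib), "".join(elem.itertext())
```

Untagged text is not lost: text before the first element is in `root.text`, and text following an element is in that element's `tail`.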

If each line holds exactly one XML fragment, just parse each line separately.

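import xml.etree.ElementTree as ET
for line in some_file:
    # Parse each line as a standalone XML fragment; fromstring returns its root element.
    root = ET.fromstring(line)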
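A slightly fuller sketch of the per-line approach, assuming each fragment sits on a single line (which the example in the question does not always satisfy). The function name `parse_lines` is hypothetical. Lines that are not a single well-formed element, such as plain untagged text or an element followed by trailing text, are kept as raw strings instead of stopping the loop.

```python
import xml.etree.ElementTree as ET


def parse_lines(lines):
    """Parse each non-empty line on its own; keep unparseable lines as text."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            # fromstring returns the root Element of the fragment
            results.append(ET.fromstring(line))
        except ET.ParseError:
            # Not a single well-formed element: keep the raw line
            results.append(line)
    return results
```

Any iterable of strings works as input, including an open file object. Entries that come back as `Element` objects expose `.tag`, `.attrib`, and `.text`.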


2017-04-14 12:48:12 Javier
