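I am using an XML parser in my project and have hit a problem I cannot solve: parsing XML in Python without dropping the text around nested tags.
This is the XML file I have. I am interested in a few elements: the sentence, the certainty of the sentence, and the ccue.
As output I want: the certainty, which is either certain or uncertain; the ccue, which is the text inside the tags; and the full sentence (with the ccues included or not).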
What I did:

    import xml.etree.ElementTree as ET

    with open('myfile.xml', 'rt') as f:
        tree = ET.parse(f)

    for sentence in tree.iter('sentence'):
        certainty = sentence.attrib.get('certainty')
        ccue = sentence.find('ccue')
        if certainty and (ccue is not None):
            print(' %s :: %s :: %s' % (certainty, sentence.text, ccue.text))
        else:
            print(' %s ::,:: %s' % (certainty, sentence.text))
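But in this case the ccues are removed from the sentence, and if the sentence is uncertain, it comes out incomplete. The find function seems to stop as soon as it finds a ccue. So if the sentence is: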
<sentence certainty="uncertain" id="S1867.3">However, the <ccue>majority of Israelis</ccue> find a comprehensive right of return for Palestinian refugees to be unacceptable.</sentence>
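it will show me "However, the" as the sentence.
Can anyone help me solve this? It would also be great if you could help me save the results in CSV format.
Edit: a sample of the XML: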
<sentence certainty="certain" id="S1867.2">Left-wing Israelis are open to compromise on the issue, by means such as the monetary reparations and family reunification initiatives offered by Ehud Barak at the Camp David 2000 summit.</sentence>
<sentence certainty="uncertain" id="S1867.3">However, the <ccue>majority of Israelis</ccue> find a comprehensive right of return for Palestinian refugees to be unacceptable.</sentence>
<sentence certainty="certain" id="S1867.4">The HonestReporting organization listed the following grounds for this opposition: Palestinian flight from Israel was not compelled, but voluntary.</sentence>
<sentence certainty="uncertain" id="S1867.5">After seven Arab nations declared war on Israel in 1948, <ccue>many Arab leaders</ccue> encouraged Palestinians to flee, in order to make it easier to rout the Jewish state.</sentence>
<sentence certainty="certain" id="S1867.6">This point, however, is a matter of some contention.</sentence>
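Please post a sample of the XML... not a picture of it. – tdelaney
@tdelaney Added to the main post. – ZverArt
You want the whole sentence, including what is inside the ccue? That is `''.join(sentence.itertext())`. By the way, posting shorter text (and less political text) would help. – tdelaney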
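A minimal sketch of how tdelaney's suggestion could be combined with the CSV output asked for in the question. It relies on the fact that `sentence.text` only holds the text before the first child element, while `itertext()` also walks the children's text and their tails. The `<doc>` root wrapper, the function names `extract_rows` and `write_csv`, the column names, and the `; ` separator for multiple ccues are assumptions made for illustration and are not part of the original post.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Two sentences from the question, wrapped in a hypothetical <doc> root
# so that the fragment parses as a single well-formed document.
SAMPLE = """<doc>
<sentence certainty="certain" id="S1867.6">This point, however, is a matter of some contention.</sentence>
<sentence certainty="uncertain" id="S1867.3">However, the <ccue>majority of Israelis</ccue> find a comprehensive right of return for Palestinian refugees to be unacceptable.</sentence>
</doc>"""


def extract_rows(root):
    """Yield (certainty, ccues, full_sentence) for every <sentence> element."""
    for sentence in root.iter('sentence'):
        certainty = sentence.attrib.get('certainty', '')
        # findall returns every direct <ccue> child, not only the first one.
        ccues = [ccue.text or '' for ccue in sentence.findall('ccue')]
        # itertext yields the element's own text, each child's text, and
        # each child's tail in document order, so nothing is dropped.
        full_text = ''.join(sentence.itertext())
        yield certainty, '; '.join(ccues), full_text


def write_csv(rows, out):
    """Write a header line and the extracted rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(['certainty', 'ccue', 'sentence'])
    writer.writerows(rows)


root = ET.fromstring(SAMPLE)
rows = list(extract_rows(root))

# Written to an in-memory buffer here; a real file works the same way.
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, something like `with open('sentences.csv', 'w', newline='', encoding='utf-8') as f: write_csv(rows, f)` should work (the file name is a placeholder). For the real file, `ET.parse('myfile.xml').getroot()` would replace the `ET.fromstring(SAMPLE)` call.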