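Extracting GATE annotations from XML
How can I retrieve annotated text from a document in a structured way, as shown below? I am using the sentence as my processing unit, which means I want to retrieve specific pieces of text from each sentence and put them together later. To that end, I have set up my annotations in GATE and saved the annotated result as inline XML.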
My XML input file looks like this:
<Document>
<Paragraph>
<text id="100">30.03. Zeraua joins the Otjimbingwe and Omaruru Ovaherero at Samuel’s station at Ongandjira in the upper Swakop valley.</text>
<text id="101">01.04. Von Glasenapp’s unit proceeds in the direction of Otjikuoko without meeting the Tjetjo community.</text>
<text id="102">09.04. The battle of Ongandjira is fought with heavy losses on both sides. The Ovaherero have to give way before a sustained German artillery bombardment commences, and they escape in a northerly direction.</text>
</Paragraph>
<Paragraph>
<text id="200">30.03. Zeraua joins the Otjimbingwe and Omaruru Ovaherero at Samuel’s station at Ongandjira in the upper Swakop valley.</text>
<text id="201">01.04. Von Glasenapp’s unit proceeds in the direction of Otjikuoko without meeting the Tjetjo community.</text>
<text id="202">09.04. The battle of Ongandjira is fought with heavy losses on both sides. The Ovaherero have to give way before a sustained German artillery bombardment commences, and they escape in a northerly direction.</text>
</Paragraph>
</Document>
This is the output structure I expect for each sentence:
<text id="100">
<Event>Battle of Ongandjira</Event>
<Location>Ongandjira</Location>
<NumberDate>30.03</NumberDate>
<Person>Zeraua</Person>
</text>
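To make the target concrete, here is a minimal Python sketch of the regrouping step. It assumes the inline export nests each annotation as a tag named after its type (`Person`, `Location`, and so on) inside the `<text>` sentence elements; that layout, the `SAMPLE` string, and the names `WANTED` and `restructure` are illustrative assumptions, not something taken from the actual GATE output.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline export: annotation tags are ASSUMED to be nested in <text>.
SAMPLE = """<Document><Paragraph>
<text id="100"><NumberDate>30.03</NumberDate>. <Person>Zeraua</Person> joins the Otjimbingwe at <Location>Ongandjira</Location>.</text>
<text id="101"><NumberDate>01.04</NumberDate>. Von Glasenapp's unit proceeds.</text>
</Paragraph></Document>"""

# Annotation types to keep (taken from the expected output in the question).
WANTED = {"Event", "Location", "NumberDate", "Person"}


def restructure(xml_string):
    """Build one <text id=...> element per sentence holding only its annotations."""
    root = ET.fromstring(xml_string)
    out = ET.Element("Document")
    for sentence in root.iter("text"):
        new_text = ET.SubElement(out, "text", id=sentence.get("id", ""))
        # Collect annotation elements at any depth inside this sentence.
        hits = [el for el in sentence.iter() if el.tag in WANTED]
        # Sort by tag name so every sentence lists its annotations in the same order.
        for el in sorted(hits, key=lambda e: e.tag):
            child = ET.SubElement(new_text, el.tag)
            child.text = "".join(el.itertext()).strip()
    return out


result = restructure(SAMPLE)
print(ET.tostring(result, encoding="unicode"))
```

Sentences without a given annotation type simply get no element for it. If the real export overlaps or interleaves tags in a way that is not well-formed XML, this approach would not apply as written.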
This is my annotation in GATE:
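My inline file just contains a lot of mixed annotations, and I can't figure out how to structure them in order. I have also tried the Format_Twitter JSON export, and it is a mess as well.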
Many thanks.
Oh yes, I figured that out. But the problem is that I can't even interpret the XML from GATE. It no longer contains sentences, it only has nodes.
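For the node-based file, a rough Python sketch of how it might be read follows. It assumes the usual standoff layout as I understand it: a `TextWithNodes` element whose `Node` ids are character offsets into the text, plus `Annotation` elements carrying `Type`, `StartNode` and `EndNode` attributes. The `SAMPLE` string is hand-written for illustration, the annotation type `Sentence` is a guess (in this document the sentences may instead be the original `text` markup), and `sentences_with_annotations` is a hypothetical helper name; all of this should be checked against the real file.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a GATE standoff file; node ids are ASSUMED to be offsets.
SAMPLE = """<GateDocument>
<TextWithNodes><Node id="0"/>30.03<Node id="5"/>. <Node id="7"/>Zeraua<Node id="13"/> joins.<Node id="20"/></TextWithNodes>
<AnnotationSet>
<Annotation Id="1" Type="Sentence" StartNode="0" EndNode="20"/>
<Annotation Id="2" Type="NumberDate" StartNode="0" EndNode="5"/>
<Annotation Id="3" Type="Person" StartNode="7" EndNode="13"/>
</AnnotationSet>
</GateDocument>"""


def sentences_with_annotations(xml_string, sentence_type="Sentence"):
    """Return [(sentence_text, [(type, covered_text), ...]), ...]."""
    root = ET.fromstring(xml_string)
    # Joining all text pieces around the empty <Node/> markers rebuilds the plain text.
    text = "".join(root.find("TextWithNodes").itertext())
    spans = [
        (a.get("Type"), int(a.get("StartNode")), int(a.get("EndNode")))
        for a in root.iter("Annotation")
    ]
    sentences = sorted((s for s in spans if s[0] == sentence_type), key=lambda s: s[1])
    result = []
    for _, s_start, s_end in sentences:
        # Keep every other annotation whose span lies fully inside this sentence.
        inside = [
            (t, text[b:e])
            for t, b, e in spans
            if t != sentence_type and s_start <= b and e <= s_end
        ]
        result.append((text[s_start:s_end], inside))
    return result


for sentence, annotations in sentences_with_annotations(SAMPLE):
    print(sentence, annotations)
```

The idea is that the sentences are not lost, they are just stored as offset ranges like every other annotation, so grouping becomes a containment check on start and end offsets. Passing a different `sentence_type` would let the same check run against whichever annotation type marks the sentences in the real document.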