Annotating text with a dictionary using XQuery and printing the whole result

I'm a beginner with XQuery, and I hope you can help me with a simple explanation. I'm using BaseX 7.0.1.

I have a dictionary.xml file that looks like this:

<doc> 
    <entry> 
     <vedette>je</vedette> 
     <variante>je</variante> 
     <variante>j'</variante> 
     <partiedudiscours>pronom</partiedudiscours> 
    </entry> 
</doc>

I also have another file, malone_fr.xml, containing the text I want to annotate. It looks like this:

<doc> 
    L’Opportunité 
    Par : Walter Malone (1866-1915) 
    Ils ont mal conclu ceux qui disent que je ne reviendrai plus 
    Quand une fois j’ai frappé à ta porte et ne t’ai pas rencontré, 
</doc>

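I want to compare the content of the <variante> elements in dictionary.xml with my text, and tag each matching word with the content of <partiedudiscours>. So far I have been able to do that with this code: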

let $comp := data(for $j in tokenize(for $i in db:open('malone_fr')/doc return $i,"\n") 
return tokenize($j," ")) 
for $aa in $comp 
return 
for $lemme in db:open('dictionnaire')/doc/entry 
return 
let $oldName :=$aa 
return 
if ($oldName= $lemme/variante) 
then 
let $newName := element {$lemme/partiedudiscours} {$aa} 
return 
for $bb in $comp 
return 
if ($bb=$oldName) 
then $newName 
else ($bb) 
else()

This gives me the following result:

[first iteration]

L’Opportunité Par : Walter Malone (1866-1915) Ils<verbe>ont</verbe> mal conclu ceux qui disent que je ne reviendrai plus

[second iteration]

L’Opportunité Par : Walter Malone (1866-1915) <pronom>Ils</pronom>ont mal conclu ceux qui disent que je ne reviendrai plus

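As you can see, it just prints the result for each word, one iteration at a time, whereas I need a single result with the whole text annotated, like this: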

L’Opportunité Par : Walter Malone (1866-1915) <pronom>Ils</pronom><verbe>ont</verbe> <adverbe>mal</adverbe> <verb>conclu</verb>

and so on. I don't know how to structure the for loop to do this.

Thanks in advance.


2012-09-26 Umi

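I think your solution is a bit more complicated than it needs to be. You should be able to do this in a single loop. Using an XPath predicate for the lookup, rather than explicitly looping over every value in the dictionary, also lets the database optimize the query so that the dictionary data is retrieved faster.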

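(: dictionary root; the database name is the one used in the question :)
let $dict := db:open('dictionnaire')/doc
let $toks := data(
    for $i in db:open('malone_fr')/doc
    return tokenize($i, "\s"))
for $t in $toks
return
    (: look up the entry that has a <variante> equal to the token :)
    let $e := $dict/entry[variante = $t]
    return
        if ($e)
        then (element { $e/partiedudiscours } { $t }, text { " " })
        else ($t, text { " " })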

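Also, the tokenize() step discards the whitespace, so there are no spaces in the output sequence. The output only looks spaced because that is usually the default way of rendering a sequence of atomic values; as you can see from your test output, however, no spaces are rendered around elements. In the solution above I added very basic whitespace handling so that elements are separated properly as well. If you don't need it, you can remove the text{" "} nodes.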
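To try the lookup and the spacing without any database, here is a minimal self-contained sketch. It is not part of the original answer: the inline `$dict` and `$text` stand in for the two databases, the `que` entry and the `<annotated>` wrapper are made up for illustration, and only the first matching entry is used so that a word listed in several entries cannot break the element constructor.

```xquery
(: Inline stand-ins for the two databases (assumption, for illustration only).
   The 'que' entry is hypothetical; it is not in the question's dictionary. :)
let $dict :=
  <doc>
    <entry>
      <vedette>je</vedette>
      <variante>je</variante>
      <variante>j'</variante>
      <partiedudiscours>pronom</partiedudiscours>
    </entry>
    <entry>
      <vedette>que</vedette>
      <variante>que</variante>
      <partiedudiscours>conjonction</partiedudiscours>
    </entry>
  </doc>
let $text :=
  <doc>
    ceux qui disent que je ne reviendrai plus
  </doc>
(: normalize-space() trims the text and collapses whitespace runs,
   so splitting on a single space yields no empty tokens :)
let $toks := tokenize(normalize-space($text), " ")
return
  <annotated>{
    for $t at $pos in $toks
    (: keep only the first matching entry :)
    let $e := ($dict/entry[variante = $t])[1]
    return (
      (: emit the separator before every token except the first :)
      if ($pos > 1) then text { " " } else (),
      if ($e)
      then element { string($e/partiedudiscours) } { $t }
      else text { $t }
    )
  }</annotated>
```

Every token is emitted as a text node or an element rather than as an atomic value, so the only spaces in the result are the explicit `text { " " }` separators, and there is no trailing space after the last word.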

Update: added @DennisKnochenwefel's suggestion.


2012-09-26 15:56:07 wst

Nice solution. Just one possible improvement: let $toks := data(tokenize(db:open('malone_fr')/doc, "\s"))

Or better (tokenize cannot be applied to a sequence): let $toks := data(for $i in db:open('malone_fr')/doc return tokenize($i, "\s"))
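The point about sequences can be seen in a small standalone sketch. The two inline `<doc>` elements are hypothetical and stand in for a database that returns more than one document node.

```xquery
(: Two hypothetical <doc> elements, standing in for db:open(...)/doc
   returning more than one node :)
let $docs := (<doc>je ne reviendrai plus</doc>, <doc>j'ai frappé</doc>)
return
  (: tokenize($docs, "\s") would raise a type error here, because the
     first argument of tokenize() may hold at most one item :)
  for $d in $docs
  return tokenize($d, "\s")
```

Looping over the nodes and tokenizing each one separately produces a single flat sequence of tokens, which is why the second form is the safer choice.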

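@DennisKnochenwefel Thanks, I've updated the solution to include your suggestion. I'm not very familiar with BaseX syntax, so out of caution I hadn't touched the tokenization code. – wst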
