Ignored XML elements show up near Lucene search results in eXist-db

I am building an application with eXist-db that works with TEI files and transforms them to HTML.

For the search function, I configured Lucene to ignore some tags.

<collection xmlns="http://exist-db.org/collection-config/1.0" xmlns:teins="http://www.tei-c.org/ns/1.0"> 
    <index xmlns:xs="http://www.w3.org/2001/XMLSchema"> 

     <fulltext default="none" attributes="false"/> 

     <lucene> 
     <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/> 
     <analyzer id="ws" class="org.apache.lucene.analysis.WhitespaceAnalyzer"/> 
      <text match="//teins:TEI"> 

       <inline qname="p"/> 
       <inline qname="text"/> 

       <ignore qname="teins:del"/> 
       <ignore qname="teins:sic"/> 
       <ignore qname="teins:index"/> 
       <ignore qname="teins:term"/> 
       <ignore qname="teins:note"/> 

      </text> 
     </lucene> 


    </index> 
</collection>

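Well, it mostly works: these elements no longer show up as direct hits in the search results, but they still appear in the snippets before and after the matched text that the KWIC module returns. Is there a way to remove them, or to apply an XSL transformation, before indexing?

Example TEI: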

...daß er sie zu entwerten sucht. Wie 
        <index> 
         <term>Liebe</term> 
         <index> 
          <term>und Hass</term> 
         </index> 
        </index> 
Liebe Ausströmung inneren Wertes ist,...

When I search for "Ausströmung", the query result is

....sucht. Wie Liebe und Hass Liebe Ausströmung  inneren Wertes ist,...

but it should be

....sucht. Wie Liebe Ausströmung  inneren Wertes ist,...

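When I search for "Hass", this text fragment is not shown in the results.

As for the search function: I strictly followed the Shakespeare example from the documentation.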

2014-01-18 romedius

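Let's take the Shakespeare app from eXist-db as the starting point. Suppose you have index entries. You don't want hits inside the index entries (the index configuration takes care of that), but you also don't want them output in the KWIC display; for that you have to write the code yourself.

If you look in app.xql, you will see a function named app:filter, which is called from app:show-hits. It can be used to remove parts of the output from the KWIC display, based on the name of the parent element of the text node being output.

This will give you what you want: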

declare %private function app:filter($node as node(), $mode as xs:string) as xs:string? { 
    (: read the qnames of all <ignore> elements from the index configuration :)
    let $ignored-elements := doc('/db/system/config/db/apps/shakespeare/collection.xconf')//*:ignore/@qname/string() 
    (: strip the namespace prefix, e.g. "teins:term" becomes "term" :)
    let $ignored-elements := 
        for $ignored-element in $ignored-elements 
        return substring-after($ignored-element, ':') 
    return 
        (: drop text nodes whose parent element is in the list :)
        if (local-name($node/parent::*) = ('speaker', 'stage', 'head', $ignored-elements)) 
        then () 
        else 
            if ($mode eq 'before') 
            then concat($node, ' ') 
            else concat(' ', $node) 
};

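You could of course hard-code the elements to ignore, as in ('speaker', 'stage', 'head', 'sic', 'term', 'note') ("index" is not needed here, because you always have to use "term" inside it), but I wanted to show that you don't have to. However, if you do not hard-code the elements to ignore, you should move the assignment of $ignored-elements out of the function, for example by assigning it to a variable declared in the query prolog, so that the database (collection.xconf) is not queried for every text node encountered. Querying it each time would be really silly, but for simplicity I have put everything into one function.

PS: The namespace prefix can be anything you choose, but the standard prefix for the http://www.tei-c.org/ns/1.0 namespace is "tei", and changing it to "teins" only causes confusion.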
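The prolog-variable suggestion above could be sketched roughly as follows. This is only an illustration, assuming the collection.xconf path from the answer; the names $ignored-elements and local:filter are hypothetical stand-ins (in the real app the function would stay in the app: module namespace), and the contains() check is an added guard for qnames written without a prefix.

```xquery
xquery version "3.0";

(: Assumption: path taken from the answer above; adjust it to your own app.
   The variable is bound once per query instead of once per text node. :)
declare variable $ignored-elements as xs:string* :=
    for $qname in doc('/db/system/config/db/apps/shakespeare/collection.xconf')//*:ignore/@qname/string()
    return
        (: strip an optional namespace prefix such as "tei:" :)
        if (contains($qname, ':'))
        then substring-after($qname, ':')
        else $qname;

(: Hypothetical stand-in for app:filter, reading the prolog variable. :)
declare function local:filter($node as node(), $mode as xs:string) as xs:string? {
    if (local-name($node/parent::*) = ('speaker', 'stage', 'head', $ignored-elements))
    then ()
    else if ($mode eq 'before')
    then concat($node, ' ')
    else concat(' ', $node)
};

(: Usage: a text node inside <term> is dropped when "term" is on the ignore list. :)
local:filter(<term>Liebe</term>/text(), 'before')
```

With this arrangement the configuration document is read a single time, and the filter itself only performs a name comparison for each text node.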

2014-01-18 10:47:15

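Thanks, this solved my problem. At the moment I am working with an installation of the version from May, so the filter function looks a bit different. One last thing: is it possible to retrieve '/db/system/config/db/apps/shakespeare/collection.xconf' dynamically? If I move the app to another folder, the path changes as well. I have changed it to doc(fn:concat('/db/system/config', $config:app-root, '/collection.xconf')), but this looks very messy and ugly. Is there a better solution for accessing the collection of the app root? – romedius

If you find this messy and ugly, you had better start getting used to it: this is how a good application is built. I for one find it beautiful. Could you please correct "Ignored XML attributes" to "Ignored XML elements" in the question title? And did you declare and bind $ignored-elements in the query prolog? –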
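If the concat call feels noisy at the call site, it could be tucked into a small helper. The sketch below is only an illustration: local:xconf-path is a hypothetical name, and the app root is passed in as a plain string so that the example runs on its own (in the app it would come from $config:app-root, as in the comment above).

```xquery
xquery version "3.0";

(: Hypothetical helper: builds the path of an app's collection.xconf,
   which is stored under /db/system/config followed by the app's own
   collection path. :)
declare function local:xconf-path($app-root as xs:string) as xs:string {
    concat('/db/system/config', $app-root, '/collection.xconf')
};

(: Usage with the Shakespeare app's location as an example value. :)
local:xconf-path('/db/apps/shakespeare')
```

For the example value this yields the same path that the answer hard-codes, so moving the app only changes the argument.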
