XQuery full-text search with word1 and NOT word2

Below is the XML structure:

<Docs> 
    <Doc> 
    <Name>Doc 1</Name> 
    <Notes> 
     <specialNote> 
      This is a special note section. 
      <B>This B Tag is used for highlighting any text and is optional</B>   
      <U>This U Tag will underline any text and is optional</U>   
      <I>This I Tag is used for highlighting any text and is optional</I>   
     </specialNote>  
     <generalNote> 
      <P> 
      This will store the general notes and might have number of paragraphs. This is para no 1. NO Child Tags here   
      </P> 
      <P> 
      This is para no 2    
      </P> 
     </generalNote>  
    </Notes> 
    <Desc> 
     <P> 
      This is used for Description and might have number of paragraphs. Here too, there will be B, U and I Tags for highlighting the description text and are optional 
      <B>Bold</B> 
      <I>Italic</I> 
      <U>Underline</U> 
     </P> 
     <P> 
      This is description para no 2 with I and U Tags 
      <I>Italic</I> 
      <U>Underline</U> 
     </P>  
    </Desc> 
    </Doc>
</Docs>

There will be thousands of Doc tags. I want to give the user a search criterion where he can search for WORD1 and NOT WORD2. Below is the query:

for $x in doc('Documents')/Docs/Doc[Notes/specialNote/text() contains text 'Tom' 
ftand ftnot 'jerry' or 
Notes/specialNote/text() contains text 'Tom' ftand ftnot 'jerry' or 
Notes/specialNote/B/text() contains text 'Tom' ftand ftnot 'jerry' or 
Notes/specialNote/I/text() contains text 'Tom' ftand ftnot 'jerry' or 
Notes/specialNote/U/text() contains text 'Tom' ftand ftnot 'jerry' or 
Notes/generalNote/P/text() contains text 'Tom' ftand ftnot 'jerry' or 
Desc/P/text() contains text 'Tom' ftand ftnot 'jerry' or 
Desc/P/B/text() contains text 'Tom' ftand ftnot 'jerry' or 
Desc/P/I/text() contains text 'Tom' ftand ftnot 'jerry' or 
Desc/P/U/text() contains text 'Tom' ftand ftnot 'jerry'] 
return $x/Name

The result of this query is wrong: it contains some documents that have both Tom and jerry. So I changed the query to:

for $x in doc('Documents')/Docs/Doc[. contains text 'Tom' ftand ftnot 'jerry'] 
return $x/Name

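This query gives me the exact result, i.e. only those documents that contain Tom and not jerry, but it takes a huge amount of time: approx. 45 seconds, whereas the earlier one took 10 seconds!

I am using the BaseX 7.5 XML database.

Need expert comments on this :)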

2013-02-19 John

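The first query tests each text node in the document individually, so a document in which Tom and Jerry sit in different text nodes (for example, in separate B and I elements) will still match, because the text node that contains Tom does not contain Jerry.

In the second query, the full-text search is performed on the entire text content of each Doc element, as if it were concatenated into one string. This cannot (currently) be answered by BaseX's full-text index, which indexes each text node separately.

One solution is to perform a separate full-text search for each term and merge the results at the end. Each search can then be evaluated per text node, so the index can be used: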
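The difference between the two behaviours can be reproduced on a tiny inline fragment, without any database. This is only a sketch: the `P` fragment below is made up for illustration and is not taken from the `Documents` database.

```xquery
(: hypothetical fragment: Tom and Jerry live in different text nodes :)
let $p := <P><B>Tom</B> and <I>Jerry</I></P>
return (
  (: every text node is tested on its own; the node "Tom" contains
     Tom and no Jerry, so the expression is true :)
  $p//text() contains text 'Tom' ftand ftnot 'Jerry',
  (: the string value "Tom and Jerry" is tested as a whole,
     so the expression is false :)
  $p contains text 'Tom' ftand ftnot 'Jerry'
)
```

The first expression corresponds to the per-path predicates of the first query, the second one to the `. contains text` predicate of the second query.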

for $x in (doc('Documents')/Docs/Doc[.//text() contains text 'Tom'] 
      except doc('Documents')/Docs/Doc[.//text() contains text 'Jerry']) 
return $x/Name

The query optimizer rewrites the `except` query on `doc('Documents')` into this equivalent one, which uses two index accesses:

for $x in (db:fulltext("Documents", "Tom")/ancestor::*:Doc 
      except db:fulltext("Documents", "Jerry")/ancestor::*:Doc) 
return $x/Name

You can even adjust the order in which the results are merged, to keep intermediate results small if needed.
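One possible reading of that last remark, as a sketch only: narrow the candidates down to the documents that mention the first term, and then test just those for the second term. The inline `$docs` sample is hypothetical, and the answer does not say whether BaseX answers this form from the full-text index when it runs against a database, so it is worth checking the query plan before relying on it.

```xquery
(: hypothetical sample data, standing in for doc('Documents')/Docs :)
let $docs :=
  <Docs>
    <Doc><Name>Doc 1</Name><Desc><P><B>Tom</B> and <I>Jerry</I></P></Desc></Doc>
    <Doc><Name>Doc 2</Name><Desc><P>Only <B>Tom</B> here</P></Desc></Doc>
  </Docs>
(: first keep the documents that mention Tom in any text node,
   then drop those that also mention Jerry in any text node :)
for $x in $docs/Doc[.//text() contains text 'Tom']
                   [not(.//text() contains text 'Jerry')]
return $x/Name
```

On this sample only the `Name` element of Doc 2 is returned, which is the same result the `except` formulation gives.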

2013-02-19 11:24:39
