Elasticsearch - Check whether a document is contained in a query, using synonyms

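I want to build an application in which a match requires every token of the document to occur at least once in the query.

Note that this is the other way round compared to the standard expectation: the documents are now rather small, while the queries can be very long. For example:

Document: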

"elastic super cool".

A valid matching query would be

"I like elastic things since elasticsearch is super cool"

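I managed to get the number of matched tokens from Elasticsearch (see https://groups.google.com/forum/?fromgroups=#!topic/elasticsearch/ttJTE52hXf8). So in the example above, 3 matches (= the length of the document) means that the query matches.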
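To make the matching rule concrete, here is a minimal sketch in plain Java, outside Elasticsearch. The class name `ContainmentCheck` and the whitespace `tokenize` helper are assumptions made for illustration; a real setup would rely on the index analyzer instead.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class ContainmentCheck {
    // Hypothetical stand-in for an analyzer: lowercase, then split on whitespace.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    // Number of document tokens that also occur somewhere in the query.
    static int matchedTokens(String document, String query) {
        Set<String> queryTokens = new HashSet<>(Arrays.asList(tokenize(query)));
        int matched = 0;
        for (String token : tokenize(document)) {
            if (queryTokens.contains(token)) {
                matched++;
            }
        }
        return matched;
    }

    // The query "contains" the document when every document token was matched.
    static boolean isContained(String document, String query) {
        return matchedTokens(document, query) == tokenize(document).length;
    }

    public static void main(String[] args) {
        String query = "I like elastic things since elasticsearch is super cool";
        System.out.println(matchedTokens("elastic super cool", query)); // 3
        System.out.println(isContained("elastic super cool", query));   // true
        System.out.println(isContained("elastic super fast", query));   // false
    }
}
```

The comparison against the document length is the part that matters: a document with a token missing from the query ends up with fewer matches than tokens.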

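But how can I combine this with synonyms?

Suppose the synonyms of "cool" are "nice", "good" and "great". By using the synonym token filter, I managed to add the synonyms to each position in the document.

Therefore, each of the following four documents has 3 token matches for the query above: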

"elastic super nice" 

"elastic nice cool" 

"nice good great" 

"good great cool"

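But only the first one is a valid match!

How can I avoid each synonym match being counted as a separate match, even though the synonyms represent the same token in the document?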
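The over-counting can be reproduced with a small model in plain Java. This is only a sketch under assumptions: the class `SynonymOvercount`, the `expand` helper and the single synonym group are made up for illustration, and the counting rule (a position counts as soon as any of its terms occurs in the query) is a simplification, not how Elasticsearch computes anything internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SynonymOvercount {
    // Assumed synonym group taken from the question: cool, nice, good, great.
    static final Set<String> GROUP =
            new HashSet<>(Arrays.asList("cool", "nice", "good", "great"));

    // One term set per document position, roughly what an expanding
    // synonym filter would put into the index.
    static List<Set<String>> expand(String document) {
        List<Set<String>> positions = new ArrayList<>();
        for (String token : document.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            Set<String> terms = new HashSet<>();
            terms.add(token);
            if (GROUP.contains(token)) {
                terms.addAll(GROUP);
            }
            positions.add(terms);
        }
        return positions;
    }

    // Naive count: a position is a match if any of its terms is in the query.
    static int naiveCount(String document, Set<String> queryTerms) {
        int count = 0;
        for (Set<String> terms : expand(document)) {
            if (!Collections.disjoint(terms, queryTerms)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(Arrays.asList(
                "i", "like", "elastic", "things", "since", "elasticsearch", "is", "super", "cool"));
        String[] documents = {
                "elastic super nice", "elastic nice cool", "nice good great", "good great cool"};
        for (String document : documents) {
            // Every document yields 3, although only the first one should match.
            System.out.println(document + " -> " + naiveCount(document, query));
        }
    }
}
```

In this model the single query term "cool" accounts for every synonym position, so the count alone cannot tell the four documents apart.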

Any ideas how to solve this?

I read that the percolator might solve this, but I am not yet sure whether percolators will work with synonyms the way I want...

Thoughts?

Source

2014-01-22 user1488793

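Were you able to solve this? Did you try using the percolator with a synonym filter? – vaidik

I assume you expand the synonyms. You can use a script to count the matched positions.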

Elasticsearch Google Group with a solution by Vineeth Mohan

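I adapted his script as a native script that returns a number between 0 and 1: the ratio of matched positions in the field. I tweaked it a bit so that each query term matches only one position.

The number of positions comes from a token_count field, which actually counts positions.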

@Override
public Object run()
{
    // Term positions for the analyzed field of the current document
    IndexField indexField = this.indexLookup().get(field);
    // Total number of positions in the field, read from the positions field
    Long numberOfPositions = ((ScriptDocValues.Longs) doc().get(positionsField)).getValue();

    // Positions already claimed by an earlier query term
    ArrayList<Integer> positions = new ArrayList<Integer>();
    for (String term : terms)
    {
        Iterator<TermPosition> termPos = indexField.get(term, IndexLookup.FLAG_POSITIONS | IndexLookup.FLAG_CACHE)
                .iterator();
        while (termPos.hasNext())
        {
            int position = termPos.next().position;
            if (positions.contains(position))
            {
                continue;
            }
            positions.add(position);
            // if the term matches multiple positions, only a new position should count
            break;
        }
    }

    // Ratio of matched positions to all positions, between 0 and 1
    return positions.size() * 1.0 / numberOfPositions;
}
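To see what the "one new position per query term" rule does to the four documents from the question, the same logic can be simulated in plain Java without Elasticsearch. This is a sketch under assumptions: the class `MatchedPositionsRatio`, the `index` helper and the hard-coded synonym group are hypothetical, and the in-memory postings map only imitates an index in which the synonyms were expanded onto each document position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class MatchedPositionsRatio {
    // Assumed synonym group from the question.
    static final Set<String> GROUP =
            new HashSet<>(Arrays.asList("cool", "nice", "good", "great"));

    // term -> ascending positions at which the term (or a synonym of it) is indexed
    static Map<String, List<Integer>> index(String[] tokens) {
        Map<String, List<Integer>> postings = new HashMap<>();
        for (int pos = 0; pos < tokens.length; pos++) {
            Set<String> terms = new HashSet<>();
            terms.add(tokens[pos]);
            if (GROUP.contains(tokens[pos])) {
                terms.addAll(GROUP);
            }
            for (String term : terms) {
                postings.computeIfAbsent(term, k -> new ArrayList<>()).add(pos);
            }
        }
        return postings;
    }

    // Each query term may claim at most one position that is not claimed yet.
    static double ratio(String document, List<String> queryTerms) {
        String[] tokens = document.toLowerCase(Locale.ROOT).trim().split("\\s+");
        Map<String, List<Integer>> postings = index(tokens);
        Set<Integer> claimed = new HashSet<>();
        for (String term : queryTerms) {
            for (int position : postings.getOrDefault(term, Collections.emptyList())) {
                if (claimed.add(position)) {
                    break; // only a new position counts, then move to the next term
                }
            }
        }
        return (double) claimed.size() / tokens.length;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList(
                "i", "like", "elastic", "things", "since", "elasticsearch", "is", "super", "cool");
        System.out.println(ratio("elastic super nice", query)); // 1.0
        System.out.println(ratio("elastic nice cool", query));  // about 0.67
        System.out.println(ratio("nice good great", query));    // about 0.33
        System.out.println(ratio("good great cool", query));    // about 0.33
    }
}
```

Only the first document reaches a ratio of 1, because "cool" can claim a single position no matter how many synonym positions it is indexed at.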

You can then use it in your query as a function_score script; what it needs is a field that contains the number of positions.

{
  "function_score": {
    "query": {
      "match": {
        "message": "I like elastic things since elasticsearch is super cool"
      }
    },
    "script_score": {
      "params": {
        "terms": [
          "I",
          "like",
          "elastic",
          "things",
          "since",
          "elasticsearch",
          "is",
          "super",
          "cool"
        ],
        "field": "message",
        "positions_field": "message.pos_count"
      },
      "lang": "native",
      "script": "matched_positions_ratio"
    },
    "boost_mode": "replace"
  }
}

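Then you can set "min_score" to 1 to get only the documents in which all positions of the given field are matched.

I hope this solution is what you need.

Source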

2014-07-03 07:16:25 DanyG

This seems like a common use case. Is there a better way to handle this now? Or do we just have to use the native script solution? –
