Elasticsearch ngram and PostgreSQL trigram search results do not match

I created an index on Elasticsearch as follows:

"settings" : { 
    "number_of_shards": 1, 
    "number_of_replicas": 0, 
    "analysis": { 
       "filter": { 
        "trigrams_filter": { 
         "type":  "ngram", 
         "min_gram": 3, 
         "max_gram": 3 
        } 
       }, 
       "analyzer": { 
        "trigrams": { 
         "type":  "custom", 
         "tokenizer": "standard", 
         "filter": [ 
          "lowercase", 
          "trigrams_filter" 
         ] 
        } 
       } 
    } 
}, 
"mappings": { 
    "issue": { 
     "properties": { 
      "description": { 
       "type":  "string", 
       "analyzer": "trigrams" 
      } 
     } 
    } 
} 

My test items are as follows:

"alici onay verdi basarili satisiniz gerceklesti diyor ama hesabima para transferi gerceklesmemis" 

"otomatik onay işlemi gecikmiş" 

"************* nolu iade islemi urun kargoya verilmedi zamaninda iade islemlerinde urun erorr hata veriyor" 

I tested the index with the following query:

GET issue/_search 
{ 
    "query": { 
     "match": { 
      "description":{ 
       "query": "otomatik onay istemi zamaninda gerceklesmemis" 
      } 
     } 
    } 
} 

And the result:

{ 
     .... 
     "hits": { 
      .... 
       "max_score": 2.3507352, 
       "hits": [ 
          { 
           ....         
           "_score": 2.3507352, 
           "_source": { 
            "issue_id": "*******", 
            "description": "alici onay verdi basarili satisiniz gerceklesti diyor ama hesabima para transferi gerceklesmemis" 
              } 
          } 
         ] 
       } 
} 

But the same data gives a different result with the following SQL query on PostgreSQL:

SELECT 
    public.tbl_issue_descriptions_big.description, 
    similarity(description, 'otomatik onay islemi zamaninda gerceklesmemis') AS sml 
FROM 
    public.tbl_issue_descriptions_big 
WHERE 
    description %'otomatik onay islemi zamaninda gerceklesmemis' 
ORDER BY 
    sml DESC 
LIMIT 10 

The result is:

description                   | sml
==============================|=========
otomatik onay islemi gecikmis | 0,351852
What causes this difference?

Answer

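I don't know enough about Postgres to give a qualified answer (it also depends on the documents being indexed and on whether the scoring formulas are exactly the same, which I doubt), but Elasticsearch has an explain API and an explain parameter in search, which can help you find out why a document was scored the way it was.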
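As a rough illustration of the explain parameter mentioned above, the sketch below builds the question's match query with `"explain": true` added at the top level of the search body. The helper name `build_explain_search` is hypothetical; the index name, field, and query text are taken from the question. The script only prints the request and does not contact a cluster.

```python
import json


def build_explain_search(field, text):
    """Build a match-query body with the search-level `explain` flag set."""
    return {
        "explain": True,  # ask Elasticsearch to attach a score breakdown to each hit
        "query": {"match": {field: {"query": text}}},
    }


body = build_explain_search(
    "description", "otomatik onay istemi zamaninda gerceklesmemis"
)

# Print the request in the same console form the question uses.
print("GET issue/_search")
print(json.dumps(body, indent=2, ensure_ascii=False))
```

With the flag set, each hit should come back with an `_explanation` section that breaks the `_score` down per matching term, which is the place to look for which trigrams contributed to the 2.3507352 score.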

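Thanks for your answer. But I think the PostgreSQL equivalent of explain is ts_vector, which is used for full-text search, whereas ngram and similarity are used for machine learning. I am looking for a similarity algorithm in Elasticsearch.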

See the Lucene documentation, for example https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html or https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/search/similarities/BM25Similarity.html (BM25 is the default since ES 5.0 if you create a new index) – alr
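For comparison with the Lucene similarities linked above, here is a minimal sketch of how the `similarity()` value on the PostgreSQL side can be approximated. It assumes the behaviour described in the pg_trgm documentation: text is lowercased, each word is padded with two leading blanks and one trailing blank, and the score is the number of shared trigrams divided by the number of distinct trigrams in both strings. The function names and the word-splitting regex are assumptions made for illustration; real pg_trgm behaviour for non-ASCII letters depends on the database locale.

```python
import re


def trigrams(text):
    """Return the set of trigrams, approximating pg_trgm's extraction."""
    grams = set()
    # Lowercase, then treat every run of letters/digits as one word.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two leading blanks, one trailing blank
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (Jaccard index)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


query = "otomatik onay islemi zamaninda gerceklesmemis"
item = "otomatik onay işlemi gecikmiş"  # second test item, as listed in the question

print(round(similarity(item, query), 6))  # 0.351852 (19 shared / 54 distinct)
```

Under these assumptions the second test item, written with ş as in the question's list, reproduces the reported 0,351852. The measure counts each trigram once and divides by the size of the union, so a long description with many unrelated trigrams is penalized. TF-IDF and BM25 instead sum per-term weights over the matching trigrams, which is one plausible reason the longer first item ranks highest in Elasticsearch.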
