elasticsearch phrase term frequency .tf() containing multiple words

I want to get the term frequency of a phrase made up of multiple words (for example, "green energy").

I can access the TF of "green" and "energy" individually, for example:

"function_score": 
{
  "filter": {
    "terms": { "content": ["energy", "green"] }
  },
  "script_score": {
    "script": "_index['content']['energy'].tf() + _index['content']['green'].tf()",
    "lang": "groovy"
  }
}

This works fine. But how can I find the frequency of the phrase "green energy"?

_index['content']['green energy'].tf()

does not work.

Source

2014-10-28 valerij vasilcenko

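I think it depends on how your data is indexed and what your requirements are at search time. For example, if you have the text "indirect green energy to spare" (meaning "green" and "energy" sit next to each other) and you want your script to "match" "green energy" and give you a tf() value for it, then you need to index your data accordingly. As you put it yourself, "the frequency of the term 'green energy'" comes down to producing some kind of term "green energy" in the index in your case.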
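As a rough illustration of why the lookup on the plain "content" field finds nothing, here is a small Python sketch. It is not Elasticsearch code: `analyze_unigrams` is a hypothetical stand-in that only lowercases and splits on whitespace, loosely imitating what a standard-style analyzer does to the sample text.

```python
from collections import Counter

def analyze_unigrams(text):
    # Hypothetical stand-in for a standard-style analyzer:
    # lowercase the text and split it into single-word tokens.
    return text.lower().split()

doc = "indirect green energy to spare"
term_freq = Counter(analyze_unigrams(doc))

print(term_freq["green"])         # single-word terms exist in the index
print(term_freq["energy"])
print(term_freq["green energy"])  # 0: no two-word term was ever produced
```

Since only single words are stored as terms, there is nothing for a two-word key to match, regardless of how close the two words are in the text.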

One idea is to add another sub-field to "content" that is analyzed with "shingles":

PUT /some_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        }
      },
      "analyzer": {
        "my_shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_shingle_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "some_type": {
      "properties": {
        "content": {
          "type": "string",
          "index": "analyzed",
          "fields": {
            "with_shingles": {
              "type": "string",
              "analyzer": "my_shingle_analyzer"
            }
          }
        }
      }
    }
  }
}

And in the function score, you would refer to the .with_shingles field:

{
  "query": {
    "function_score": {
      "filter": {
        "terms": {
          "content": [
            "energy",
            "green"
          ]
        }
      },
      "script_score": {
        "script": "_index['content.with_shingles']['green energy'].tf()",
        "lang": "groovy"
      }
    }
  }
}

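This is just an example to show that you need to index your data accordingly, so that you can get the .tf() you want. In my example I assumed that you want to search for the exact term "green energy", so I used "shingles". For the text above, this gives the following list of analyzed terms: "content.with_shingles": ["energy to","green energy","indirect green","to spare"].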
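To make the shingle idea concrete, here is a minimal Python sketch that imitates a shingle filter with a size of 2 and no unigrams. The `shingles` helper is hypothetical and uses simple whitespace tokenization, so it only approximates what the analyzer above would emit.

```python
from collections import Counter

def shingles(text, size=2):
    # Hypothetical helper: lowercase, split on whitespace, then join
    # every run of `size` adjacent tokens into one term.
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

doc = "indirect green energy to spare"
terms = shingles(doc)
print(sorted(terms))
# ['energy to', 'green energy', 'indirect green', 'to spare']

term_freq = Counter(terms)
print(term_freq["green energy"])  # 1: the two-word term now exists
```

Because "green energy" is now a term in its own right, a per-term frequency lookup on the shingled field has something to count.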

Source

2014-10-28 09:39:44

Perfect, thanks a lot. Just a note for others: don't forget to URL-encode your JSON request. 'green energy' should become 'green+energy' – 2014-10-29 13:51:55
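For reference, the encoding mentioned in the comment can be reproduced with Python's standard library. This is only a sketch of the encoding step itself and assumes the request is being passed through a URL; how the request was actually sent is not stated in the thread.

```python
from urllib.parse import quote_plus

phrase = "green energy"
encoded = quote_plus(phrase)  # spaces are encoded as '+'
print(encoded)  # green+energy
```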
