Extracting keywords (multi-word) from text using Elasticsearch

I have an index full of keywords, and based on those keywords I want to extract keywords from an input text.

Below is a sample of the keyword index. Note that a keyword can also consist of multiple words; essentially, the keywords are unique tags.

{ 
    "hits": { 
    "total": 2000, 
    "hits": [ 
     { 
     "id": 1, 
     "keyword": "thousand eyes" 
     }, 
     { 
     "id": 2, 
     "keyword": "facebook" 
     }, 
     { 
     "id": 3, 
     "keyword": "superdoc" 
     }, 
     { 
     "id": 4, 
     "keyword": "quora" 
     }, 
     { 
     "id": 5, 
     "keyword": "your story" 
     }, 
     { 
     "id": 6, 
     "keyword": "Surgery" 
     }, 
     { 
     "id": 7, 
     "keyword": "lending club" 
     }, 
     { 
     "id": 8, 
     "keyword": "ad roll" 
     }, 
     { 
     "id": 9, 
     "keyword": "the honest company" 
     }, 
     { 
     "id": 10, 
     "keyword": "Draft kings" 
     } 
    ] 
    } 
}

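Now, if I input the text "I saw the news of lending club on facebook, your story and quora", the output of the search should be ["lending club", "facebook", "your story", "quora"]. The search should also be case-insensitive.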

Source

2015-11-07 JDpawar

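There is only one real way to do this. You have to index your data as keywords and search it with a shingle analyzer.

See this reproduction:

First, we create two custom analyzers, one keyword-based and one shingle-based: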

PUT test 
{ 
    "settings": { 
    "analysis": { 
     "analyzer": { 
     "my_analyzer_keyword": { 
      "type": "custom", 
      "tokenizer": "keyword", 
      "filter": [ 
      "asciifolding", 
      "lowercase" 
      ] 
     }, 
     "my_analyzer_shingle": { 
      "type": "custom", 
      "tokenizer": "standard", 
      "filter": [ 
      "asciifolding", 
      "lowercase", 
      "shingle" 
      ] 
     } 
     } 
    } 
    }, 
    "mappings": { 
    "your_type": { 
     "properties": { 
     "keyword": { 
      "type": "string", 
      "index_analyzer": "my_analyzer_keyword", 
      "search_analyzer": "my_analyzer_shingle" 
     } 
     } 
    } 
    } 
}

Now, let's index some of the sample data you gave us:

POST /test/your_type/1 
{ 
    "id": 1, 
    "keyword": "thousand eyes" 
} 
POST /test/your_type/2 
{ 
    "id": 2, 
    "keyword": "facebook" 
} 
POST /test/your_type/3 
{ 
    "id": 3, 
    "keyword": "superdoc" 
} 
POST /test/your_type/4 
{ 
    "id": 4, 
    "keyword": "quora" 
} 
POST /test/your_type/5 
{ 
    "id": 5, 
    "keyword": "your story" 
} 
POST /test/your_type/6 
{ 
    "id": 6, 
    "keyword": "Surgery" 
} 
POST /test/your_type/7 
{ 
    "id": 7, 
    "keyword": "lending club" 
} 
POST /test/your_type/8 
{ 
    "id": 8, 
    "keyword": "ad roll" 
} 
POST /test/your_type/9 
{ 
    "id": 9, 
    "keyword": "the honest company" 
} 
POST /test/your_type/10 
{ 
    "id": 10, 
    "keyword": "Draft kings" 
}
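
As a side note, the ten requests above could probably be sent in a single round trip through the Bulk API (`POST /_bulk`). The following is a minimal Python sketch that only builds the newline-delimited request body; the helper name `build_bulk_body` is hypothetical, and `_type` is included only because the mapping above defines a type.

```python
import json

def build_bulk_body(keywords, index="test", doc_type="your_type"):
    """Build an NDJSON body for POST /_bulk (hypothetical helper, not part of the answer)."""
    lines = []
    for doc_id, keyword in enumerate(keywords, start=1):
        # Action line: tells Elasticsearch where to index the next source line.
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        # Source line: the document itself, same shape as in the requests above.
        source = {"id": doc_id, "keyword": keyword}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"

sample_keywords = [
    "thousand eyes", "facebook", "superdoc", "quora", "your story",
    "Surgery", "lending club", "ad roll", "the honest company", "Draft kings",
]
bulk_body = build_bulk_body(sample_keywords)
```

The resulting string would be sent as the request body with the content type `application/x-ndjson`.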

Finally, run the search query:

POST /test/your_type/_search 
{ 
    "query": { 
    "match": { 
     "keyword": "I saw the news of lending club on facebook, your story and quora" 
    } 
    } 
}

This is the result:

{ 
    "took": 6, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 4, 
    "max_score": 0.009332742, 
    "hits": [ 
     { 
     "_index": "test", 
     "_type": "your_type", 
     "_id": "2", 
     "_score": 0.009332742, 
     "_source": { 
      "id": 2, 
      "keyword": "facebook" 
     } 
     }, 
     { 
     "_index": "test", 
     "_type": "your_type", 
     "_id": "7", 
     "_score": 0.009332742, 
     "_source": { 
      "id": 7, 
      "keyword": "lending club" 
     } 
     }, 
     { 
     "_index": "test", 
     "_type": "your_type", 
     "_id": "4", 
     "_score": 0.009207102, 
     "_source": { 
      "id": 4, 
      "keyword": "quora" 
     } 
     }, 
     { 
     "_index": "test", 
     "_type": "your_type", 
     "_id": "5", 
     "_score": 0.0014755741, 
     "_source": { 
      "id": 5, 
      "keyword": "your story" 
     } 
     } 
    ] 
    } 
}
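
To get from a response like this to the flat list asked for in the question, the client only needs to read the `keyword` field of each hit. A minimal Python sketch, assuming the response has already been parsed into a dict (the function name `keywords_from_response` is made up for illustration, and the sample response is trimmed to two hits):

```python
def keywords_from_response(response):
    """Collect the stored keyword of every hit, in the order returned."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_source"]["keyword"] for hit in hits]

# Trimmed-down copy of the response shown above.
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "2", "_source": {"id": 2, "keyword": "facebook"}},
            {"_id": "7", "_source": {"id": 7, "keyword": "lending club"}},
        ],
    }
}

extracted = keywords_from_response(sample_response)
```

Note that the keywords come back as stored (for example `Draft kings` keeps its capital letter), because `_source` is not affected by the analyzers.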

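So what does it do behind the scenes?

At index time, each document is indexed as a whole keyword: the keyword tokenizer emits the entire string as a single token. I also added the asciifolding filter (it normalizes letters, so é becomes e) and the lowercase filter (for case-insensitive search). So, for example, Draft kings is indexed as draft kings.
The search-time analyzer uses the same logic, except that its tokenizer emits word tokens and shingles (combinations of adjacent tokens) are created on top of them. Those shingles match your keywords as they were indexed in the first step.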
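
The two analysis chains described above can be imitated in plain Python to see why the match works. This is only a rough sketch and not Elasticsearch's implementation: `fold` approximates asciifolding plus lowercase via Unicode decomposition, a `\w+` regex stands in for the standard tokenizer, and all function names are invented for illustration. The shingle sizes follow the filter's defaults (two-word shingles, with single words also emitted).

```python
import re
import unicodedata

def fold(text):
    # Approximation of asciifolding + lowercase: strip accents, then lowercase.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

def index_analyze(keyword):
    # Keyword tokenizer: the whole string becomes one token.
    return [fold(keyword)]

def search_analyze(text, min_size=2, max_size=2):
    # Word tokens first (unigrams), then shingles of adjacent words.
    words = re.findall(r"\w+", fold(text))
    tokens = list(words)
    for size in range(min_size, max_size + 1):
        for start in range(len(words) - size + 1):
            tokens.append(" ".join(words[start:start + size]))
    return tokens

def extract_keywords(text, keywords):
    # A keyword matches when its single index token equals any search token.
    indexed = {index_analyze(k)[0]: k for k in keywords}
    found = []
    for token in search_analyze(text):
        if token in indexed and indexed[token] not in found:
            found.append(indexed[token])
    return found

keywords = [
    "thousand eyes", "facebook", "superdoc", "quora", "your story",
    "Surgery", "lending club", "ad roll", "the honest company", "Draft kings",
]
text = "I saw the news of lending club on facebook, your story and quora"
matches = extract_keywords(text, keywords)
```

One consequence worth checking in a real setup: with two-word shingles only, a three-word keyword such as "the honest company" is never produced on the search side, so the shingle filter's `max_shingle_size` would likely need to be raised to cover the longest keyword.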

Source

2015-11-07 11:36:10

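Has anyone been able to run this on ElasticSearch 5.x? It seems the mapping type should be changed from string to text, and index_analyzer to just analyzer, but I get a too_many_clauses error when I try to execute a search – mac

@mac Did you manage to get it working? –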

@mac I was able to run the queries, but they did not bring back any data. I have logged the issue on GitHub: https://github.com/elastic/elasticsearch/issues/26989 –
