Tokenize a QueryStringQuery string using Elasticsearch?

I'm using Elastic 1.7 (with the Java API). Is there a good way to use Elastic's QueryStringQuery to tokenize a string like:

automobile or car and (telsa or "name is missing" or "aston martin")

into the tokens:

"automobile", "car", "tesla", "name is missing", "aston martin"

It seems I could use a pattern tokenizer, but the pattern gets tricky in a hurry. Is there a better way?
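For comparison, the hand-written route mentioned in the question can be sketched in a few lines. This is only an illustration in Python, not Elasticsearch's own query string parser: the name `extract_terms` and the regular expression are assumptions made for this example, and it only handles the `and`/`or` operators, parentheses, and double-quoted phrases that appear in the sample query. Field names, wildcards, escapes, and other query_string syntax are ignored.

```python
import re

# Either a double-quoted phrase (group 1) or a bare word (group 2).
# Parentheses and whitespace match neither branch, so they are skipped.
TERM = re.compile(r'"([^"]*)"|([^\s()"]+)')

# Assumption: only these two operators occur, as in the sample query.
OPERATORS = {"and", "or"}


def extract_terms(query):
    """Return the quoted phrases and bare words of a boolean query string."""
    terms = []
    for phrase, word in TERM.findall(query):
        if phrase:
            terms.append(phrase)
        elif word and word.lower() not in OPERATORS:
            terms.append(word)
    return terms


if __name__ == "__main__":
    sample = 'automobile or car and (telsa or "name is missing" or "aston martin")'
    print(extract_terms(sample))
```

Quoted phrases come back without their quotes, so each one stays a single term.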

asked 2015-12-16 by eze

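What is your use case? This looks hard, and it would be good to know exactly what you want to achieve so that we can offer some alternative solutions. – ChintanShah25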

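My use case is very close to what I described. I need the token parts of the boolean phrase (I pass them to a different web service), and since I'm already using Elastic, I assumed it had a tokenizer for query string queries, so I thought that might be easier than writing my own parser. – eze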

I used a bunch of pattern_replace character filters for this.

This is my setup:

"analysis": { 
    "char_filter": { 
     "space_pattern": { 
      "type": "pattern_replace", 
      "pattern": "\\s+", 
      "replacement": " " 
     }, 
     "replace_space_comma": { 
      "type": "pattern_replace", 
      "pattern": " ", 
      "replacement": "-" 
     }, 
     "replace_and_or_with_hyphen": { 
      "type": "pattern_replace", 
      "pattern": "(?i)-or-|-and-", 
      "replacement": " " 
     }, 
     "remove_brackets": { 
      "type": "pattern_replace", 
      "pattern": "[()]", 
      "replacement": "" 
     } 
    }, 
    "analyzer": { 
     "token_analyzer": { 
      "char_filter": ["html_strip", 
       "remove_brackets", 
       "space_pattern", 
       "replace_space_comma", 
       "replace_and_or_with_hyphen" 
      ], 
      "tokenizer": "whitespace", 
      "filter": ["lowercase"] 
     } 
    } 

}

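1) html_strip is optional (use it only if you want to).

2) Then I remove the brackets ( and ) with remove_brackets.

3) Then I reduce consecutive multiple spaces to a single space with space_pattern.

4) After that, I replace every space with a hyphen using replace_space_comma (despite its name, its replacement is "-"). This step is very important, because it lets me remove and and or only where they stand as separate words. You can use any other symbol if you like.

5) The last step removes and and or; (?i) is the case-insensitive flag.

I use the whitespace tokenizer to split the text into tokens, and I also use the lowercase filter (you can remove it if you want).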
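The steps above can be tried outside Elasticsearch to see what each filter contributes. The following is a rough Python simulation, not the analyzer itself: the function name `tokenize` is made up for this sketch, html_strip is left out, and it assumes Python's `re` module treats these simple patterns the same way the Java regex engine behind pattern_replace does.

```python
import re


def tokenize(text):
    """Approximate the token_analyzer chain (html_strip omitted)."""
    text = re.sub(r"[()]", "", text)              # remove_brackets
    text = re.sub(r"\s+", " ", text)              # space_pattern
    text = text.replace(" ", "-")                 # replace_space_comma
    text = re.sub(r"(?i)-or-|-and-", " ", text)   # replace_and_or_with_hyphen
    tokens = text.split()                         # whitespace tokenizer
    return [token.lower() for token in tokens]    # lowercase filter


if __name__ == "__main__":
    query = "automobile or car and (telsa or name is missing or aston martin)"
    print(tokenize(query))
```

The order matters: the spaces have to become hyphens before the operators are removed, because the last pattern only matches `or` and `and` when they are surrounded by hyphens.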

So the string automobile or car and (telsa or name is missing or aston martin) gets tokenized into

{ 
    "tokens": [ 
     { 
     "token": "automobile", 
     "start_offset": 0, 
     "end_offset": 10, 
     "type": "word", 
     "position": 1 
     }, 
     { 
     "token": "car", 
     "start_offset": 14, 
     "end_offset": 17, 
     "type": "word", 
     "position": 2 
     }, 
     { 
     "token": "telsa", 
     "start_offset": 23, 
     "end_offset": 28, 
     "type": "word", 
     "position": 3 
     }, 
     { 
     "token": "name-is-missing", 
     "start_offset": 32, 
     "end_offset": 47, 
     "type": "word", 
     "position": 4 
     }, 
     { 
     "token": "aston-martin", 
     "start_offset": 51, 
     "end_offset": 64, 
     "type": "word", 
     "position": 5 
     } 
    ] 
}

This is not perfect: after getting the tokens, you have to replace the hyphens with spaces to get the desired output.

I hope this helps!
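The clean-up step mentioned above could look like the following on the client side. This is a minimal Python sketch; `restore_phrases` is a hypothetical helper name, and it assumes the original terms contained no hyphens of their own, since those would be turned into spaces as well.

```python
def restore_phrases(tokens):
    """Turn the hyphens inserted by the analyzer back into spaces."""
    # Assumption: every hyphen in a token came from the analyzer.
    return [token.replace("-", " ") for token in tokens]


if __name__ == "__main__":
    analyzed = ["automobile", "car", "telsa", "name-is-missing", "aston-martin"]
    print(restore_phrases(analyzed))
```

If the terms may legitimately contain hyphens, a symbol that cannot occur in the input is a safer choice for the replacement character in the filter.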

answered 2015-12-17 22:49:06 by ChintanShah25
