2014-09-28 27 views
1

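I have installed the Smart Chinese Analysis plugin for Elasticsearch on our ES cluster, but I can't find any documentation on how to specify the correct analyzer. I assume I need to set up a tokenizer and filters that specify stop words and a stemmer... How do I use Smart Chinese Analysis with Elasticsearch?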

For example, for Dutch:

"dutch": {
  "type": "custom",
  "tokenizer": "uax_url_email",
  "filter": ["lowercase", "asciifolding", "dutch_stemmer_filter", "dutch_stop_filter"]
}

with: 

"dutch_stemmer_filter": {
  "type": "stemmer",
  "name": "dutch"
},
"dutch_stop_filter": {
  "type": "stop",
  "stopwords": ["_dutch_"]
}

How do I configure my analyzer for Chinese?

Answer

5

Try this on an index of your choice (the analyzer is called 'smartcn' and the tokenizer is 'smartcn_tokenizer'):

PUT /test_chinese
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "default": {
            "type": "smartcn"
          }
        }
      }
    }
  }
}

GET /test_chinese/_analyze?text='叻出色' 

It should output two tokens (the test text is taken from the plugin test classes):

{
  "tokens": [
    {
      "token": "叻",
      "start_offset": 1,
      "end_offset": 2,
      "type": "word",
      "position": 2
    },
    {
      "token": "出色",
      "start_offset": 2,
      "end_offset": 4,
      "type": "word",
      "position": 3
    }
  ]
}
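
If you want something closer to the custom layout in the question rather than the prebuilt 'smartcn' analyzer, a rough sketch of the settings body could be generated like this. It assumes your plugin version exposes 'smartcn_tokenizer' as named in this answer. The analyzer name "chinese_custom", the filter name "chinese_stop_filter", and the two stop words are placeholders made up for illustration, and no stemmer is included.

```python
import json


def build_chinese_analysis(stopwords):
    """Build an index-settings body with a custom analyzer on top of smartcn_tokenizer."""
    return {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        # "chinese_custom" is a hypothetical analyzer name.
                        "chinese_custom": {
                            "type": "custom",
                            "tokenizer": "smartcn_tokenizer",
                            "filter": ["lowercase", "chinese_stop_filter"],
                        }
                    },
                    "filter": {
                        # Same "stop" filter type as the Dutch example,
                        # but with an explicit (placeholder) word list.
                        "chinese_stop_filter": {
                            "type": "stop",
                            "stopwords": list(stopwords),
                        }
                    },
                }
            }
        }
    }


# Placeholder stop words; replace them with a list that suits your data.
body = build_chinese_analysis(["的", "了"])
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be used as the request body of the PUT call shown earlier in place of the "default" analyzer definition.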
+0

I tried this, but GET /test_chinese/_analyze?text='叻出色' gives me three tokens: 21499, 20986 and 33394. What am I doing wrong? – 2015-12-07 11:09:52

+0

You need to install the plugin on all nodes and restart them. – 2016-02-19 09:58:15
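
A quick check of the three numbers from the comment above: they appear to be the decimal Unicode code points of the three characters in the test text. That would suggest the text was split one character at a time instead of being segmented into 叻 and 出色, though it does not show by itself which analyzer produced them. A minimal sketch in plain Python, with no Elasticsearch involved:

```python
# The three tokens reported in the comment.
code_points = [21499, 20986, 33394]

# Convert each decimal code point back to its character.
chars = [chr(cp) for cp in code_points]
print(chars)

# Compare against the test text from the answer.
text = "叻出色"
print([ord(c) for c in text] == code_points)
```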