Elasticsearch Smart Chinese Analysis returns Unicode code points

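I'm trying to analyze documents in Elasticsearch with the Smart Chinese Analyzer, but instead of returning the analyzed Chinese characters, Elasticsearch returns the Unicode code points of those characters. For example: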

PUT /test_chinese 
{ 
    "settings": { 
     "index": { 
      "analysis": { 
       "analyzer": { 
        "default": { 
         "type": "smartcn" 
        } 
       } 
      } 
     } 
    } 
} 

GET /test_chinese/_analyze?text='我说世界好!'

I expected to get each Chinese character as a token, but I get:

{ 
    "tokens": [ 
     { 
      "token": "25105", 
      "start_offset": 3, 
      "end_offset": 8, 
      "type": "word", 
      "position": 4 
     }, 
     { 
      "token": "35828", 
      "start_offset": 11, 
      "end_offset": 16, 
      "type": "word", 
      "position": 8 
     }, 
     { 
      "token": "19990", 
      "start_offset": 19, 
      "end_offset": 24, 
      "type": "word", 
      "position": 12 
     }, 
     { 
      "token": "30028", 
      "start_offset": 27, 
      "end_offset": 32, 
      "type": "word", 
      "position": 16 
     }, 
     { 
      "token": "22909", 
      "start_offset": 35, 
      "end_offset": 40, 
      "type": "word", 
      "position": 20 
     } 
    ] 
}

Do you have any idea what is going on here?

Thanks!


2015-12-14 Frody

I found the cause of my problem: Sense seems to have a bug. Here is the conversation with Zachary Tong, an Elasticsearch developer: https://discuss.elastic.co/t/smart-chinese-analysis-returns-unicodes-instead-of-chinese-tokens/37133 and here is the bug ticket: https://github.com/elastic/sense/issues/88
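The numeric tokens can be checked against the question's own output. Below is a minimal Python sketch: it decodes each token as a decimal Unicode code point, then tests one hypothesis, which is an assumption and not something stated in the linked thread, that the text reached Elasticsearch as HTML numeric character references (`&#25105;` and so on) still wrapped in the query-string quotes. The token list is copied from the response above; `received` is a hypothetical reconstruction of what Elasticsearch saw.

```python
import re

# Tokens (and offsets) copied from the smartcn response in the question.
tokens = [
    {"token": "25105", "start_offset": 3, "end_offset": 8},
    {"token": "35828", "start_offset": 11, "end_offset": 16},
    {"token": "19990", "start_offset": 19, "end_offset": 24},
    {"token": "30028", "start_offset": 27, "end_offset": 32},
    {"token": "22909", "start_offset": 35, "end_offset": 40},
]

# Each token is a decimal Unicode code point; chr() maps it back to a character.
decoded = "".join(chr(int(t["token"])) for t in tokens)

# Hypothetical reconstruction (assumption): the client sent each character as
# an HTML numeric character reference, keeping the quotes from the URL.
original = "我说世界好"
received = "'" + "".join(f"&#{ord(c)};" for c in original) + "!'"

# If the hypothesis holds, the digit runs in `received` sit exactly at the
# offsets Elasticsearch reported for each token.
digit_spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", received)]
reported = [(t["token"], t["start_offset"], t["end_offset"]) for t in tokens]

print(decoded)
print(digit_spans == reported)
```

Running it prints `我说世界好` followed by `True`: the tokens decode to the intended sentence, and the reported offsets line up with the entity-encoded string, which fits the explanation that the text was mangled on the client side before the analyzer ever saw it.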


2015-12-15 14:12:49 Frody
