Elasticsearch custom analyzer not working

I am using Elasticsearch as my search engine, and I am trying to create a custom analyzer that only lowercases a field's value. Here is my code:

Create the index and mapping

Create an index with a custom analyzer named test_lowercase:

curl -XPUT 'localhost:9200/test/' -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "test_lowercase": {
                    "type": "pattern",
                    "pattern": "^.*$"
                }
            }
        }
    }
}'

Create a mapping that uses the test_lowercase analyzer for the address field:

curl -XPUT 'localhost:9200/test/_mapping/Users' -d '{
    "Users": {
        "properties": {
            "name": {
                "type": "string"
            },
            "address": {
                "type": "string",
                "analyzer": "test_lowercase"
            }
        }
    }
}'

To verify that the test_lowercase analyzer works:

curl -XGET 'localhost:9200/test/_analyze?analyzer=test_lowercase&pretty' -d ' 
Beijing China 
' 
{ 
    "tokens" : [ { 
    "token" : "\nbeijing china\n", 
    "start_offset" : 0, 
    "end_offset" : 15, 
    "type" : "word", 
    "position" : 0 
    } ] 
}

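As we can see, the string 'Beijing China' is indexed as a single lowercase term 'beijing china', so the test_lowercase analyzer works as expected.

To verify that the 'address' field uses the lowercase analyzer: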

curl -XGET 'http://localhost:9200/test/_analyze?field=address&pretty' -d ' 
Beijing China 
' 
{ 
    "tokens" : [ { 
    "token" : "\nbeijing china\n", 
    "start_offset" : 0, 
    "end_offset" : 15, 
    "type" : "word", 
    "position" : 0 
    } ] 
} 
curl -XGET 'http://localhost:9200/test/_analyze?field=name&pretty' -d ' 
Beijing China 
' 
{ 
    "tokens" : [ { 
    "token" : "beijing", 
    "start_offset" : 1, 
    "end_offset" : 8, 
    "type" : "<ALPHANUM>", 
    "position" : 0 
    }, { 
    "token" : "china", 
    "start_offset" : 9, 
    "end_offset" : 14, 
    "type" : "<ALPHANUM>", 
    "position" : 1 
    } ] 
}

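As we can see, for the same string 'Beijing China', analyzing with field=address produces a single term 'beijing china', whereas field=name produces two terms, 'beijing' and 'china'. So the address field appears to be using my custom analyzer 'test_lowercase'.

Insert a document into the test index to see whether the analyzer is applied to documents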

curl -XPUT 'localhost:9200/test/Users/12345?pretty' -d '{"name": "Jinshui Tang", "address": "Beijing China"}'

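Unfortunately, the document was inserted successfully, but the address field does not appear to have been analyzed correctly. A wildcard query on it returns no hits: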

curl -XGET 'http://localhost:9200/test/Users/_search?pretty' -d '
{
    "query": {
        "wildcard": {
            "address": "*beijing ch*"
        }
    }
}'
{ 
    "took" : 8, 
    "timed_out" : false, 
    "_shards" : { 
    "total" : 5, 
    "successful" : 5, 
    "failed" : 0 
    }, 
    "hits" : { 
    "total" : 0, 
    "max_score" : null, 
    "hits" : [ ] 
    } 
}
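
A wildcard query is a term-level query: the pattern is compared against each indexed term as a whole, and the pattern itself is not analyzed. The sketch below is not from the thread. It uses Python's `fnmatch` as a rough stand-in for that matching (`fnmatch` also accepts `[...]` classes, which Elasticsearch wildcards do not), and `wildcard_hits` is a hypothetical helper. It shows which sets of indexed terms the pattern `*beijing ch*` could match.

```python
from fnmatch import fnmatchcase


def wildcard_hits(pattern, indexed_terms):
    """Return the indexed terms that the whole wildcard pattern matches."""
    return [term for term in indexed_terms if fnmatchcase(term, pattern)]


pattern = "*beijing ch*"

# The field was indexed as one lowercase term.
single_term = wildcard_hits(pattern, ["beijing china"])
# The field was split into two terms; neither contains "beijing ch".
split_terms = wildcard_hits(pattern, ["beijing", "china"])
# The field produced no terms at all.
no_terms = wildcard_hits(pattern, [])

print(single_term, split_terms, no_terms)
```

Only the first case yields a hit, so zero hits here suggests that the address field was not indexed as the single term 'beijing china'.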

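List all analyzed terms for the document

The wildcard query above could not find the document, so I ran the following command to see all of its terms, and found that 'beijing china' is not in the term vector at all.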

curl -XGET 'http://localhost:9200/test/Users/12345/_termvector?fields=*&pretty' 
{ 
    "_index" : "test", 
    "_type" : "Users", 
    "_id" : "12345", 
    "_version" : 3, 
    "found" : true, 
    "took" : 2, 
    "term_vectors" : { 
    "name" : { 
     "field_statistics" : { 
     "sum_doc_freq" : 2, 
     "doc_count" : 1, 
     "sum_ttf" : 2 
     }, 
     "terms" : { 
     "jinshui" : { 
      "term_freq" : 1, 
      "tokens" : [ { 
      "position" : 0, 
      "start_offset" : 0, 
      "end_offset" : 7 
      } ] 
     }, 
     "tang" : { 
      "term_freq" : 1, 
      "tokens" : [ { 
      "position" : 1, 
      "start_offset" : 8, 
      "end_offset" : 12 
      } ] 
     } 
     } 
    } 
    } 
}

We can see that name is analyzed correctly and becomes two terms, 'jinshui' and 'tang', but address is missing.

Can anyone help? Is there something I am missing?

Thanks a lot!

Source

2015-10-14 jinshui

To lowercase text, you don't need the pattern analyzer. Use something like this:

PUT /test
{
    "settings": {
        "analysis": {
            "analyzer": {
                "test_lowercase": {
                    "type": "custom",
                    "filter": [
                        "lowercase"
                    ],
                    "tokenizer": "keyword"
                }
            }
        }
    }
}

PUT /test/_mapping/Users
{
    "Users": {
        "properties": {
            "name": {
                "type": "string"
            },
            "address": {
                "type": "string",
                "analyzer": "test_lowercase"
            }
        }
    }
}

PUT /test/Users/12345 
{"name": "Jinshui Tang", "address": "Beijing China"}

To verify that everything is set up correctly, run this:

GET /test/Users/_search 
{ 
    "fielddata_fields": ["name", "address"] 
}

You will see exactly how Elasticsearch indexed your data:

 "fields": { 
      "name": [ 
       "jinshui", 
       "tang" 
      ], 
      "address": [ 
       "beijing china" 
      ] 
     }
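
As a rough illustration of why the keyword tokenizer matters here, the analyzer from this answer can be mimicked outside Elasticsearch. This is only a sketch under stated assumptions: `keyword_lowercase` and `word_split_lowercase` are hypothetical helpers, and the regex split is a crude approximation of word-based tokenization, not the real standard tokenizer.

```python
import re


def keyword_lowercase(text):
    """Mimic a keyword tokenizer plus lowercase filter: one token, lowercased."""
    return [text.lower()]


def word_split_lowercase(text):
    """Crude stand-in for word-based tokenization followed by lowercasing."""
    return [token.lower() for token in re.findall(r"\w+", text)]


address_terms = keyword_lowercase("Beijing China")
name_terms = word_split_lowercase("Jinshui Tang")

print(address_terms, name_terms)
```

With the keyword tokenizer the whole address stays one term, which is what a wildcard such as `*beijing ch*` needs in order to match.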

Source

2015-10-14 09:33:02

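Thanks, Andrei Stefan. Besides lowercasing, I need the whole value of address kept as a single term rather than split into words. For example, for 'Beijing China' I don't want 'beijing' and 'china'; I want a single term 'beijing china' so that users can search it with a wildcard such as '*beijing ch*'. – jinshui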

I updated my answer and changed the 'tokenizer' from 'standard' to 'keyword'. Give it a try. –

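Thanks Andrei, yes. Your code works for me and is exactly what I wanted. I will accept your answer. But do you know what is wrong with my pattern analyzer? – jinshui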
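
The thread does not answer this last question. One likely explanation, offered here as an assumption rather than a confirmed diagnosis, is that the pattern analyzer treats `pattern` as a separator regex: whatever matches is discarded, and the text between matches becomes the tokens. The sketch below uses Python's `re` module as a stand-in for the Java regex engine (for this pattern, `^`, `.` and `$` behave the same way in both), and `pattern_analyzer` is a hypothetical helper.

```python
import re


def pattern_analyzer(text, separator=r"\W+"):
    """Split text on a separator regex, drop empty pieces, lowercase the rest."""
    return [piece.lower() for piece in re.split(separator, text) if piece]


# The _analyze test body was wrapped in newlines. "." does not match a newline,
# so "^.*$" matches nothing and the whole input survives as one token.
with_newlines = pattern_analyzer("\nBeijing China\n", r"^.*$")

# The indexed document value has no newlines, so "^.*$" matches the whole
# string, everything counts as separator, and no tokens are left.
without_newlines = pattern_analyzer("Beijing China", r"^.*$")

print(with_newlines, without_newlines)
```

Under this reading, the _analyze calls looked correct only because of the surrounding newlines, while the real document value yielded no terms, which would be consistent with address missing from the term vector.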
