2014-11-15

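Trouble creating an index with a custom analyzer using Jest

Jest provides an excellent asynchronous API for Elasticsearch, and we find it very useful. However, sometimes the resulting requests turn out to be slightly different from what we would expect.

Usually we don't care, since everything works fine, but in this case it does not.

I want to create an index with a custom ngram analyzer. When I do this following the Elasticsearch REST API docs, I call the following: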

curl -XPUT 'localhost:9200/test' --data '
{
    "settings": {
        "number_of_shards": 3,
        "analysis": {
            "filter": {
                "keyword_search": {
                    "type": "edge_ngram",
                    "min_gram": 3,
                    "max_gram": 15
                }
            },
            "analyzer": {
                "keyword": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": [
                        "lowercase",
                        "keyword_search"
                    ]
                }
            }
        }
    }
}'

and then I confirm that the analyzer is configured correctly using:

curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting many tokens'

In response I receive multiple tokens such as exp, expe, expec and so on.
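
To make it clearer why several tokens are expected here, the following is a minimal sketch of what the analyzer chain above does conceptually: split on whitespace, lowercase, then emit leading substrings between min_gram and max_gram characters long. It is a simplified stand-in written in plain Java, not the Lucene/Elasticsearch implementation; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class EdgeNgramSketch {

    // Emulates: whitespace tokenizer -> lowercase filter -> edge n-gram filter.
    // Assumption: words shorter than minGram simply produce no tokens.
    static List<String> analyze(String text, int minGram, int maxGram) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for blank input
            }
            String lower = word.toLowerCase(Locale.ROOT);
            int longest = Math.min(maxGram, lower.length());
            // Emit every prefix whose length lies in [minGram, longest].
            for (int len = minGram; len <= longest; len++) {
                tokens.add(lower.substring(0, len));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same text and gram sizes as in the question.
        System.out.println(analyze("Expecting many tokens", 3, 15));
    }
}
```

With the question's settings this yields prefixes such as exp, expe and expec for the first word, which is why a single token in the response signals that the custom analyzer was not applied.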

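Now, using the Jest client, I put the config JSON into a file on my classpath, with content exactly the same as the body of the PUT request above. I execute the Jest action constructed like this: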

new CreateIndex.Builder(name) 
      .settings(
        ImmutableSettings.builder() 
          .loadFromClasspath(
            "settings.json" 
          ).build().getAsMap() 
      ).build(); 

As a result:

  • Primo - checking with tcpdump, what is actually posted to Elasticsearch is (pretty-printed):

    { 
        "settings.analysis.filter.keyword_search.max_gram": "15", 
        "settings.analysis.filter.keyword_search.min_gram": "3", 
        "settings.analysis.analyzer.keyword.tokenizer": "whitespace", 
        "settings.analysis.filter.keyword_search.type": "edge_ngram", 
        "settings.number_of_shards": "3", 
        "settings.analysis.analyzer.keyword.filter.0": "lowercase", 
        "settings.analysis.analyzer.keyword.filter.1": "keyword_search", 
        "settings.analysis.analyzer.keyword.type": "custom" 
    } 
    
  • Secundo - the resulting index settings are:

    {
        "test": {
            "settings": {
                "index": {
                    "settings": {
                        "analysis": {
                            "filter": {
                                "keyword_search": {
                                    "type": "edge_ngram",
                                    "min_gram": "3",
                                    "max_gram": "15"
                                }
                            },
                            "analyzer": {
                                "keyword": {
                                    "filter": [
                                        "lowercase",
                                        "keyword_search"
                                    ],
                                    "type": "custom",
                                    "tokenizer": "whitespace"
                                }
                            }
                        },
                        "number_of_shards": "3" <-- the only difference from the one created with rest call
                    },
                    "number_of_shards": "3",
                    "number_of_replicas": "0",
                    "version": {"created": "1030499"},
                    "uuid": "Glqf6FMuTWG5EH2jarVRWA"
                }
            }
        }
    }
    
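  • Tertio - checking with curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting many tokens' I get just a single token!

Question 1. What is the reason that Jest does not post my original settings JSON, but a processed one instead?

Question 2. Why do the settings generated by Jest not work?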

Answer

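Glad you found Jest useful; please see my answers below.

Question 1. What is the reason that Jest does not post my original settings JSON, but a processed one instead?

It is not Jest but Elasticsearch's ImmutableSettings that does this, see: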

Map<String, String> test = ImmutableSettings.builder() 
      .loadFromSource("{\n" + 
        " \"settings\": {\n" + 
        " \"number_of_shards\": 3,\n" + 
        " \"analysis\": {\n" + 
        "  \"filter\": {\n" + 
        "  \"keyword_search\": {\n" + 
        "   \"type\":  \"edge_ngram\",\n" + 
        "   \"min_gram\": 3,\n" + 
        "   \"max_gram\": 15\n" + 
        "  }\n" + 
        "  },\n" + 
        "  \"analyzer\": {\n" + 
        "  \"keyword\": {\n" + 
        "   \"type\":  \"custom\",\n" + 
        "   \"tokenizer\": \"whitespace\",\n" + 
        "   \"filter\": [\n" + 
        "   \"lowercase\",\n" + 
        "   \"keyword_search\"\n" + 
        "   ]\n" + 
        "  }\n" + 
        "  }\n" + 
        " }\n" + 
        " }\n" + 
        "}").build().getAsMap(); 
System.out.println("test = " + test); 

Output:

test = { 
    settings.analysis.filter.keyword_search.type=edge_ngram, 
    settings.number_of_shards=3, 
    settings.analysis.analyzer.keyword.filter.0=lowercase, 
    settings.analysis.analyzer.keyword.filter.1=keyword_search, 
    settings.analysis.analyzer.keyword.type=custom, 
    settings.analysis.analyzer.keyword.tokenizer=whitespace, 
    settings.analysis.filter.keyword_search.max_gram=15, 
    settings.analysis.filter.keyword_search.min_gram=3 
} 
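
To illustrate the kind of flattening visible in the output above, here is a minimal sketch in plain Java that turns a nested map into dot-separated keys, with list elements indexed as .0, .1 and so on. It only mimics the observed output; it is not the ImmutableSettings source code, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FlattenSketch {

    // Flattens nested maps and lists into "a.b.c" style keys with string values.
    static Map<String, String> flatten(Map<String, ?> nested) {
        Map<String, String> flat = new LinkedHashMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Object value, Map<String, String> out) {
        if (value instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                flattenInto(join(prefix, String.valueOf(e.getKey())), e.getValue(), out);
            }
        } else if (value instanceof List) {
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                // List positions become numeric path segments, e.g. "filter.0".
                flattenInto(join(prefix, Integer.toString(i)), list.get(i), out);
            }
        } else {
            // Scalars are stored as strings, which is why 3 shows up as "3".
            out.put(prefix, String.valueOf(value));
        }
    }

    private static String join(String prefix, String key) {
        return prefix.isEmpty() ? key : prefix + "." + key;
    }

    public static void main(String[] args) {
        // A trimmed-down version of the settings from the question.
        Map<String, Object> keyword = new LinkedHashMap<>();
        keyword.put("type", "custom");
        keyword.put("filter", Arrays.asList("lowercase", "keyword_search"));
        Map<String, Object> settings = new LinkedHashMap<>();
        settings.put("number_of_shards", 3);
        settings.put("keyword", keyword);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("settings", settings);
        System.out.println(flatten(root));
    }
}
```

Note that the top-level "settings" key becomes a prefix on every flattened key, which is the detail that matters for Question 2.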

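Question 2. Why do the settings generated by Jest not work?

Because your usage of the settings JSON/map is not the intended one. I created this test to reproduce your case (it is a bit long, but bear with me):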

@Test 
    public void createIndexTemp() throws IOException { 
     String index = "so_q_26949195"; 

     String settingsAsString = "{\n" + 
       " \"settings\": {\n" + 
       " \"number_of_shards\": 3,\n" + 
       " \"analysis\": {\n" + 
       "  \"filter\": {\n" + 
       "  \"keyword_search\": {\n" + 
       "   \"type\":  \"edge_ngram\",\n" + 
       "   \"min_gram\": 3,\n" + 
       "   \"max_gram\": 15\n" + 
       "  }\n" + 
       "  },\n" + 
       "  \"analyzer\": {\n" + 
       "  \"keyword\": {\n" + 
       "   \"type\":  \"custom\",\n" + 
       "   \"tokenizer\": \"whitespace\",\n" + 
       "   \"filter\": [\n" + 
       "   \"lowercase\",\n" + 
       "   \"keyword_search\"\n" + 
       "   ]\n" + 
       "  }\n" + 
       "  }\n" + 
       " }\n" + 
       " }\n" + 
       "}"; 
     Map<String, String> settingsAsMap = ImmutableSettings.builder() 
       .loadFromSource(settingsAsString).build().getAsMap(); 

     CreateIndex createIndex = new CreateIndex.Builder(index) 
       .settings(settingsAsString) 
       .build(); 

     JestResult result = client.execute(createIndex); 
     assertTrue(result.getErrorMessage(), result.isSucceeded()); 

     GetSettings getSettings = new GetSettings.Builder().addIndex(index).build(); 
     result = client.execute(getSettings); 
     assertTrue(result.getErrorMessage(), result.isSucceeded()); 
     System.out.println("SETTINGS SENT AS STRING settingsResponse = " + result.getJsonString()); 

     Analyze analyze = new Analyze.Builder() 
       .index(index) 
       .analyzer("keyword") 
       .source("Expecting many tokens") 
       .build(); 
     result = client.execute(analyze); 
     assertTrue(result.getErrorMessage(), result.isSucceeded()); 
     Integer actualTokens = result.getJsonObject().getAsJsonArray("tokens").size(); 
     assertTrue("Expected multiple tokens but got " + actualTokens, actualTokens > 1); 

     analyze = new Analyze.Builder() 
       .analyzer("keyword") 
       .source("Expecting single token") 
       .build(); 
     result = client.execute(analyze); 
     assertTrue(result.getErrorMessage(), result.isSucceeded()); 
     actualTokens = result.getJsonObject().getAsJsonArray("tokens").size(); 
     assertTrue("Expected single token but got " + actualTokens, actualTokens == 1); 

     admin().indices().delete(new DeleteIndexRequest(index)).actionGet(); 

     createIndex = new CreateIndex.Builder(index) 
       .settings(settingsAsMap) 
       .build(); 

     result = client.execute(createIndex); 
     assertTrue(result.getErrorMessage(), result.isSucceeded()); 

     getSettings = new GetSettings.Builder().addIndex(index).build(); 
     result = client.execute(getSettings); 
     assertTrue(result.getErrorMessage(), result.isSucceeded()); 
     System.out.println("SETTINGS AS MAP settingsResponse = " + result.getJsonString()); 

     analyze = new Analyze.Builder() 
       .index(index) 
       .analyzer("keyword") 
       .source("Expecting many tokens") 
       .build(); 
     result = client.execute(analyze); 
     assertTrue(result.getErrorMessage(), result.isSucceeded()); 
     actualTokens = result.getJsonObject().getAsJsonArray("tokens").size(); 
     assertTrue("Expected multiple tokens but got " + actualTokens, actualTokens > 1); 
    } 

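When you run it, you will see that in the case where settingsAsMap is used the actual settings are completely wrong (settings contains another settings, which is your JSON, but they should have been merged), and so the analysis fails.

Why is this not the intended usage?

Simply because that is how Elasticsearch behaves in this situation. If the settings data is flattened (as the ImmutableSettings class does by default), then it should not have the top-level element settings; but it can have that same top-level element if the data is not flattened (which is why the test case with settingsAsString works).

TL;DR:

Your settings JSON should not include the top-level "settings" element (if you run it through ImmutableSettings).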
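
If editing the JSON file is not convenient, one possible alternative is to drop the "settings." prefix from the flattened map before handing it to the index creation call. The sketch below is plain Java operating on a Map<String, String> like the one getAsMap() returned above; the helper name stripPrefix is hypothetical and is not part of Jest or Elasticsearch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StripSettingsPrefix {

    // Returns a copy of the map in which keys starting with the given prefix
    // have that prefix removed; all other keys are copied unchanged.
    static Map<String, String> stripPrefix(Map<String, String> flat, String prefix) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : flat.entrySet()) {
            String key = e.getKey();
            String newKey = key.startsWith(prefix) ? key.substring(prefix.length()) : key;
            result.put(newKey, e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // A few of the flattened keys shown in the answer's output.
        Map<String, String> flat = new LinkedHashMap<>();
        flat.put("settings.number_of_shards", "3");
        flat.put("settings.analysis.analyzer.keyword.type", "custom");
        flat.put("settings.analysis.analyzer.keyword.filter.0", "lowercase");
        System.out.println(stripPrefix(flat, "settings."));
    }
}
```

The resulting map has the same shape as one loaded from a JSON file without the top-level "settings" element, so it should behave like the fix described above; removing the element from the file remains the simpler option.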

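Thanks for the effort you put into answering my question, it must have taken a while! I applied your suggestion of removing the top-level settings element and it works perfectly. – macias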

No probs! Keep in mind that you can also use the raw string as the "source". –
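
Following up on the raw string suggestion, here is a minimal sketch of reading the settings file as an unprocessed String, so that it can be passed along the lines of .settings(settingsAsString) in the test above instead of going through ImmutableSettings. It uses only the JDK; the class and method names are hypothetical, and "settings.json" is the classpath file name from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class RawSettingsReader {

    // Reads the whole stream and decodes it as UTF-8 without altering the JSON.
    static String readUtf8(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = RawSettingsReader.class.getClassLoader()
                .getResourceAsStream("settings.json")) {
            if (in == null) {
                System.err.println("settings.json not found on classpath");
                return;
            }
            // This string keeps the nested structure, including the top-level
            // "settings" element, exactly as written in the file.
            String settingsAsString = readUtf8(in);
            System.out.println(settingsAsString);
        }
    }
}
```

Because the string is not flattened, the top-level "settings" element can stay in the file in this variant, as in the settingsAsString case of the test.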