2017-01-02

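The first error I ran into when indexing my data in ES 5.1 was in my Completion Suggester mapping, which contains an output field. What is a good alternative to the output field in Elasticsearch 5.1 completion suggestions?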

message [MapperParsingException[failed to parse]; nested: IllegalArgumentException[unknown field name [output], must be one of [input, weight, contexts]];]

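So I removed it, but now many of my autocomplete results are incorrect, because the suggester returns the input that matched instead of the single output string.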

After some googling I found this article from ES, which mentions the following:

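As suggestions are document-oriented, suggestion metadata (e.g. output) should now be specified as a field in the document. The support for specifying output when indexing suggestion entries has been removed. Now the suggestion result entry's text is always the un-analyzed value of the suggestion's input (same as not specifying output while indexing suggestions in pre-5.0 indices).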

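I found that the original value is available in the _source field that is returned with the suggestion, but that is not really a solution for me, because the keys and structure change depending on the original object.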

I could add an extra "output" field to the original object, but that is not a solution for me either, because in some cases I have a structure like this:

{
    "id": "c2358e0c-7399-4665-ac2c-0bdd44597ac0",
    "synonyms": ["All available colours", "Colors"],
    "autoComplete": [{
        "input": ["colours available all", "available colours all", "available all colours", "colours all available", "all available colours", "all colours available"]
    }, {
        "input": ["colors"]
    }]
}

In ES 2.4, the structure looked like this:

{ 
    "id": "c2358e0c-7399-4665-ac2c-0bdd44597ac0", 
    "synonyms": ["All available colours", "Colors"], 
    "SmartSynonym": [{ 
     "input": ["colours available all", "available colours all", "available all colours", "colours all available", "all available colours", "all colours available"], 
     "output": ["All available colours"] 
    }, { 
     "input": ["colors"], 
     "output": ["Colors"] 
    }] 
}

This was not a problem as long as the 'output' field was present in every autocomplete object.

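How can I, in a simple way, return the original value (e.g. "All available colours") in ES 5.1 when someone searches for "colours all available", without too much manual lookup?

Related question from other users: Output field in autocomplete suggestion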

Answer


Updated answer


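We ended up removing the custom plugin from the original answer, because it was hard to get it working in Elastic Cloud. Instead, we simply created a separate document for autocomplete and removed the autocomplete data from all other documents.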

Objects

// Suggest.java
import com.fasterxml.jackson.annotation.JsonCreator;

import java.util.Map;

public class Suggest {
    /*
     * Contains the actual value it needs to return:
     * "iphone 8 plus", "plus iphone 8", "8 plus iphone", ...
     * will all result in "iphone 8 plus", for example
     */
    private String autocompleteOutput;
    /*
     * Contains each field and all the values of that field to autocomplete
     */
    private Map<String, AutoComplete> autoComplete;

    @JsonCreator
    Suggest() {
    }

    public Suggest(String autocompleteOutput, Map<String, AutoComplete> autoComplete) {
        this.autocompleteOutput = autocompleteOutput;
        this.autoComplete = autoComplete;
    }

    public String getAutocompleteOutput() {
        return autocompleteOutput;
    }

    public void setAutocompleteOutput(String autocompleteOutput) {
        this.autocompleteOutput = autocompleteOutput;
    }

    public Map<String, AutoComplete> getAutoComplete() {
        return autoComplete;
    }

    public void setAutoComplete(Map<String, AutoComplete> autoComplete) {
        this.autoComplete = autoComplete;
    }
}

// AutoComplete.java
import com.fasterxml.jackson.annotation.JsonCreator;

public class AutoComplete {
    /*
     * Contains the permutation values from the Lucene filter (see original answer)
     */
    private String[] input;

    @JsonCreator
    AutoComplete() {
    }

    public AutoComplete(String[] input) {
        this.input = input;
    }

    public String[] getInput() {
        return input;
    }
}
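A minimal sketch of what one of these separate suggest documents could look like once serialized, using plain maps instead of Jackson so it runs on its own. The field name "synonyms" is a hypothetical example; the "autoComplete.*" dynamic template in the mapping that follows is what would turn "autoComplete.synonyms" into a completion field.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SuggestDocumentSketch {

    /**
     * Builds a map with the same shape as a serialized Suggest object:
     * {"autocompleteOutput": output, "autoComplete": {field: {"input": inputs}}}
     */
    static Map<String, Object> buildSuggestDocument(String output, String field, List<String> inputs) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("input", inputs);

        Map<String, Object> autoComplete = new LinkedHashMap<>();
        autoComplete.put(field, entry);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("autocompleteOutput", output); // the value to display, read back from _source
        doc.put("autoComplete", autoComplete); // matched by the "autoComplete.*" dynamic template
        return doc;
    }

    public static void main(String[] args) {
        // "synonyms" is a hypothetical field name; the inputs are the word orders to match on
        Map<String, Object> doc = buildSuggestDocument(
                "All available colours",
                "synonyms",
                Arrays.asList("all available colours", "colours all available", "available colours all"));
        System.out.println(doc);
    }
}
```

Keeping the display value in its own top-level field means the client never has to know which autocomplete field matched.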

With the following mapping:

{
    "suggest": {
        "dynamic_templates": [
            {
                "autocomplete": {
                    "path_match": "autoComplete.*",
                    "match_mapping_type": "*",
                    "mapping": {
                        "type": "completion",
                        "analyzer": "lowercase_keyword_analyzer"
                    }
                }
            }
        ],
        "properties": {}
    }
}

This lets us read the autocompleteOutput field from _source.
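A rough sketch of that client-side step, assuming the completion suggester options have already been parsed into maps, each carrying "text" (the matched input) and "_source" (the stored document). It collects the distinct autocompleteOutput values in the order the options were returned; the de-duplication assumes several documents may share the same output, and the option() helper exists only to build sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SuggestOutputExtractor {

    /** Distinct "autocompleteOutput" values from parsed suggester options, in result order. */
    @SuppressWarnings("unchecked")
    static List<String> extractOutputs(List<Map<String, Object>> options) {
        Set<String> outputs = new LinkedHashSet<>();
        for (Map<String, Object> option : options) {
            Object source = option.get("_source");
            if (!(source instanceof Map)) {
                continue; // option without a _source: nothing to display
            }
            Object output = ((Map<String, Object>) source).get("autocompleteOutput");
            if (output instanceof String) {
                outputs.add((String) output);
            }
        }
        return new ArrayList<>(outputs);
    }

    /** Hypothetical helper that builds one parsed option for the example. */
    static Map<String, Object> option(String text, String output) {
        Map<String, Object> option = new HashMap<>();
        option.put("text", option == null ? null : text); // the matched input, not what we display
        option.put("_source", Collections.singletonMap("autocompleteOutput", output));
        return option;
    }

    public static void main(String[] args) {
        // Options as they might come back for the prefix "colo"
        List<Map<String, Object>> options = Arrays.asList(
                option("colours all available", "All available colours"),
                option("colors", "Colors"),
                option("colours all available", "All available colours")); // another document, same output
        System.out.println(extractOutputs(options)); // [All available colours, Colors]
    }
}
```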

Original answer


After some research, I ended up creating a new Elasticsearch 5.1.1 plugin.

Create a Lucene filter

import org.apache.lucene.analysis.TokenFilter; 
import org.apache.lucene.analysis.TokenStream; 
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; 
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; 
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute; 

import java.io.IOException; 
import java.util.*; 

/** 
* Created by glenn on 13.01.17. 
*/ 
public class PermutationTokenFilter extends TokenFilter { 
    private final CharTermAttribute charTermAtt; 
    private final PositionIncrementAttribute posIncrAtt; 
    private final OffsetAttribute offsetAtt; 
    private Iterator<String> permutations; 
    private int origOffset; 

    /** 
    * Construct a token stream filtering the given input. 
    * 
    * @param input 
    */ 
    protected PermutationTokenFilter(TokenStream input) { 
     super(input); 
     this.charTermAtt = addAttribute(CharTermAttribute.class); 
     this.posIncrAtt = addAttribute(PositionIncrementAttribute.class); 
     this.offsetAtt = addAttribute(OffsetAttribute.class); 
    } 

    @Override 
    public final boolean incrementToken() throws IOException { 
     while (true) { 
      //see if permutations have been created already 
      if (permutations == null) { 
       //see if more tokens are available 
       if (!input.incrementToken()) { 
        return false; 
       } else { 
        //Get value 
        String value = String.valueOf(charTermAtt); 
        //permute over buffer value and create iterator 
        permutations = permutation(value).iterator(); 
        origOffset = posIncrAtt.getPositionIncrement(); 
       } 
      } 
      //see if there are remaining permutations 
      if (permutations.hasNext()) { 
       //Reset the attribute to starting point 
       clearAttributes(); 
       //use the next permutation 
       String permutation = permutations.next(); 
        //add the permutation to the term attribute, replacing the old term 
       charTermAtt.setEmpty().append(permutation); 
       posIncrAtt.setPositionIncrement(origOffset); 
       offsetAtt.setOffset(0,permutation.length()); 
       //remove permutation from iterator 
       permutations.remove(); 
       origOffset = 0; 
       return true; 
      } 
      permutations = null; 
     } 
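    }

    /**
     * Clear the per-token state so a reused stream does not emit stale permutations.
     */
    @Override
    public void reset() throws IOException {
        super.reset();
        permutations = null;
        origOffset = 0;
    }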

    /** 
    * Changes the order of a multi value keyword so the completion suggester still knows the original value without 
    * tokenizing it if the users asks the words in a different order. 
    * 
    * @param value unpermuted value ex: Yellow Crazy Banana 
    * @return Permuted values ex: 
    * Yellow Crazy Banana, 
    * Yellow Banana Crazy, 
    * Crazy Yellow Banana, 
    * Crazy Banana Yellow, 
    * Banana Crazy Yellow, 
    * Banana Yellow Crazy 
    */ 
    private Set<String> permutation(String value) { 
     value = value.trim().replaceAll(" +", " "); 
     // Use sets to eliminate semantic duplicates (a a b is still a a b even if you switch the two 'a's in case one word occurs multiple times in a single value) 
     // A HashSet is used for performance; the order of the permutations does not matter 
     Set<String> set = new HashSet<String>(); 
     String[] words = value.split(" "); 
     // Termination condition: only 1 permutation for an array of 1 word 
     if (words.length == 1) { 
      set.add(value); 
     } else if (words.length <= 6) { 
      // Give each word a chance to be the first in the permuted array 
      for (int i = 0; i < words.length; i++) { 
       // Remove the word at index i from the array 
       String pre = ""; 
       for (int j = 0; j < i; j++) { 
        pre += words[j] + " "; 
       } 

       String post = " "; 
       for (int j = i + 1; j < words.length; j++) { 
        post += words[j] + " "; 
       } 
       String remaining = (pre + post).trim(); 

       // Recurse to find all the permutations of the remaining words 
       for (String permutation : permutation(remaining)) { 
        // Concatenate the first word with the permutations of the remaining words 
        set.add(words[i] + " " + permutation); 
       } 
      } 
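     } else { 
      // More than 6 words: permuting would produce n! entries (already 720 for 6 words),
      // so only the individual words and the full value are indexed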
      Collections.addAll(set, words); 
      set.add(value); 
     } 
     return set; 
    } 
} 

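This filter takes the original input token "All available colours" and permutes it into all possible word orders (see the original question). Values of more than six words are not permuted; they are indexed as their individual words plus the full value.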
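To sanity-check what the filter will emit before building the plugin, the permutation step can be run outside Lucene. This is a simplified standalone rewrite for illustration, not the plugin code itself: it uses the same whitespace normalization, duplicate collapsing and six-word cap, but does not lowercase (in the analyzer, the lowercase filter runs afterwards).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PermutationSketch {

    /** Same cap as PermutationTokenFilter: longer values are not permuted. */
    static final int MAX_WORDS = 6;

    static Set<String> permutations(String value) {
        String normalized = value.trim().replaceAll(" +", " ");
        String[] words = normalized.split(" ");
        Set<String> result = new HashSet<>(); // repeated words collapse to one entry
        if (words.length > MAX_WORDS) {
            // Like the filter: emit each single word plus the full value
            Collections.addAll(result, words);
            result.add(normalized);
        } else {
            permute("", new ArrayList<>(Arrays.asList(words)), result);
        }
        return result;
    }

    /** Appends every ordering of the remaining words to the prefix. */
    private static void permute(String prefix, List<String> remaining, Set<String> out) {
        if (remaining.isEmpty()) {
            out.add(prefix);
            return;
        }
        for (int i = 0; i < remaining.size(); i++) {
            List<String> rest = new ArrayList<>(remaining);
            String word = rest.remove(i);
            permute(prefix.isEmpty() ? word : prefix + " " + word, rest, out);
        }
    }

    public static void main(String[] args) {
        System.out.println(permutations("All available colours").size());            // 6
        System.out.println(permutations("a b c d e f").size());                      // 720
        System.out.println(permutations("one two three four five six seven").size()); // 8
    }
}
```

The counts show why the cap exists: six distinct words already yield 720 indexed entries per value, and every extra word multiplies that again.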

Create the factory

import org.apache.lucene.analysis.TokenStream; 
import org.elasticsearch.index.analysis.AbstractTokenFilterFactory; 
import org.elasticsearch.common.settings.Settings; 
import org.elasticsearch.env.Environment; 
import org.elasticsearch.index.IndexSettings; 


/** 
* Created by glenn on 16.01.17. 
*/ 
public class PermutationTokenFilterFactory extends AbstractTokenFilterFactory { 

    public PermutationTokenFilterFactory(IndexSettings indexSettings, Environment environment, String name, Settings settings) { 
     super(indexSettings, name, settings); 
    } 

    public PermutationTokenFilter create(TokenStream input) { 
     return new PermutationTokenFilter(input); 
    } 
} 

This class is required so that the Elasticsearch plugin can provide the filter.

Create the Elasticsearch plugin

Follow this guide to set up the configuration required for an Elasticsearch plugin.

<?xml version="1.0" encoding="UTF-8"?> 
<project xmlns="http://maven.apache.org/POM/4.0.0" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
     xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> 
    <modelVersion>4.0.0</modelVersion> 

    <groupId>be.smartspoken</groupId> 
    <artifactId>permutation-plugin</artifactId> 
    <version>5.1.1-SNAPSHOT</version> 
    <packaging>jar</packaging> 
    <name>Plugin: Permutation</name> 
    <description>Permutation plugin for elasticsearch</description> 
    <properties> 
     <lucene.version>6.3.0</lucene.version> 
     <elasticsearch.version>5.1.1</elasticsearch.version> 
     <java.version>1.8</java.version> 
     <log4j2.version>2.7</log4j2.version> 
    </properties> 

    <dependencies> 
     <dependency> 
      <groupId>org.apache.logging.log4j</groupId> 
      <artifactId>log4j-api</artifactId> 
      <version>${log4j2.version}</version> 
     </dependency> 
     <dependency> 
      <groupId>org.apache.logging.log4j</groupId> 
      <artifactId>log4j-core</artifactId> 
      <version>${log4j2.version}</version> 
     </dependency> 
     <dependency> 
      <groupId>org.apache.lucene</groupId> 
      <artifactId>lucene-test-framework</artifactId> 
      <version>${lucene.version}</version> 
      <scope>test</scope> 
     </dependency> 
     <dependency> 
      <groupId>org.apache.lucene</groupId> 
      <artifactId>lucene-core</artifactId> 
      <version>${lucene.version}</version> 
      <scope>provided</scope> 
     </dependency> 
     <dependency> 
      <groupId>org.apache.lucene</groupId> 
      <artifactId>lucene-analyzers-common</artifactId> 
      <version>${lucene.version}</version> 
      <scope>provided</scope> 
     </dependency> 
     <dependency> 
      <groupId>org.elasticsearch</groupId> 
      <artifactId>elasticsearch</artifactId> 
      <version>${elasticsearch.version}</version> 
      <scope>provided</scope> 
     </dependency> 
    </dependencies> 

    <build> 
     <resources> 
      <resource> 
       <directory>src/main/resources</directory> 
       <filtering>false</filtering> 
       <excludes> 
        <exclude>*.properties</exclude> 
       </excludes> 
      </resource> 
     </resources> 
     <plugins> 
      <plugin> 
       <groupId>org.apache.maven.plugins</groupId> 
       <artifactId>maven-assembly-plugin</artifactId> 
       <version>2.6</version> 
       <configuration> 
        <appendAssemblyId>false</appendAssemblyId> 
        <outputDirectory>${project.build.directory}/releases/</outputDirectory> 
        <descriptors> 
         <descriptor>${basedir}/src/main/assemblies/plugin.xml</descriptor> 
        </descriptors> 
       </configuration> 
       <executions> 
        <execution> 
         <phase>package</phase> 
         <goals> 
          <goal>single</goal> 
         </goals> 
        </execution> 
       </executions> 
      </plugin> 
      <plugin> 
       <groupId>org.apache.maven.plugins</groupId> 
       <artifactId>maven-compiler-plugin</artifactId> 
       <version>3.3</version> 
       <configuration> 
        <source>${java.version}</source> 
        <target>${java.version}</target> 
       </configuration> 
      </plugin> 
     </plugins> 
    </build> 

</project> 

Make sure you use the correct Elasticsearch, Lucene and Log4j 2 versions in your pom.xml file, and provide the correct configuration files.

import be.smartspoken.plugin.permutation.filter.PermutationTokenFilterFactory; 
import org.elasticsearch.index.analysis.TokenFilterFactory; 
import org.elasticsearch.indices.analysis.AnalysisModule; 
import org.elasticsearch.plugins.AnalysisPlugin; 
import org.elasticsearch.plugins.Plugin; 

import java.util.HashMap; 
import java.util.Map; 

/** 
* Created by glenn on 13.01.17. 
*/ 
public class PermutationPlugin extends Plugin implements AnalysisPlugin{ 

    @Override 
    public Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> getTokenFilters() { 
     Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> extra = new HashMap<>(); 
     extra.put("permutation", PermutationTokenFilterFactory::new); 
     return extra; 
    } 
} 

This registers the factory with the plugin under the name "permutation".

After installing the new plugin, you need to restart Elasticsearch.

Using the plugin

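Add the new custom analyzers, mimicking the 2.x output behaviour:

import org.elasticsearch.common.settings.Settings;

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

// jsonBuilder() throws IOException, so run this where that exception is declared or handled
Settings settings = Settings.builder()
        .put("number_of_shards", 2)
        // index-time analyzer: keep the whole value as one token, permute it, then lowercase
        .loadFromSource(jsonBuilder()
                .startObject()
                    .startObject("analysis")
                        .startObject("analyzer")
                            .startObject("permutation_analyzer")
                                .field("tokenizer", "keyword")
                                .field("filter", new String[]{"permutation", "lowercase"})
                            .endObject()
                        .endObject()
                    .endObject()
                .endObject().string())
        // search-time analyzer: keep the whole query as one token and lowercase it
        .loadFromSource(jsonBuilder()
                .startObject()
                    .startObject("analysis")
                        .startObject("analyzer")
                            .startObject("lowercase_keyword_analyzer")
                                .field("tokenizer", "keyword")
                                .field("filter", new String[]{"lowercase"})
                            .endObject()
                        .endObject()
                    .endObject()
                .endObject().string())
        .build();

Now all you have to do is provide the custom analyzers in the object mapping: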

{ 
    "my_object": { 
     "dynamic_templates": [{ 
      "autocomplete": { 
       "path_match": "my.autocomplete.object.path", 
       "match_mapping_type": "*", 
       "mapping": { 
        "type": "completion", 
        "analyzer": "permutation_analyzer", /* custom analyzer */ 
        "search_analyzer": "lowercase_keyword_analyzer" /* custom analyzer */ 
       } 
      } 
     }], 
     "properties": { 
      /*your other properties*/ 
     } 
    } 
} 
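Why this works: the suggester always returns the un-analyzed input, and the permutations only exist in the index-time analyzer, so a match on "colours all available" still comes back as "All available colours". The plain-Java simulation below illustrates that split between permutation_analyzer (index time) and lowercase_keyword_analyzer (search time). It is a simplified model with a TreeMap prefix scan, not how Lucene's suggester is implemented internally, and it omits the six-word cap for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class CompletionSimulation {

    // analyzed (indexed) form -> original, un-analyzed inputs that produced it
    private final Map<String, Set<String>> entries = new TreeMap<>();

    /** Index time, like permutation_analyzer: keyword tokenizer, permutation, lowercase. */
    void index(String input) {
        List<String> forms = new ArrayList<>();
        permute("", new ArrayList<>(Arrays.asList(input.trim().split(" +"))), forms);
        for (String form : forms) {
            entries.computeIfAbsent(form.toLowerCase(Locale.ROOT), k -> new LinkedHashSet<>()).add(input);
        }
    }

    /** Search time, like lowercase_keyword_analyzer: the whole query is one lowercased prefix. */
    Set<String> suggest(String query) {
        String prefix = query.toLowerCase(Locale.ROOT);
        Set<String> suggestions = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> e : entries.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                suggestions.addAll(e.getValue()); // return the original input, not the matched form
            }
        }
        return suggestions;
    }

    /** Appends every ordering of the remaining words to the prefix. */
    private static void permute(String prefix, List<String> remaining, List<String> out) {
        if (remaining.isEmpty()) {
            out.add(prefix);
            return;
        }
        for (int i = 0; i < remaining.size(); i++) {
            List<String> rest = new ArrayList<>(remaining);
            String word = rest.remove(i);
            permute(prefix.isEmpty() ? word : prefix + " " + word, rest, out);
        }
    }

    public static void main(String[] args) {
        CompletionSimulation sim = new CompletionSimulation();
        sim.index("All available colours");
        sim.index("Colors");
        System.out.println(sim.suggest("colours all")); // [All available colours]
        System.out.println(sim.suggest("Colo"));
    }
}
```

If the search analyzer also permuted, the query would be expanded into many tokens; keeping it as a single lowercased keyword is what makes the query behave as one prefix.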

This can also improve performance, because you no longer have to build the permutations yourself before indexing.
