2017-09-12

How do I get term vectors when using the ingest attachment plugin in Elasticsearch 5.5?

I have the following code that indexes documents with Elasticsearch:

public class Document 
{ 
    public string Id { get; set; } 
    public string Content { get; set; } 
    public Attachment Attachment { get; set; } 
} 

var indexResponse = client.CreateIndex("documents", c => c 
    .Settings(s => s 
    .Analysis(a => a 
    .TokenFilters(f=>f.Stemmer("english_stem",st=>st.Language("english")).Stop("english_stop",sp=>sp.StopWords("_english_"))) 
    .CharFilters(cf => cf.PatternReplace("num_filter", nf => nf.Pattern("(\\d+)").Replacement(" ")))     
    .Analyzers(an => an.Custom("tm_analyzer", ta => ta.CharFilters("num_filter").Tokenizer("standard").Filters("english_stem","english_stop","lowercase"))))) 
    .Mappings(m => m 
      .Map<Document>(mm => mm 
       .AllField(al=>al.Enabled(false)) 
       .Properties(p => p     
       .Object<Attachment>(o=>o 
       .Name(n=>n.Attachment) 
       .Properties(ps=>ps 
       .Text(s => s 
        .Name(nm => nm.Content) 
        .TermVector(TermVectorOption.Yes) 
        .Store(true) 
        .Analyzer("tm_analyzer"))))))); 

client.PutPipeline("attachments", p => p 
    .Description("Document attachment pipeline") 
    .Processors(pr => pr 
    .Attachment<Document>(a => a 
     .Field(f => f.Content) 
     .TargetField(f => f.Attachment) 
    ) 
    .Remove<Document>(r => r 
     .Field(f => f.Content) 
    ) 
) 
); 

var base64File = Convert.ToBase64String(File.ReadAllBytes("file1.xml")); 
client.Index(new Document 
{ 
    Id = "file1.xml", 
    Content = base64File 
}, i => i.Pipeline("attachments")); 

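As you can see, I have set the term vector option to Yes on the Content field of the document. However, when I query as shown below, using Postman or C# NEST, I get nothing back: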

POST /documents/document/_mtermvectors 
{ 
    "ids" : ["1.xml"], 
    "parameters": { 
     "fields": [ 
       "content" 
     ], 
     "term_statistics": true 
    } 
} 

Any idea what I am doing wrong? Thanks for your help!

Answer


You are removing the content field in the ingest pipeline here:

.Remove<Document>(r => r 
    .Field(f => f.Content) 
) 

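This is probably what you want, because that field would otherwise hold the attachment encoded as base64. I think your API call should look at the attachment.content field instead, which will contain the content extracted from the attachment: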

POST /documents/document/_mtermvectors 
{ 
    "ids" : ["1.xml"], 
    "parameters": { 
     "fields": [ 
      "attachment.content" 
     ], 
     "term_statistics": true 
    } 
} 
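
To check the suggestion above against one document, it may help to request term vectors for a single ID. The sketch below uses the single-document _termvectors endpoint of Elasticsearch 5.x and assumes the document was indexed under the ID file1.xml, which is the Id set in the question's indexing code (the _mtermvectors requests use 1.xml instead); adjust the ID to whatever was actually indexed.

```
GET /documents/document/file1.xml/_termvectors
{
    "fields": ["attachment.content"],
    "term_statistics": true
}
```

If the response reports "found": false, the ID does not match an indexed document. If the document is found but "term_vectors" is empty, the field name or its mapping is the more likely cause.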