2013-02-08 89 views
9

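ElasticSearch & attachment type (NEST C#)

I am trying to index a PDF document with Elasticsearch/NEST.

The file gets indexed, but searching returns 0 hits.

I need the search results to return only the document ID and the highlighted fragments (without the base64 content).

Any help is appreciated, thanks.

Here is the code: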

class Program 
{ 
    static void Main(string[] args) 
    { 
     // create es client 
     string index = "myindex"; 

     var settings = new ConnectionSettings("localhost", 9200) 
      .SetDefaultIndex(index); 
     var es = new ElasticClient(settings); 

     // delete index if any 
     es.DeleteIndex(index); 

     // index document 
     string path = "test.pdf"; 
     var doc = new Document() 
     { 
      Id = 1, 
      Title = "test", 
      Content = Convert.ToBase64String(File.ReadAllBytes(path)) 
     }; 

     var parameters = new IndexParameters() { Refresh = true }; 
     if (es.Index<Document>(doc, parameters).OK) 
     { 
      // search in document 
      string query = "semantic"; // test.pdf contains the string "semantic" 

      var result = es.Search<Document>(s => s 
       .Query(q => 
        q.QueryString(qs => qs 
         .Query(query) 
        ) 
       ) 
       .Highlight(h => h 
        .PreTags("<b>") 
        .PostTags("</b>") 
        .OnFields(
         f => f 
         .OnField(e => e.Content) 
         .PreTags("<em>") 
         .PostTags("</em>") 
        ) 
       ) 
      ); 

      if (result.Hits.Total == 0) 
      { 
      } 
     } 
    } 
} 

[ElasticType(
    Name = "document", 
    SearchAnalyzer = "standard", 
    IndexAnalyzer = "standard" 
)] 
public class Document 
{ 
    public int Id { get; set; } 

    [ElasticProperty(Store = true)] 
    public string Title { get; set; } 

    [ElasticProperty(Type = FieldType.attachment, 
     TermVector = TermVectorOption.with_positions_offsets)] 
    public string Content { get; set; } 
} 
+0

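Also, I have verified that the mapper-attachments plugin is installed and loaded (using es.yml: plugin.mandatory: mapper-attachments). Still, none of the words contained in my PDF produce a hit. I have searched for an answer to this (Stack Overflow and elsewhere) and found only curl examples, none using C#/NEST. (Just a note: when searching for document.title ('test.pdf') I do get the document back, but searching for 'test' gives no hits.) – 2013-02-09 20:54:23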

+0

Just to let you know, I plan to create an integration test for this tomorrow and then answer the question. I could not get to it sooner. – 2013-02-13 12:19:17

+1

Any update on this issue? – slimflem 2013-09-07 19:40:12

Answers

1

// I am now using the FSRiver plugin - https://github.com/dadoonet/fsriver/

void Main() 
{ 
    // search in document 
    string query = "directly"; // test.pdf contains the string "directly" 
    var es = new ElasticClient(new ConnectionSettings(new Uri("http://*.*.*.*:9200")) 
     .SetDefaultIndex("mydocs") 
     .MapDefaultTypeNames(s=>s.Add(typeof(Doc), "doc"))); 
     var result = es.Search<Doc>(s => s 
     .Fields(f => f.Title, f => f.Name) 
     .From(0) 
     .Size(10000) 
      .Query(q => q.QueryString(qs => qs.Query(query))) 
      .Highlight(h => h 
       .PreTags("<b>") 
       .PostTags("</b>") 
       .OnFields(
        f => f 
        .OnField(e => e.File) 
        .PreTags("<em>") 
        .PostTags("</em>") 
       ) 
      ) 
     ); 
} 

[ElasticType(Name = "doc", SearchAnalyzer = "standard", IndexAnalyzer = "standard")] 
public class Doc 
{ 
    public int Id { get; set; } 

    [ElasticProperty(Store = true)] 
    public string Title { get; set; } 

    [ElasticProperty(Type = FieldType.attachment, TermVector = TermVectorOption.with_positions_offsets)] 
    public string File { get; set; } 
    public string Name { get; set; } 
} 
0

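I am working on the same thing, so I am currently trying this: http://www.elasticsearch.cn/tutorials/2011/07/18/attachment-type-in-action.html

The article explains the problem.

Pay attention to the mapping; it has to be set up correctly: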

"title" : { "store" : "yes" }, 
"file" : { "term_vector":"with_positions_offsets", "store":"yes" } 

I will try to figure out how to do this with the NEST API and update this post.

+0

Any update on getting this to work? – bayCoder 2014-05-30 19:04:23

-1

Before indexing any items, you need to add the mapping, as shown below.

client.CreateIndex("yourindex", c => c.NumberOfReplicas(0).NumberOfShards(12).AddMapping<AssetSearchEntryModels>(m => m.MapFromAttributes())); 
8

Install the attachment plugin and restart ES

bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/2.3.2 

Create an Attachment class that maps to the attachment plugin's document format

public class Attachment
{
    [ElasticProperty(Name = "_content")]
    public string Content { get; set; }

    [ElasticProperty(Name = "_content_type")]
    public string ContentType { get; set; }

    [ElasticProperty(Name = "_name")]
    public string Name { get; set; }
}

Add a property named "File" to the document class you are indexing, with the correct mapping

[ElasticProperty(Type = FieldType.Attachment, TermVector = TermVectorOption.WithPositionsOffsets, Store = true)] 
    public Attachment File { get; set; } 

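Explicitly create your index BEFORE you index any instances of your class. If you do not, Elasticsearch will use dynamic mapping and ignore your attribute mapping. If you change the mapping in the future, always recreate the index.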

client.CreateIndex("index-name", c => c 
    .AddMapping<Document>(m => m.MapFromAttributes()) 
); 
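
Because dynamic mapping takes over silently when the index was not created first, it can help to look at the mapping Elasticsearch actually stored. The following is a minimal sketch, not part of the original answer: it assumes a node at http://localhost:9200 and the index name from the snippet above, and it calls the get-mapping REST endpoint directly with HttpClient instead of going through NEST. The MappingCheck class and its MappingUri helper are illustrative names.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class MappingCheck
{
    // Builds the URL of the get-mapping endpoint for one index.
    public static Uri MappingUri(Uri node, string index) =>
        new Uri(node, Uri.EscapeDataString(index) + "/_mapping");

    public static async Task Main()
    {
        // Assumption: a local node on the default port.
        var node = new Uri("http://localhost:9200/");
        using var http = new HttpClient();
        try
        {
            string json = await http.GetStringAsync(MappingUri(node, "index-name"));
            Console.WriteLine(json);
            // If the attribute mapping was applied, the field type should
            // appear as "attachment" somewhere in the returned mapping.
            Console.WriteLine(json.Contains("\"attachment\"")
                ? "attachment mapping found"
                : "no attachment mapping: the index was probably mapped dynamically");
        }
        catch (HttpRequestException ex)
        {
            Console.WriteLine("Could not read the mapping: " + ex.Message);
        }
    }
}
```

If the output shows the field as a plain string type, deleting and recreating the index as described above is the likely fix.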

Index the item

string path = "test.pdf";

var attachment = new Attachment();
attachment.Content = Convert.ToBase64String(File.ReadAllBytes(path));
attachment.ContentType = "application/pdf";
attachment.Name = "test.pdf";

var doc = new Document()
{
    Id = 1,
    Title = "test",
    File = attachment
};
client.Index<Document>(doc);
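
To make clearer what the Attachment class is meant to serialize to, here is a small sketch of the JSON source the attachment plugin accepts for such a field: an object with `_content` (base64), `_content_type` and `_name`. It is only an illustration and does not use NEST. The AttachmentBody class, the lowercase field names `id`, `title` and `file`, and the use of System.Text.Json are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

public static class AttachmentBody
{
    // Builds the JSON source of a document whose "file" field is an attachment.
    public static string Build(int id, string title, byte[] bytes,
                               string contentType, string name)
    {
        var file = new Dictionary<string, object>
        {
            ["_content"] = Convert.ToBase64String(bytes), // base64-encoded file bytes
            ["_content_type"] = contentType,
            ["_name"] = name
        };
        var doc = new Dictionary<string, object>
        {
            ["id"] = id,
            ["title"] = title,
            ["file"] = file
        };
        return JsonSerializer.Serialize(doc);
    }

    public static void Main()
    {
        // Stand-in bytes; a real caller would use File.ReadAllBytes("test.pdf").
        byte[] bytes = Encoding.UTF8.GetBytes("stand-in for the PDF bytes");
        Console.WriteLine(Build(1, "test", bytes, "application/pdf", "test.pdf"));
    }
}
```

Comparing this shape with what the client actually sends (for example in a proxy or the client's trace output) is one way to confirm that the property names on the Attachment class were picked up.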

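Search on the File property

// Term queries are not analyzed. With the standard analyzer the indexed terms
// are lowercased, so the real search term should be passed in lowercase.
var query = Query<Document>.Term("file", "searchTerm");

// start and count are the caller's paging offset and page size.
var searchResults = client.Search<Document>(s => s
    .From(start)
    .Size(count)
    .Query(query)
);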
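
The question also asks for hits that carry only the document ID and the highlights, without the base64 payload. One way to reason about that is to look at the raw search body such a request could use. The sketch below builds it with System.Text.Json; it is an assumption-laden illustration, not the answer author's method. It assumes an Elasticsearch version that supports `"_source": false` in the search body, and that the attachment field is named `file` and is stored with term vectors as in the mapping above. The SearchBody class is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class SearchBody
{
    // Builds a search body that suppresses _source and asks for highlights on one field.
    public static string Build(string queryText, string field)
    {
        var body = new Dictionary<string, object>
        {
            // Do not return the stored document, so no base64 content comes back.
            ["_source"] = false,
            ["query"] = new Dictionary<string, object>
            {
                ["query_string"] = new Dictionary<string, object> { ["query"] = queryText }
            },
            ["highlight"] = new Dictionary<string, object>
            {
                ["pre_tags"] = new[] { "<em>" },
                ["post_tags"] = new[] { "</em>" },
                ["fields"] = new Dictionary<string, object>
                {
                    [field] = new Dictionary<string, object>()
                }
            }
        };
        return JsonSerializer.Serialize(body);
    }

    public static void Main()
    {
        // The serializer escapes < and > as \u003C and \u003E, which is still valid JSON.
        Console.WriteLine(Build("semantic", "file"));
    }
}
```

Each hit in the response still carries its `_id`, and the highlighted fragments appear under the hit's `highlight` section, so the body can be posted to the index's `_search` endpoint to check the behavior before translating it into NEST calls.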
+0

Great, it works for me. Thank you! – 2017-01-09 13:26:42