2012-08-17 85 views

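Unmapped fields are included in the search results returned by ElasticSearch

I want to index PDF attachments using the Tire gem as the ElasticSearch client. In my mapping, I exclude the attachment field from _source, so that the attachment is not stored in the index and is not returned in search results: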

mapping :_source => { :excludes => ['attachment_original'] } do 
    indexes :id, :type => 'integer' 
    indexes :folder_id, :type => 'integer' 
    indexes :attachment_file_name 
    indexes :attachment_updated_at, :type => 'date' 
    indexes :attachment_original, :type => 'attachment' 
end 

I can still see the attachment content included in the search results when I run the following curl command:

curl -X POST "http://localhost:9200/user_files/user_file/_search?pretty=true" -d '{
    "query": {
        "query_string": {
            "query": "rspec"
        }
    }
}'

I have already posted my question in this thread.

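But I just noticed that not only the attachment is included in the search results: all the other fields, including those that are not mapped, are included as well, as you can see here: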

{
    "took": 20,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.025427073,
        "hits": [
            {
                "_index": "user_files",
                "_type": "user_file",
                "_id": "5",
                "_score": 0.025427073,
                "_source": {
                    "user_file": {
                        "id": 5,
                        "folder_id": 1,
                        "updated_at": "2012-08-16T11:32:41Z",
                        "attachment_file_size": 179895,
                        "attachment_updated_at": "2012-08-16T11:32:41Z",
                        "attachment_file_name": "hw4.pdf",
                        "attachment_content_type": "application/pdf",
                        "created_at": "2012-08-16T11:32:41Z",
                        "attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
                    }
                }
            }
        ]
    }
}

attachment_file_size and attachment_content_type are not defined in the mapping, but they are returned in the search results:

{ 
    "id": 5, 
    "folder_id": 1, 
    "updated_at": "2012-08-16T11:32:41Z", 
    "attachment_file_size": 179895, <--------------------- 
    "attachment_updated_at": "2012-08-16T11:32:41Z", 
    "attachment_file_name": "hw4.pdf", <------------------ 
    "attachment_content_type": "application/pdf", 
    "created_at": "2012-08-16T11:32:41Z", 
    "attachment_original": "JVBERi0xLjQKJeLjz9MKNyA" 
} 

Here is my full implementation:

include Tire::Model::Search
include Tire::Model::Callbacks

def self.search(folder, params)
  tire.search() do
    query { string params[:query], default_operator: "AND" } if params[:query].present?
    #filter :term, folder_id: folder.id
    #highlight :attachment_original, :options => {:tag => "<em>"}
    raise to_curl
  end
end

mapping :_source => { :excludes => ['attachment_original'] } do
  indexes :id, :type => 'integer'
  indexes :folder_id, :type => 'integer'
  indexes :attachment_file_name
  indexes :attachment_updated_at, :type => 'date'
  indexes :attachment_original, :type => 'attachment'
end

def to_indexed_json
  to_json(:methods => [:attachment_original])
end

def attachment_original
  if attachment_file_name.present?
    path_to_original = attachment.path
    Base64.encode64(open(path_to_original) { |f| f.read })
  end
end

Can someone help me figure out why all the fields are included in _source?

Edit: This is the output of running localhost:9200/user_files/_mapping:

{
    "user_files": {
        "user_file": {
            "_source": {
                "excludes": [
                    "attachment_original"
                ]
            },
            "properties": {
                "attachment_content_type": {
                    "type": "string"
                },
                "attachment_file_name": {
                    "type": "string"
                },
                "attachment_file_size": {
                    "type": "long"
                },
                "attachment_original": {
                    "type": "attachment",
                    "path": "full",
                    "fields": {
                        "attachment_original": {
                            "type": "string"
                        },
                        "author": {
                            "type": "string"
                        },
                        "title": {
                            "type": "string"
                        },
                        "name": {
                            "type": "string"
                        },
                        "date": {
                            "type": "date",
                            "format": "dateOptionalTime"
                        },
                        "keywords": {
                            "type": "string"
                        },
                        "content_type": {
                            "type": "string"
                        }
                    }
                },
                "attachment_updated_at": {
                    "type": "date",
                    "format": "dateOptionalTime"
                },
                "created_at": {
                    "type": "date",
                    "format": "dateOptionalTime"
                },
                "folder_id": {
                    "type": "integer"
                },
                "id": {
                    "type": "integer"
                },
                "updated_at": {
                    "type": "date",
                    "format": "dateOptionalTime"
                }
            }
        }
    }
}

As you can see, for some reason all the fields are included in the mapping!


In this thread, http://stackoverflow.com/questions/11251851/how-do-you-index-attachment-in-elasticsearch-with-tire?rq=1, it looks like undefined fields are also included in the mapping. – 2012-08-17 08:50:09

Answers


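In your to_indexed_json, you include the attachment_original method, so it is sent to elasticsearch. Because to_json serializes every attribute of the model, this is also the reason why all the other properties are included in the mapping, and therefore in the source.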
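To illustrate the point above, here is a minimal sketch of a to_indexed_json that serializes only an explicit list of fields. It uses a plain Ruby Struct as a stand-in for the model, so it runs without Rails or Tire. The names UserFile (as a Struct), MAPPED_FIELDS and raw_bytes are hypothetical and only serve the illustration; a real model would read the file from attachment.path as in the question.

```ruby
require 'json'

# Hypothetical list mirroring the fields declared in the mapping block.
MAPPED_FIELDS = %i[
  id folder_id attachment_file_name attachment_updated_at attachment_original
].freeze

# Hypothetical stand-in for the model; raw_bytes replaces reading attachment.path.
UserFile = Struct.new(:id, :folder_id, :attachment_file_name,
                      :attachment_updated_at, :attachment_file_size,
                      :attachment_content_type, :raw_bytes) do
  # Base64-encode the file content (pack('m0') is equivalent to Base64.strict_encode64).
  def attachment_original
    [raw_bytes].pack('m0') if attachment_file_name
  end

  # Build the document from the explicit field list instead of calling
  # to_json on the whole record, so unlisted attributes are never sent.
  def to_indexed_json
    MAPPED_FIELDS.each_with_object({}) { |field, doc| doc[field] = public_send(field) }.to_json
  end
end

file = UserFile.new(5, 1, 'hw4.pdf', '2012-08-16T11:32:41Z',
                    179_895, 'application/pdf', '%PDF-1.4')
puts JSON.parse(file.to_indexed_json).keys.inspect
```

With this approach attachment_file_size and attachment_content_type never reach elasticsearch, so they cannot end up in the mapping or in _source. The attachment content itself is still sent, since it has to be indexed; whether it is then dropped from the stored _source depends on the excludes setting in the mapping.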

For more information on the topic, see the question "ElasticSearch & Tire: Using Mapping and to_indexed_json".

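It seems that Tire does send the correct mapping JSON to elasticsearch. My advice is to use Tire.configure { logger STDERR, level: "debug" } to inspect what is happening, and to try to pinpoint the problem at the raw level.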


I misunderstood how to_indexed_json works. Thanks again for the link, it helped a lot. – 2012-08-18 03:00:40
