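I'm having trouble using the Ingest Attachment Processor Plugin with Elasticsearch (5.5 on AWS, 5.6 locally). I'm developing in Python (3.6) and using the elasticsearch-dsl library. How do I use the Elasticsearch Ingest Attachment Processor Plugin with the Python package elasticsearch-dsl?
I'm using Persistence and have my class set up like this: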
import base64
import logging

from elasticsearch.exceptions import NotFoundError
from elasticsearch_dsl import DocType, Index, analyzer
from elasticsearch_dsl.field import Attachment, Text

logger = logging.getLogger(__name__)

lower_keyword = analyzer('keyword', tokenizer="keyword", filter=["lowercase"])


class ExampleIndex(DocType):
    class Meta:
        index = 'example'
        doc_type = 'Example'

    id = Text()
    name = Text(analyzer=lower_keyword)
    my_file = Attachment()
I then have a function like this, which I call to create the index and save the document.
def index_doc(a_file):
    # Ensure that the Index is created before any documents are saved
    try:
        i = Index('example')
        i.doc_type(ExampleIndex)
        i.create()

        # todo - Pipeline creation needs to go here - But how do you do it?

    except Exception:
        pass

    # Check for existing index
    indices = ExampleIndex()
    try:
        s = indices.search()
        r = s.query('match', name=a_file.name).execute()
        if r.success():
            for h in r:
                indices = ExampleIndex.get(id=h.meta.id)
                break
    except NotFoundError:
        pass
    except Exception:
        logger.exception("Something went wrong")
        raise

    # Populate the document
    indices.name = a_file.name
    with open(a_file.path_to_file, 'rb') as f:
        contents = f.read()
    indices.my_file = base64.b64encode(contents).decode("ascii")

    indices.save(pipeline="attachment") if indices.my_file else indices.save()
I have a text file with the contents This is a test document. When its contents are base64 encoded they become VGhpcyBpcyBhIHRlc3QgZG9jdW1lbnQK
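As a quick sanity check on that encoded value, here is a minimal sketch of the same encoding step in isolation. The helper name encode_attachment is hypothetical and used only for illustration. It assumes the file ends with a single trailing newline, which is what the encoded string above decodes to.

```python
import base64


def encode_attachment(raw_bytes):
    """Return the base64 text form of raw file bytes (hypothetical helper)."""
    return base64.b64encode(raw_bytes).decode("ascii")


# Assumption: the test file holds the sentence plus one trailing newline.
sample = b"This is a test document\n"
encoded = encode_attachment(sample)

print(encoded)                    # the string stored in my_file
print(base64.b64decode(encoded))  # round trip back to the original bytes
```

The trailing newline matters: without it the last base64 group would end differently, so a mismatch here usually means the file contents differ by whitespace.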
If I use curl directly then it works:
Create the pipeline:
curl -XPUT 'localhost:9200/_ingest/pipeline/attachment?pretty' -H 'Content-Type: application/json' -d '
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "my_file"
      }
    }
  ]
}'
Put the data:
curl -XPUT 'localhost:9200/example/Example/AV9nkyJMZAQ2lQ3CtsLb?pipeline=attachment&pretty' \
     -H 'Content-Type: application/json' \
     -d '{"my_file": "VGhpcyBpcyBhIHRlc3QgZG9jdW1lbnQK"}'
Fetch the data: http://localhost:9200/example/Example/AV9nkyJMZAQ2lQ3CtsLb?pretty
{
  "_index" : "example",
  "_type" : "Example",
  "_id" : "AV9nkyJMZAQ2lQ3CtsLb",
  "_version" : 4,
  "found" : true,
  "_source" : {
    "my_file" : "VGhpcyBpcyBhIHRlc3QgZG9jdW1lbnQK",
    "attachment" : {
      "content_type" : "text/plain; charset=ISO-8859-1",
      "language" : "en",
      "content" : "This is a test document",
      "content_length" : 25
    }
  }
}
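The trouble is that I can't see how to recreate this using the elasticsearch-dsl Python library.
UPDATE: I can now get everything working other than the initial creation of the pipeline. If I create the pipeline using curl, then I can use it by simply changing the .save() method call to .save(pipeline="attachment"). I have updated my earlier function to show this, and added a comment marking where the pipeline creation needs to go.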
Here is an example of the curl command that creates the pipeline:
curl -XPUT 'localhost:9200/_ingest/pipeline/attachment?pretty' \
     -H 'Content-Type: application/json' \
     -d '{"description": "Extract attachment information","processors": [{"attachment": {"field": "my_file"}}]}'
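For comparison, here is a minimal sketch of how the same pipeline might be created from Python instead of curl. It relies on the low-level elasticsearch-py client, whose ingest namespace provides put_pipeline(id=..., body=...). The function names attachment_pipeline_body and ensure_attachment_pipeline are hypothetical, and the field name my_file is taken from the document class above. This has not been verified against the AWS-hosted 5.5 cluster.

```python
def attachment_pipeline_body(field="my_file"):
    """Build the same pipeline definition that the curl command sends."""
    return {
        "description": "Extract attachment information",
        "processors": [{"attachment": {"field": field}}],
    }


def ensure_attachment_pipeline(client, pipeline_id="attachment", field="my_file"):
    """Create or overwrite the ingest pipeline using a low-level client.

    Assumption: `client` is an elasticsearch.Elasticsearch instance (or
    anything exposing the same `ingest.put_pipeline` method).
    """
    body = attachment_pipeline_body(field)
    # PUT _ingest/pipeline/<pipeline_id>; repeating the call replaces the pipeline.
    client.ingest.put_pipeline(id=pipeline_id, body=body)
    return body
```

With elasticsearch-dsl, the underlying client can be obtained from the connection registry, so the todo in index_doc could plausibly become ensure_attachment_pipeline(connections.get_connection()), with connections imported from elasticsearch_dsl.connections and a default connection already configured. Since the PUT overwrites an existing pipeline of the same id, calling it on every start-up should be harmless.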