2017-04-12 45 views
0

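Indexing JSON with the Elasticsearch Python client

I've run into a problem while using the Elasticsearch Python client. I have a file named test.json that contains (valid!) JSON, and I now want to index that JSON in Elasticsearch. I followed this little tutorial to check whether I can connect to my local Elasticsearch instance, and it worked, so I believe the problem is not my connection to Elasticsearch.

When I run my small piece of code here: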

from elasticsearch import Elasticsearch 
import json 

es = Elasticsearch([{'host': 'localhost', 'port': 9200}]) 

with open('test.json') as json_data: 
    es.index(index='testdata', doc_type='generated', id=1, body=json.load(json_data)) 

I get this exception on the command line (mapper_parsing_exception?):

Traceback (most recent call last): 
    File "app.py", line 13, in <module> 
    es.index(index='testdata', doc_type='generated', id=1, body=json.load(json_data)) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped 
    return func(*args, params=params, **kwargs) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 300, in index 
    _make_path(index, doc_type, id), params=params, body=body) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 318, in perform_request 
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request 
    self._raise_error(response.status, raw_data) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 124, in _raise_error 
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info) 
elasticsearch.exceptions.RequestError: TransportError(400, u'mapper_parsing_exception', u'failed to parse') 

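Can you point me in the right direction as to what the problem might be?

Oh, and yes: I printed json.load(json_data) and it works perfectly, which means loading the JSON from the file is not the problem.

Thanks for your help! Greez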

Update:

with open('test.json') as json_data: 
    #d = json.load(json_data) 
    print(json_data) 
    es.index(index='testdata', doc_type='generated', id=1, body=json_data) 

This code doesn't work either; I can't even print the JSON on the command line. The error now:

<open file 'test.json', mode 'r' at 0x7f8329340c00> 
Traceback (most recent call last): 
    File "app.py", line 14, in <module> 
    es.index(index='testdata', doc_type='generated', id=1, body=json_data) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped 
    return func(*args, params=params, **kwargs) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 300, in index 
    _make_path(index, doc_type, id), params=params, body=body) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 284, in perform_request 
    body = self.serializer.dumps(body) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/serializer.py", line 50, in dumps 
    raise SerializationError(data, e) 
elasticsearch.exceptions.SerializationError: (<closed file 'test.json', mode 'r' at 0x7f8329340c00>, TypeError("Unable to serialize <open file 'test.json', mode 'r' at 0x7f8329340c00> (type: <type 'file'>)",)) 

This is the content of the test.json file (just some randomly generated JSON):

[ 
    { 
     "_id": "58ee19e75ffc814d4dff17da", 
     "index": 0, 
     "guid": "45476739-80b3-49de-8f00-9923f84f56ce", 
     "isActive": true, 
     "balance": "$2,882.08", 
     "picture": "http://placehold.it/32x32", 
     "age": 31, 
     "eyeColor": "blue", 
     "name": "Liliana Odom", 
     "gender": "female", 
     "company": "PLASTO", 
     "email": "[email protected]", 
     "phone": "+1 (983) 474-3785", 
     "address": "121 Sedgwick Place, Farmington, Marshall Islands, 2593", 
     "about": "Adipisicing veniam ex nulla irure minim incididunt et irure est nostrud ex ut. Occaecat eu proident eu pariatur deserunt aliquip. Commodo ullamco incididunt consequat quis commodo irure elit quis. Aute et reprehenderit ad ipsum magna cupidatat magna minim sunt labore mollit occaecat. Dolore sint veniam deserunt excepteur.", 
     "registered": "2015-05-07T05:40:28 -02:00", 
     "latitude": -46.141522, 
     "longitude": -157.943368, 
     "tags": [ 
      "labore", 
      "quis" 
     ], 
     "friends": [ 
      { 
      "id": 0, 
      "name": "Earline Bass" 
      } 
     ], 
     "greeting": "Hello, Liliana Odom! You have 5 unread messages.", 
     "favoriteFruit": "apple" 
     } 
    ] 

Update 2:

I'm now trying this:

id = 1 
with open('test.json') as json_data: 
    data = json.load(json_data) 
    for dat in data: 
     print(json.dumps(dat)) 
     es.index(index='testdata', doc_type='generated', id=id, body=json.dumps(dat)) 
     id += 1 

print(json.dumps(dat)) works, but I now get an IllegalArgumentException:

Traceback (most recent call last): 
    File "app.py", line 15, in <module> 
    es.index(index='testdata', doc_type='generated', id=id, body=json.dumps(dat)) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped 
    return func(*args, params=params, **kwargs) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 300, in index 
    _make_path(index, doc_type, id), params=params, body=body) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 318, in perform_request 
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request 
    self._raise_error(response.status, raw_data) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 124, in _raise_error 
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info) 
elasticsearch.exceptions.RequestError: TransportError(400, u'illegal_argument_exception', u'[Bloodstorm][127.0.0.1:9300][indices:data/write/index[p]]') 

Update 3: Here is the ES log. It looks like the _id field is defined twice for this index.

[2017-04-12 17:43:07,847][DEBUG][action.index    ] [Bloodstorm] failed to execute [index {[testdata][generated][AVti1SY7fn4azWzi8gyQ], source[{"guid": "45476739-80b3-49de-8f00-9923f84f56ce", "index": 0, "favoriteFruit": "apple", "latitude": -46.141522, "company": "PLASTO", "email": "[email protected]", "picture": "http://placehold.it/32x32", "tags": ["labore", "quis"], "registered": "2015-05-07T05:40:28 -02:00", "eyeColor": "blue", "phone": "+1 (983) 474-3785", "address": "121 Sedgwick Place, Farmington, Marshall Islands, 2593", "friends": [{"id": 0, "name": "Earline Bass"}], "isActive": true, "about": "Adipisicing veniam ex nulla irure minim incididunt et irure est nostrud ex ut. Occaecat eu proident eu pariatur deserunt aliquip. Commodo ullamco incididunt consequat quis commodo irure elit quis. Aute et reprehenderit ad ipsum magna cupidatat magna minim sunt labore mollit occaecat. Dolore sint veniam deserunt excepteur.", "balance": "$2,882.08", "name": "Liliana Odom", "gender": "female", "age": 31, "greeting": "Hello, Liliana Odom! You have 5 unread messages.", "longitude": -157.943368, "_id": "58ee19e75ffc814d4dff17da"}]}] on [[testdata][3]] 
java.lang.IllegalArgumentException: Field [_id] is defined twice in [generated] 
     at org.elasticsearch.index.mapper.MapperService.checkFieldUniqueness(MapperService.java:496) 
     at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:376) 
     at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:320) 
     at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.applyRequest(MetaDataMappingService.java:306) 
     at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.execute(MetaDataMappingService.java:230) 
     at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:480) 
     at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:784) 
     at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231) 
     at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
     at java.lang.Thread.run(Thread.java:745) 

Answers

2

Given the structure of your test.json file, you need to parse it and then iterate over each document in the array:

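with open('test.json') as raw_data:
    # json.load() reads from a file object; json.loads() expects a string
    json_docs = json.load(raw_data)
    for json_doc in json_docs:
        # '_id' is a reserved metadata field, so take it out of the body
        # and pass it as the document id instead
        my_id = json_doc.pop('_id', None)
        es.index(index='testdata', doc_type='generated', id=my_id, body=json.dumps(json_doc))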
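
For files with many records, one request per document can become slow. The following is a minimal sketch, assuming the same test.json layout as in the question: the hypothetical helper build_actions() separates each record's _id from its body and yields action dicts in the shape that elasticsearch.helpers.bulk accepts. The helper name and the two sample records are illustrative assumptions, and the _type key only applies to older Elasticsearch versions that still use doc_type, as in the question.

```python
import json


def build_actions(docs, index, doc_type):
    """Yield one bulk action per record, moving '_id' out of the body."""
    for doc in docs:
        source = dict(doc)                # copy so the caller's data is untouched
        doc_id = source.pop('_id', None)  # '_id' must not stay inside the body
        action = {'_index': index, '_type': doc_type, '_source': source}
        if doc_id is not None:
            action['_id'] = doc_id        # without '_id', Elasticsearch generates one
        yield action


# Sample records standing in for the parsed contents of test.json (assumption)
sample_docs = json.loads('[{"_id": "a1", "name": "Liliana Odom"}, {"name": "Earline Bass"}]')
actions = list(build_actions(sample_docs, 'testdata', 'generated'))
print(actions)

# With a live cluster, the actions could be sent in a single request (not run here):
# from elasticsearch import helpers
# helpers.bulk(es, actions)
```

Copying each record before popping _id keeps the parsed data intact, which helps if the same list is reused later.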
+0

When I try: 'with open('test.json') as json_data: #d = json.load(json_data) print(json_data) es.index(index='testdata', doc_type='generated', id=1, body=json_data)' it gives me this new error: 'elasticsearch.exceptions.SerializationError: ((type: ))'. It seems backticks don't work for marking inline code – PouletFreak

+0

You should update your question with that error so it's clearer. Could you also share the contents of your 'test.json' file? – Val

+0

Sorry, I'm fairly new here ;-), I've updated my question – PouletFreak

0

You can remove the square brackets from your test.json file and try again.

{ 
     "_id": "58ee19e75ffc814d4dff17da", 
     "index": 0, 
     "guid": "45476739-80b3-49de-8f00-9923f84f56ce", 
     "isActive": true, 
     "balance": "$2,882.08", 
     "picture": "http://placehold.it/32x32", 
     "age": 31, 
     "eyeColor": "blue", 
     "name": "Liliana Odom", 
     "gender": "female", 
     "company": "PLASTO", 
     "email": "[email protected]", 
     "phone": "+1 (983) 474-3785", 
     "address": "121 Sedgwick Place, Farmington, Marshall Islands, 2593", 
     "about": "Adipisicing veniam ex nulla irure minim incididunt et irure est nostrud ex ut. Occaecat eu proident eu pariatur deserunt aliquip. Commodo ullamco incididunt consequat quis commodo irure elit quis. Aute et reprehenderit ad ipsum magna cupidatat magna minim sunt labore mollit occaecat. Dolore sint veniam deserunt excepteur.", 
     "registered": "2015-05-07T05:40:28 -02:00", 
     "latitude": -46.141522, 
     "longitude": -157.943368, 
     "tags": [ 
      "labore", 
      "quis" 
     ], 
     "friends": [ 
      { 
      "id": 0, 
      "name": "Earline Bass" 
      } 
     ], 
     "greeting": "Hello, Liliana Odom! You have 5 unread messages.", 
     "favoriteFruit": "apple" 
     } 
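
Editing the file by hand only helps while it holds a single record. As an alternative sketch, the hypothetical helper load_docs() below accepts either form, a single JSON object or an array of objects, and always returns a list, so the same indexing loop can handle both. The function name and the inline sample strings are assumptions for illustration.

```python
import json


def load_docs(text):
    """Parse JSON text and return a list of documents.

    A top-level object becomes a one-element list; a top-level array is
    returned as-is. Any other top-level value is rejected.
    """
    parsed = json.loads(text)
    if isinstance(parsed, dict):
        return [parsed]
    if isinstance(parsed, list):
        return parsed
    raise ValueError('expected a JSON object or an array of objects')


# A single object and an array of objects both end up as lists
print(len(load_docs('{"name": "Liliana Odom"}')))
print(len(load_docs('[{"name": "Liliana Odom"}, {"name": "Earline Bass"}]')))
```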
+1

His JSON file may contain several records, and it is valid as it is – Val

+0

Yes, my other JSON files contain more records. – PouletFreak