I am trying to write to Elasticsearch from an RDD (PySpark, Python 3.5). I can write the body of the JSON document correctly, but instead of using my _id, Elasticsearch generates its own. I cannot set the _id with elasticsearch-hadoop.
My code:
class Article:
    def __init__(self, title, text, text2):
        self.id_ = title
        self.text = text
        self.text2 = text2

if __name__ == '__main__':
    pt = _sc.parallelize([Article("rt", "ted", "ted2"), Article("rt2", "ted2", "ted22")])
    save = pt.map(lambda item:
        (item.id_,
         {
             'text': item.text,
             'text2': item.text2
         }))
    es_write_conf = {
        "es.nodes": "localhost",
        "es.port": "9200",
        "es.resource": 'db/table1'
    }
    save.saveAsNewAPIHadoopFile(
        path='-',
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=es_write_conf)
Stack trace: link to the image
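For reference, the usual workaround for this with elasticsearch-hadoop is to carry the id inside the document itself and point the connector at that field via the `es.mapping.id` configuration option, since the Hadoop key is a `NullWritable` and is ignored. The sketch below shows this approach; the `to_es_record` helper and the `doc_id` field name are illustrative, not from the original post, and the record builder is plain Python so it can be checked without a running Spark cluster.

```python
# Sketch of the es.mapping.id workaround: put the id into the value map
# and tell the connector which field to use as the document _id.
# "to_es_record" and the "doc_id" field name are hypothetical choices.

class Article:
    def __init__(self, title, text, text2):
        self.id_ = title
        self.text = text
        self.text2 = text2

def to_es_record(item):
    # The Hadoop key is ignored (NullWritable); the id travels in the value.
    return (None, {
        'doc_id': item.id_,   # field the connector will read as _id
        'text': item.text,
        'text2': item.text2,
    })

es_write_conf = {
    "es.nodes": "localhost",
    "es.port": "9200",
    "es.resource": 'db/table1',
    "es.mapping.id": "doc_id",   # use this document field as _id
}

# In the Spark job this would replace the original lambda:
#   save = pt.map(to_es_record)
#   save.saveAsNewAPIHadoopFile(..., conf=es_write_conf)
```

Note that the `doc_id` field will also be stored in the `_source` unless it is excluded, which may or may not matter for your mapping.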