Elasticsearch JDBC river eats up the entire heap memory

I want to index 16 million documents (47 GB) from a MySQL table into an Elasticsearch index. I am using jprante's elasticsearch-jdbc river to do this. However, after creating the river and waiting for about 15 minutes, the entire heap memory is consumed without any sign of the river running or of documents being indexed. The river used to run fine when I had around 10-12 million records to index. I have tried 3-4 times, but in vain.
Heap memory pre-allocated to the ES process = 10g
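(For reference, a 10g heap of this era is typically set through the ES_HEAP_SIZE environment variable, or equivalent -Xms/-Xmx JVM flags, before starting the node. A minimal sketch, assuming a standard tarball install; the path is illustrative:

export ES_HEAP_SIZE=10g      # sets both -Xms and -Xmx for the Elasticsearch JVM
./bin/elasticsearch)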
elasticsearch.yml
cluster.name: test_cluster
index.cache.field.type: soft
index.cache.field.max_size: 50000
index.cache.field.expire: 2h
cloud.aws.access_key: BBNYJC25Dij8JO7YM23I(fake)
cloud.aws.secret_key: GqE6y009ZnkO/+D1KKzd6M5Mrl9/tIN2zc/acEzY(fake)
cloud.aws.region: us-west-1
discovery.type: ec2
discovery.ec2.groups: sg-s3s3c2fc(fake)
discovery.ec2.any_group: false
discovery.zen.ping.timeout: 3m
gateway.recover_after_nodes: 1
gateway.recover_after_time: 1m
bootstrap.mlockall: true
network.host: 10.111.222.33(fake)
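To confirm that the heap really is exhausted rather than the river simply stalling, JVM heap usage can be polled through the node stats API. A minimal sketch, using the same placeholder address as below; the exact URL form differs slightly between 0.90.x and 1.x:

curl 'http://--address--:9200/_nodes/stats?jvm=true&pretty=true'    # 0.90-style query parameter
curl 'http://--address--:9200/_nodes/stats/jvm?pretty=true'         # 1.x-style metric path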
river.sh
curl -XPUT 'http://--address--:9200/_river/myriver/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://--address--:3306/mydatabase",
    "user" : "USER",
    "password" : "PASSWORD",
    "sql" : "select * from mytable order by creation_time desc",
    "poll" : "5d",
    "versioning" : false
  },
  "index" : {
    "index" : "myindex",
    "type" : "mytype",
    "bulk_size" : 500,
    "bulk_timeout" : "240s"
  }
}'
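Whether the river has actually started, and how far it has gotten, can be checked from the river's _status document in the _river index and from the target index's document count. A minimal sketch using the names from the request above:

curl 'http://--address--:9200/_river/myriver/_status?pretty=true'   # status document the cluster keeps for the river
curl 'http://--address--:9200/myindex/_count?pretty=true'           # number of documents indexed so far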
System properties:
16gb RAM
200gb disk space