2013-02-27

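Elasticsearch JDBC river eats up all heap memory

I want to index 16 million documents (47 GB) from a MySQL table into an Elasticsearch index. I am using jprante's elasticsearch JDBC river to do this. However, about 15 minutes after creating the river, the entire heap is consumed, with no sign that the river is running or that any documents are being indexed. The river ran fine when I had roughly 10-12 million records to index. I have tried 3-4 times, to no avail.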

Heap memory preallocated to the ES process: 10g

elasticsearch.yml

cluster.name: test_cluster 

index.cache.field.type: soft 
index.cache.field.max_size: 50000 
index.cache.field.expire: 2h 

cloud.aws.access_key: BBNYJC25Dij8JO7YM23I(fake) 
cloud.aws.secret_key: GqE6y009ZnkO/+D1KKzd6M5Mrl9/tIN2zc/acEzY(fake) 
cloud.aws.region: us-west-1 

discovery.type: ec2 
discovery.ec2.groups: sg-s3s3c2fc(fake) 
discovery.ec2.any_group: false 
discovery.zen.ping.timeout: 3m 

gateway.recover_after_nodes: 1 
gateway.recover_after_time: 1m 

bootstrap.mlockall: true 

network.host: 10.111.222.33(fake) 

river.sh

curl -XPUT 'http://--address--:9200/_river/myriver/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://--address--:3306/mydatabase",
        "user" : "USER",
        "password" : "PASSWORD",
        "sql" : "select * from mytable order by creation_time desc",
        "poll" : "5d",
        "versioning" : false
    },
    "index" : {
        "index" : "myindex",
        "type" : "mytype",
        "bulk_size" : 500,
        "bulk_timeout" : "240s"
    }
}'
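
Since the post reports no sign of the river running or of documents being indexed, one way to look for such signs is sketched below. It assumes the river's `_status` document lives next to `_meta` in the `_river` index and that the target index answers `_count`. `river_url` and the `ES_HOST` variable are hypothetical names introduced here to stand in for `--address--`; they are not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: builds a URL for a document in the _river index.
# $1 = host, $2 = river name, $3 = document id (for example _meta or _status)
river_url() {
    printf 'http://%s:9200/_river/%s/%s' "$1" "$2" "$3"
}

# ES_HOST is an assumed environment variable; nothing is requested unless it is set.
if [ -n "${ES_HOST:-}" ]; then
    # Status document of the river (assumed to be written once the river starts).
    curl -s -XGET "$(river_url "$ES_HOST" myriver _status)?pretty"
    # Number of documents in the target index so far.
    curl -s -XGET "http://$ES_HOST:9200/myindex/_count?pretty"
fi
```

Running the count request a few minutes apart would show whether the number of indexed documents is growing at all.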

System properties:

16gb RAM 
200gb disk space 
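
The figures quoted in the post can be put side by side with a rough back-of-the-envelope calculation. The sketch below uses only the numbers given above (16 million rows, 47 GB, a 10g heap, `bulk_size` of 500); treating "GB" and "g" as GiB is an assumption, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Figures quoted in the post; reading "GB"/"g" as GiB (2^30 bytes) is an assumption.
docs=16000000      # rows in the MySQL table
table_gib=47       # size of the table
heap_gib=10        # heap given to the ES process
bulk_size=500      # "bulk_size" from the river definition

gib=$((1024 * 1024 * 1024))
table_bytes=$((table_gib * gib))

# Average row size, rounded down to whole bytes.
avg_doc_bytes=$((table_bytes / docs))

# Approximate raw payload of one bulk request, in KiB.
bulk_kib=$((avg_doc_bytes * bulk_size / 1024))

# Table size as a percentage of the heap.
table_vs_heap_pct=$((table_gib * 100 / heap_gib))

echo "average document size: ${avg_doc_bytes} bytes"
echo "payload per bulk request: ${bulk_kib} KiB"
echo "table size relative to heap: ${table_vs_heap_pct}%"
```

Under these assumptions a single bulk request carries only about 1.5 MiB of row data, while the table as a whole is several times larger than the heap. The numbers alone do not identify what is filling the heap.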

Answers