Mongoimport csv (>500K rows/documents) error, importing csv into mongodb by chunks

I am trying to import a CSV file containing 2,710,000 rows, using the following command:

mongoimport -d test -c transact --type csv --file transact.csv --headerline 

It gives an error:

2015-11-02T12:44:35.420-0500 connected to: localhost 
2015-11-02T12:44:38.419-0500 [........................] test.transact 
11.7 MB/397.5 MB (2.9%) 
2015-11-02T12:44:41.414-0500 [#.......................] test.transact 
22.1 MB/397.5 MB (5.6%) 
2015-11-02T12:44:44.413-0500 [##......................] test.transact 
33.8 MB/397.5 MB (8.5%) 
2015-11-02T12:44:47.414-0500 [##......................] test.transact 
44.0 MB/397.5 MB (11.1%) 
2015-11-02T12:44:50.420-0500 [###.....................] test.transact 
55.3 MB/397.5 MB (13.9%) 
2015-11-02T12:44:53.413-0500 [###.....................] test.transact 
66.1 MB/397.5 MB (16.6%) 
2015-11-02T12:44:55.962-0500 [####....................] test.transact 
73.5 MB/397.5 MB (18.5%) 
2015-11-02T12:45:07.501-0500 Failed: read error on entry #500899: line 500900 
, column 140: extraneous " in field 
2015-11-02T12:45:07.502-0500 imported 500000 documents 

Why can only 500K rows be loaded into MongoDB? I read the following online:

Maximum Number of Documents Per Chunk to Migrate

MongoDB cannot move a chunk if the number of documents in the chunk exceeds either 250000 documents or 1.3 times the number of average sized documents that the maximum chunk size can hold.

Source: https://docs.mongodb.org/manual/reference/limits/

I also happened upon a blog post by a developer who ran into a similar problem:

Seriously? Seriously? MongoDB dies after about 500,000 documents, silently corrupting my data, not issuing any warnings and then refusing to let me even read it? I’ve never seen such broken behaviour in any other piece of software I’ve used. I went back to the channel, seething (I can’t imagine the guys in there were very happy with providing free support to an angry person, but they were helpful nonetheless), and detailed my predicament. Obviously, the solution would be to reformat my server and install a 64-bit OS if I wanted to have more than 500k documents in the database.

Source: http://www.stavros.io/posts/my-experience-with-using-mongodb-for-great-science/

How can we import a table with more than 500K rows? If importing in chunks is the solution, is there a tutorial on importing csv data into mongodb?

Answer


I have never tried an import from CSV, but I can assure you that MongoDB can handle far more documents than that. I manage clusters with collections holding more than 200M documents.

You are mixing up concepts here. A chunk is a logical unit used to manage a sharded cluster, and it can be of arbitrary size. However, once it exceeds a certain threshold it is treated as a jumbo chunk and can no longer be migrated from one shard to another to balance the data stored on each cluster node. That chunk threshold has nothing whatsoever to do with the maximum amount of data a MongoDB instance can hold.
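To make the distinction concrete: the chunk size is purely a sharded-cluster setting, stored in the config database on the mongos. A minimal shell sketch for inspecting it (an illustration only; it assumes a mongos on the default port, and the query returns nothing if the 64 MB default was never changed):

# Connect to a mongos and read the balancer's chunk size setting;
# this has no bearing on how many documents a collection can hold:
mongo --quiet --eval 'db.getSiblingDB("config").settings.find({_id: "chunksize"}).forEach(printjson)'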

And, pardon my French, but the person you quote has no idea what he is talking about: he documents his own inability to accomplish the simplest tasks and to read the docs. What do we actually have? A person ranting about things he does not understand, who could not even be bothered to prepare properly for his MSc. Frankly, I ask myself how he got his BSc in the first place. The guy ran a development branch, which is discouraged for production, and complains about data corruption (he should have had a backup anyway, given that he is a "Systems Administrator and IT-Manager")... I trust that information about as far as I could throw the Empire State Building. ;)

Back to your problem: you may well have fallen into the same trap. Using the 32-bit version of MongoDB is discouraged except for very small tests or proofs of concept. The docs are explicit about this:

When running a 32-bit build of MongoDB, the total storage size for the server, including data and indexes, is 2 gigabytes. For this reason, do not deploy MongoDB to production on 32-bit machines.

So, first of all, make sure that you are not running the 32-bit version of MongoDB, and switch if necessary.
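Two quick ways to check (a sketch; it assumes mongod and the mongo shell are on your PATH and an instance is listening on the default port):

# Ask the binary itself; the version banner identifies 32-bit builds:
mongod --version

# Or ask a running instance; this prints 32 or 64:
mongo --quiet --eval "print(db.serverBuildInfo().bits)"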

If you are already running a 64-bit version of MongoDB, read on.

The second thing you can do is to make absolutely sure that line 500900 is not corrupted. Simply print it out:

sed -n "500900p" your.csv 

Then double- and triple-check the output. If you still have a problem, post a new question on http://dba.stackoverflow.com, including the output of the sed command above.
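If line 500900 does turn out to contain a stray quote, a rough way to list every such line at once is to count the quote characters per line (a sketch; it assumes fields never contain escaped quotes, which simple CSV exports usually satisfy):

# With the field separator set to '"', awk sees (quotes + 1) fields per line,
# so a non-empty line with an even NF has an unbalanced number of quotes:
awk -F'"' 'NF && NF % 2 == 0 { print NR ": " $0 }' transact.csv

Each reported line can then be fixed (or dropped) before re-running mongoimport.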