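I'm working with a dataset of political campaign contributions that ends up as a JSON file of roughly 500MB (it started as a 124MB CSV). That is far too large to import through the Firebase web interface (the attempt crashed the tab in Google Chrome). I then tried uploading the objects one at a time as they were produced from the CSV (using a CSVtoJSON converter, each row becomes a JSON object, and I upload that object to Firebase). What is the correct way to import a large amount of data into a Firebase database?
Here is the code I used.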
var firebase = require('firebase');
var Converter = require("csvtojson").Converter;

firebase.initializeApp({
    serviceAccount: "./credentials.json",
    databaseURL: "url went here"
});

var converter = new Converter({
    constructResult: false,
    workerNum: 4
});

var db = firebase.database();
var ref = db.ref("/");

var lastindex = 0;
var count = 0;
var section = 0;
var sectionRef;

converter.on("record_parsed", function (resultRow, rawRow, rowIndex) {
    if (rowIndex >= 0) {
        sectionRef = ref.child("reports" + section);
        var reportRef = sectionRef.child(resultRow.Report_ID);
        reportRef.set(resultRow);
        console.log("Report uploaded, count at " + count + ", section at " + section);
        count += 1;
        lastindex = rowIndex;
        if (count >= 1000) {
            count = 0;
            section += 1;
        }
        if (section >= 100) {
            console.log("last completed index: " + lastindex);
            process.exit();
        }
    } else {
        console.log("we out of indices");
        process.exit();
    }
});

var readStream = require("fs").createReadStream("./vUPLOAD_MASTER.csv");
readStream.pipe(converter);
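However, this runs into memory problems and cannot get through the whole dataset. Doing it in chunks isn't practical either, because Firebase doesn't show all of the uploaded data, so I'm never sure where I left off. (When I leave the Firebase database open in Chrome I can see the data coming in, but eventually the tab crashes, and after reloading, a lot of the later data is missing.)
I then tried Firebase Streaming Import, but it throws this error: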
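For reference, here is a minimal sketch of the kind of throttled upload I have in mind. It rests on an assumption I have not confirmed: that memory grows because `reportRef.set(...)` is called for every row without waiting for earlier writes to finish. The helpers `chunkRows`, `toUpdate`, and `uploadInBatches` are hypothetical names, and `writeBatch` is a hypothetical callback that returns a Promise (against Firebase it could be something like `function (update) { return ref.update(update); }`). It uses `async`/`await`, so it needs a Node version that supports them.

```javascript
// Split an array of parsed rows into arrays of at most `size` rows.
function chunkRows(rows, size) {
  const chunks = [];
  for (let i = 0; i < rows.length; i += size) {
    chunks.push(rows.slice(i, i + size));
  }
  return chunks;
}

// Turn one chunk into a single update object keyed by path,
// mirroring the "reports<section>/<Report_ID>" layout used above.
function toUpdate(chunk, section) {
  const update = {};
  for (const row of chunk) {
    update['reports' + section + '/' + row.Report_ID] = row;
  }
  return update;
}

// Write the chunks one at a time, waiting for each write to complete
// before starting the next, so pending writes cannot pile up.
// `writeBatch` is a hypothetical callback that returns a Promise.
async function uploadInBatches(rows, size, writeBatch) {
  const chunks = chunkRows(rows, size);
  for (let section = 0; section < chunks.length; section++) {
    await writeBatch(toUpdate(chunks[section], section));
  }
  return chunks.length; // number of sections written
}
```

The `rows` array here only stands in for whatever buffer the CSV parser fills. With a real stream, the read stream would presumably also have to be paused while a batch is in flight; otherwise the parser could still run ahead of the uploads.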
started at 1469471482.77
Traceback (most recent call last):
  File "import.py", line 90, in <module>
    main(argParser.parse_args())
  File "import.py", line 20, in main
    for prefix, event, value in parser:
  File "R:\Python27\lib\site-packages\ijson\common.py", line 65, in parse
    for event, value in basic_events:
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 185, in basic_parse
    for value in parse_value(lexer):
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 127, in parse_value
    raise UnexpectedSymbol(symbol, pos)
ijson.backends.python.UnexpectedSymbol: Unexpected symbol u'\ufeff' at 0
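Searching for that last line (the ijson error), I found this SO thread, but I just don't see how I'm supposed to use it to get Firebase Streaming Import working.
I removed the byte order mark from the JSON file I'm trying to upload, and now, after the importer has been running for a minute or so, I get this error: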
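For anyone hitting the same `\ufeff` error, the byte order mark can also be stripped with a short script instead of an editor. This is a minimal sketch, assuming the file is UTF-8 so that the mark is the three bytes `EF BB BF`. The helpers `hasUtf8Bom` and `copyWithoutBom` are hypothetical names, and the source and destination paths are whatever you pass in.

```javascript
const fs = require('fs');

// True if the buffer starts with the UTF-8 byte order mark (EF BB BF).
function hasUtf8Bom(buf) {
  return buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf;
}

// Copy `src` to `dest` as a stream, skipping a leading BOM if present,
// so the whole file never has to be held in memory.
function copyWithoutBom(src, dest) {
  const fd = fs.openSync(src, 'r');
  const head = Buffer.alloc(3);
  const bytesRead = fs.readSync(fd, head, 0, 3, 0); // read the first 3 bytes
  fs.closeSync(fd);
  const start = hasUtf8Bom(head.slice(0, bytesRead)) ? 3 : 0;
  return fs.createReadStream(src, { start: start })
    .pipe(fs.createWriteStream(dest));
}
```

`copyWithoutBom` returns the write stream, so a `'finish'` listener can be attached to know when the copy is done.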
Traceback (most recent call last):
  File "import.py", line 90, in <module>
    main(argParser.parse_args())
  File "import.py", line 20, in main
    for prefix, event, value in parser:
  File "R:\Python27\lib\site-packages\ijson\common.py", line 65, in parse
    for event, value in basic_events:
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 185, in basic_parse
    for value in parse_value(lexer):
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 116, in parse_value
    for event in parse_array(lexer):
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 138, in parse_array
    for event in parse_value(lexer, symbol, pos):
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 119, in parse_value
    for event in parse_object(lexer):
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 170, in parse_object
    pos, symbol = next(lexer)
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 51, in Lexer
    buf += data
MemoryError
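Firebase Streaming Import is supposed to handle files larger than 250MB, and I'm fairly sure I have enough memory for this file. Any ideas why this error appears?
If it would help to see the actual JSON file I'm trying to upload with Firebase Streaming Import, here it is.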