Node pdf2json memory leak?

I'm fairly new to Node and not very familiar with memory leaks, but I believe I have one. I have a simple Node/Express app that lets users upload PDF files (each article can have up to 10,000 files). When the files are uploaded they are saved to MongoDB with Mongoose, via the following route (Multer setup omitted; note that I use Multer to get `req.files`):
app.post('/articles', upload.array('pdfs', 10000), (req, res) => {
  req.files.forEach(function(file) {
    var newArticle = new Article(file);
    newArticle.save();
  });
});
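One thing worth noting about this route: `newArticle.save()` is asynchronous, so the `forEach` loop kicks off all 10,000 saves at once rather than one after another. A minimal sketch of serializing that work, using a stand-in `save()` (the real Mongoose model isn't available here, so this just resolves with the file name):

```javascript
// Stand-in for Article#save(): resolves asynchronously,
// like a promise-returning Mongoose save would.
function save(file) {
  return Promise.resolve().then(() => ({ saved: file }));
}

// Process files one at a time instead of starting them all at once.
async function saveAll(files) {
  const results = [];
  for (const file of files) {
    results.push(await save(file)); // wait for each save before starting the next
  }
  return results;
}
```

With this shape, only one save is ever in flight, at the cost of total throughput.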
This code executes correctly, and I can add 10,000 files to the database in a single POST. Next, I use Mongoose middleware to parse the PDF files with pdf2json, adding a `text` field to the DB document after `save`. The model looks like this:
'use strict'

var mongoose = require('mongoose');
var Schema = mongoose.Schema;
var PDFParser = require("pdf2json");

var articleSchema = new Schema({
  originalname: String,
  normalizedName: String,
  filename: String,
  mimetype: String,
  text: String,
  processed: Boolean,
  pageCount: Number,
  createdAt: Date
});

var pdfParser = new PDFParser();

articleSchema.post('save', function(article) {
  if (!article.processed) {
    pdfParser.on("pdfParser_dataError", errData => {
      console.log('pdfParser Error');
      console.error(errData);
    });
    pdfParser.on("pdfParser_dataReady", pdfData => {
      article.text = pdfParser.getRawTextContent();
      article.processed = true;
      article.save(function (err, article) {
        //if (err) res.send(err)
        //console.log(err);
        console.log('Text parsed: ' + article.originalname);
      });
    });
    pdfParser.loadPDF(__dirname + "/../public/uploads/" + article.filename);
  }
});

module.exports = mongoose.model('Article', articleSchema);
Parsing the PDF files this way seems to be what causes the memory leak. While PDFs are being parsed, I can watch memory climb with `node --trace_gc`. When I upload ~50 typical PDF documents, everything runs fine, but when I try to upload ~100 at once the app crashes with a "JavaScript heap out of memory" error. I need to be able to upload 10,000 PDF files at a time.
<--- Last few GCs --->
204991 ms: Mark-sweep 395.7 (494.5) -> 394.5 (494.5) MB, 431.3/0 ms [allocation failure] [GC in old space requested].
205410 ms: Mark-sweep 394.5 (494.5) -> 394.5 (494.5) MB, 419.7/0 ms [allocation failure] [GC in old space requested].
205843 ms: Mark-sweep 394.5 (494.5) -> 394.5 (494.5) MB, 432.2/0 ms [last resort gc].
206275 ms: Mark-sweep 394.5 (494.5) -> 394.5 (494.5) MB, 432.5/0 ms [last resort gc].
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x24c97a4c9e31 <JS Object>
1: transform(aka ctxTransform) [0x24c97a404189 <undefined>:~40152] [pc=0xd9a435bd068] (this=0x221695b86801 <a CanvasRenderingContext2D_ with map 0x21159eaffc09>,a=0xd844e0103d1 <Number: 8.5>,b=0,c=0,d=0xd844e0103e1 <Number: 8.5>,e=0xd844e0103f1 <Number: 42.0094>,f=0xd844e010401 <Number: 608.882>)
2: showText(aka CanvasGraphics_showText) [0x24c97a404189 <undefined>:~41068] [pc=0xd9a436c8...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: node::Abort() [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
2: node::FatalException(v8::Isolate*, v8::Local<v8::Value>, v8::Local<v8::Message>) [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
3: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
4: v8::internal::Factory::NewFixedArray(int, v8::internal::PretenureFlag) [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
5: v8::internal::LCodeGenBase::PopulateDeoptimizationData(v8::internal::Handle<v8::internal::Code>) [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
6: v8::internal::LChunk::Codegen() [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
7: v8::internal::OptimizedCompileJob::GenerateCode() [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
8: v8::internal::Compiler::GetConcurrentlyOptimizedCode(v8::internal::OptimizedCompileJob*) [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
9: v8::internal::OptimizingCompileDispatcher::InstallOptimizedFunctions() [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
10: v8::internal::StackGuard::HandleInterrupts() [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
11: v8::internal::Runtime_StackGuard(int, v8::internal::Object**, v8::internal::Isolate*) [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
12: 0xd9a4260961b
[1] 31819 abort node --optimize_for_size --max_old_space_size=460 --trace_gc server.js
The app will run on a free 512MB Heroku dyno, so I can't just raise `--max-old-space-size` (I don't think). I believe I need a solution that reduces memory usage.
Can anyone spot the memory leak here? Any other suggestions?

Note that I have no affiliation with Multer or pdf2json.
I don't believe this is a leak. Basically, I think you're stuffing 10,000 envelopes into a mailbox that holds 5. Are you familiar with writing algorithms, designing for CPU and memory, and so on? –
I was wondering if that might be the case. How would you solve it? I'm not very familiar with queues, so I don't know whether something like that would help. It doesn't matter how long it takes to process all the files; I just need them all processed eventually without exceeding the memory limit. – mattsears18
Looks like you've figured it out. I'd say parse the uploads, but that's what you're doing. Disclaimer: I don't know Node, but I have programmed. Node is JS, and JS is automatically garbage-collected, so memory leaks should be rare. –
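For the record, the queue idea from the comments can be sketched without any library: start a fixed number of workers that pull files from a shared list, so at most `limit` PDFs are ever being processed at once. The `parse()` below is a placeholder for the real per-file work (parsing with pdf2json and saving the text), which isn't available here:

```javascript
// Placeholder for the real per-file work (parse the PDF, save the text).
function parse(file) {
  return Promise.resolve('parsed:' + file);
}

// Run at most `limit` parse() calls concurrently over all files.
async function processQueue(files, limit) {
  const queue = files.slice(); // copy so the caller's array is untouched
  const results = [];
  async function worker() {
    while (queue.length > 0) {
      const file = queue.shift(); // each worker claims the next pending file
      results.push(await parse(file));
    }
  }
  // Start `limit` workers and wait for them to drain the queue together.
  await Promise.all(Array.from({ length: limit }, worker));
  return results;
}
```

With `limit` set to something small (say 2-5), total memory use is bounded by a handful of in-flight parses instead of growing with the upload size.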