How do I find files containing specific data in Node.js?

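I'm learning Node.js by rewriting some of my C# utilities for fun. I've run into something that is either a bad fit for Node.js, or I'm completely missing a concept that would make it work. How do I find files containing specific data in Node.js?

The goal of the program: search a directory of files and find the ones whose data matches some criteria. The files are gzipped XML files, and for now I'm only looking for one tag. Here's what I tried (files is an array of file names):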

var fs = require("fs");
var zlib = require("zlib");

while (files.length > 0) {
    var currentPath = rootDir + "\\" + files.pop();
    var fileContents = fs.readFileSync(currentPath);
    // This callback runs only after the whole while loop has finished,
    // so by then currentPath holds the last path, not this file's path.
    zlib.gunzip(fileContents, function(err, buff) {
        if (buff.toString().indexOf("position") !== -1) {
            console.log("The file '%s' has an odometer reading.", currentPath);
            return;
        }
    });

    if (files.length % 1000 === 0) {
        console.log("%d files remain...", files.length);
    }
}

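I was pressed for time when I wrote this. The console output made it clear that all the gunzip operations are asynchronous and are deferred until the while loop has finished. That means by the time I finally get some output, currentPath no longer holds the value it had when the file was read, so the program is useless. I don't see a synchronous way to decompress data with the zlib module. I don't see a way to store context (currentPath would do) so that the callback gets the right value. I originally tried streams, piping the file stream into a gunzip stream, but I ran into a similar problem: my callbacks ran after the loop had finished, and I lost the useful context.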
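For the stream approach, the fix is the same one that fixes the loop: put the per-file work in a function so the path is a function parameter, and each call keeps its own copy. The sketch below is a minimal illustration assuming a modern Node.js, not the asker's code; `checkFileStream` and its `(err, filePath, found)` callback shape are hypothetical. It searches the decompressed bytes chunk by chunk and keeps a short tail of each chunk, so a match split across two chunks is still found.

```javascript
const fs = require("fs");
const zlib = require("zlib");

// Hypothetical helper: stream one gzipped file through gunzip and report
// whether `needle` occurs in the decompressed bytes. `filePath` is a
// parameter, so every call has its own binding and the callbacks below
// always report the path this call was started with.
function checkFileStream(filePath, needle, callback) {
  const needleBuf = Buffer.from(needle, "utf8");
  let tail = Buffer.alloc(0); // last bytes of the previous chunk
  let found = false;
  let done = false;

  // Make sure the callback runs exactly once.
  function finish(err, result) {
    if (done) return;
    done = true;
    callback(err, filePath, result);
  }

  fs.createReadStream(filePath)
    .on("error", finish) // e.g. the file does not exist
    .pipe(zlib.createGunzip())
    .on("error", finish) // e.g. the data is not valid gzip
    .on("data", (chunk) => {
      if (found) return;
      // Prepend the tail so a match straddling two chunks is still seen.
      const window = Buffer.concat([tail, chunk]);
      if (window.indexOf(needleBuf) !== -1) {
        found = true;
        return;
      }
      // Keep needle.length - 1 bytes: enough to complete a split match.
      tail = window.subarray(Math.max(0, window.length - (needleBuf.length - 1)));
    })
    .on("end", () => finish(null, found));
}
```

Only one chunk plus a few bytes of tail is held per file, instead of the whole decompressed document.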

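It's been a long day, and I can't see how to structure this. The loop is synchronous, and my asynchronous code depends on its state. That's bad. What am I missing? If these files weren't compressed, it would be easy thanks to readFileSync().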
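Later Node.js releases added synchronous zlib functions (`zlib.gunzipSync` appeared in v0.11.12, after this question was asked), which make the compressed case as simple as the uncompressed one. A minimal sketch assuming a current Node.js; `findFilesSync` is a hypothetical name, and `path.join` stands in for the hand-built `"\\"` separator:

```javascript
const fs = require("fs");
const path = require("path");
const zlib = require("zlib");

// Hypothetical helper: read each gzipped file, decompress it synchronously,
// and collect the paths whose text contains `needle`. Everything runs in
// order, so currentPath always matches the file being checked.
function findFilesSync(rootDir, fileNames, needle) {
  const matches = [];
  for (const name of fileNames) {
    const currentPath = path.join(rootDir, name);
    const text = zlib.gunzipSync(fs.readFileSync(currentPath)).toString("utf8");
    if (text.indexOf(needle) !== -1) {
      matches.push(currentPath);
    }
  }
  return matches;
}
```

This blocks the event loop for the whole search, which is fine for a one-off command-line tool but not inside a server.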


2014-01-10 OwenP

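Wow. I didn't expect to get any answers at all. I was swamped for a while, but I've spent the last few days looking into Node.js, working out why certain things behave the way they do, and learning about control flow.

So the reason the code doesn't work is that I needed a closure to capture the value of currentPath. Boy, does Node.js love closures and callbacks. A better structure for the application looks like this: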

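var fs = require("fs");
var zlib = require("zlib");

// currentPath is a parameter, so each call gets its own copy,
// which the gunzip callback closes over.
function checkFile(currentPath) {
    var fileContents = fs.readFileSync(currentPath);
    zlib.gunzip(fileContents, function(err, buff) {
        if (err) {
            console.error("Could not decompress '%s': %s", currentPath, err.message);
            return;
        }
        if (buff.toString().indexOf("position") !== -1) {
            console.log("The file '%s' has an odometer reading.", currentPath);
        }
    });
}

while (files.length > 0) {
    var currentPath = rootDir + "\\" + files.shift();
    checkFile(currentPath);
}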
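The closure point can be seen without any files. In this sketch the function names are hypothetical and `setImmediate` stands in for `zlib.gunzip` as the asynchronous call. It contrasts the shared `var` from the question, the per-call parameter used in `checkFile`, and the ES2015 `let` alternative that later Node.js versions support.

```javascript
// Bug from the question: `var` is function-scoped, so every callback shares
// one currentPath, which holds the last value once the loop has ended.
function sharedVar(names, done) {
  var seen = [];
  for (var i = 0; i < names.length; i++) {
    var currentPath = names[i];
    setImmediate(function () {
      seen.push(currentPath);
      if (seen.length === names.length) done(seen);
    });
  }
}

// Fix used in checkFile: pass the value to a function, so each call has its
// own currentPath parameter for its callback to close over.
function check(currentPath, seen, total, done) {
  setImmediate(function () {
    seen.push(currentPath);
    if (seen.length === total) done(seen);
  });
}
function perCall(names, done) {
  var seen = [];
  for (var i = 0; i < names.length; i++) {
    check(names[i], seen, names.length, done);
  }
}

// ES2015 alternative: `const` inside a `let` loop body is a fresh binding
// on every iteration.
function blockScoped(names, done) {
  const seen = [];
  for (let i = 0; i < names.length; i++) {
    const currentPath = names[i];
    setImmediate(() => {
      seen.push(currentPath);
      if (seen.length === names.length) done(seen);
    });
  }
}
```

With `["a", "b", "c"]`, `sharedVar` reports `["c", "c", "c"]`, while the other two report `["a", "b", "c"]`.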

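But it turns out this isn't very Node-like, because there is so much synchronous code in it. To do it properly, I had to rely on more callbacks. The program turned out longer than I expected, so for brevity I'll post only part of it, but the first bit looks like this: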

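// unzipFile(data, callback) belongs to the rest of the program, which is not shown here.
function checkForOdometer(currentPath, callback) {
    fs.readFile(currentPath, function(err, data) {
        if (err) {
            // Report unreadable files as having no reading so the series keeps going.
            console.error("Could not read '%s': %s", currentPath, err.message);
            callback(currentPath, false);
            return;
        }
        unzipFile(data, function(hasReading) {
            callback(currentPath, hasReading);
        });
    });
}

function scheduleCheck(filePath, callback) {
    process.nextTick(function() {
        checkForOdometer(filePath, callback);
    });
}

var withReading = 0;
var totalFiles = 0;

// Check one file at a time: the next file is started only from the
// previous file's callback, so only one file is being read at a time.
function series(nextPath) {
    if (nextPath) {
        var fullPath = rootDir + nextPath;
        totalFiles++;
        scheduleCheck(fullPath, function(currentPath, hasReading) {
            if (hasReading) {
                withReading++;
                console.log("%s has a reading.", currentPath);
            }

            series(files.shift());
        });
    } else {
        console.log("%d files searched.", totalFiles);
        console.log("%d had a reading.", withReading);
    }
}

series(files.shift());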
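In current Node.js, the same series flow can be written with `async`/`await`, which keeps the per-file context in ordinary local variables instead of a chain of callbacks. A sketch assuming a Node.js version with `fs.promises`; since `unzipFile` isn't shown, `util.promisify(zlib.gunzip)` plus a plain substring test stand in for it, and `hasText` and `searchSeries` are hypothetical names.

```javascript
const fs = require("fs");
const path = require("path");
const zlib = require("zlib");
const { promisify } = require("util");

const gunzip = promisify(zlib.gunzip);

// Hypothetical helper: resolve to true if the gzipped file contains `needle`.
async function hasText(filePath, needle) {
  const compressed = await fs.promises.readFile(filePath);
  const text = (await gunzip(compressed)).toString("utf8");
  return text.includes(needle);
}

// Check files one at a time, like series(): each await finishes before the
// next file is read, and filePath is a fresh binding on every iteration.
async function searchSeries(rootDir, fileNames, needle) {
  const withReading = [];
  for (const name of fileNames) {
    const filePath = path.join(rootDir, name);
    if (await hasText(filePath, needle)) {
      withReading.push(filePath);
    }
  }
  return withReading;
}
```

Call it as `searchSeries(rootDir, files, "position").then(...)`; a rejected promise surfaces the first read or decompression error.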

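I went with the series control flow because when I set up what seemed like the obvious parallel search, the process ran out of memory, probably from having 60,000 buffers' worth of data held in memory at once: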

while (files.length > 0) {
    var currentPath = rootDir + files.shift();
    checkForOdometer(currentPath, function(callbackPath, hasReading) {
        //...
    });
}

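I could probably set it up to schedule, say, batches of 50 files in parallel and wait for each batch to finish before scheduling 50 more. The series control flow was simple to set up, too.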
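The batching idea can be sketched like this: start a batch of checks, wait for all of them with `Promise.all`, then start the next batch, so at most `batchSize` files are in flight at once. A minimal sketch assuming a Node.js version with `fs.promises`; `hasText` and `searchInBatches` are hypothetical names, and the substring test stands in for whatever check the real program does.

```javascript
const fs = require("fs");
const path = require("path");
const zlib = require("zlib");
const { promisify } = require("util");

const gunzip = promisify(zlib.gunzip);

// Hypothetical helper: resolve to true if the gzipped file contains `needle`.
async function hasText(filePath, needle) {
  const compressed = await fs.promises.readFile(filePath);
  return (await gunzip(compressed)).toString("utf8").includes(needle);
}

// Check files in batches: run one batch in parallel, wait for all of it,
// then move on. Results keep the original file order.
async function searchInBatches(rootDir, fileNames, needle, batchSize = 50) {
  const withReading = [];
  for (let start = 0; start < fileNames.length; start += batchSize) {
    const batch = fileNames
      .slice(start, start + batchSize)
      .map((name) => path.join(rootDir, name));
    const results = await Promise.all(batch.map((p) => hasText(p, needle)));
    results.forEach((found, i) => {
      if (found) withReading.push(batch[i]);
    });
  }
  return withReading;
}
```

A worker pool that starts a new check as soon as any one finishes would keep 50 in flight continuously instead of waiting on the slowest file in each batch; the batch version is simpler to reason about.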


2014-01-15 23:26:29 OwenP
