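Reading text files with a 64-bit process is very slow

I am merging text files (.itf) located in a folder, applying some logic along the way. When I compile the program as 32-bit (console application, .NET 4.6), everything works, but I get an OutOfMemory exception if the folder contains a lot of data. Compiling it as 64-bit solves that problem, but it runs very slowly compared with the 32-bit process (more than 15 times slower).

I tried both BufferedStream and ReadAllLines, but both perform very badly. The profiler tells me these methods take 99% of the time. I don't know what the problem is...

Here is the code: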

private static void readData(Dictionary<string, Topic> topics)
{
    foreach (string file in Directory.EnumerateFiles(Path, "*.itf"))
    {
        Topic currentTopic = null;
        Table currentTable = null;
        Object currentObject = null;
        using (var fs = File.Open(file, FileMode.Open))
        using (var bs = new BufferedStream(fs))
        using (var sr = new StreamReader(bs, Encoding.Default))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                if (line.IndexOf("ETOP") > -1)
                {
                    currentTopic = null;
                }
                else if (line.IndexOf("ETAB") > -1)
                {
                    currentTable = null;
                }
                else if (line.IndexOf("ELIN") > -1)
                {
                    currentObject = null;
                }
                else if (line.IndexOf("MTID") > -1)
                {
                    MTID = line.Replace("MTID ", "");
                }
                else if (line.IndexOf("MODL") > -1)
                {
                    MODL = line.Replace("MODL ", "");
                }
                else if (line.IndexOf("TOPI") > -1)
                {
                    var name = line.Replace("TOPI ", "");
                    if (topics.ContainsKey(name))
                    {
                        currentTopic = topics[name];
                    }
                    else
                    {
                        var topic = new Topic(name);
                        currentTopic = topic;
                        topics.Add(name, topic);
                    }
                }
                else if (line.IndexOf("TABL") > -1)
                {
                    var name = line.Replace("TABL ", "");
                    if (currentTopic.Tables.ContainsKey(name))
                    {
                        currentTable = currentTopic.Tables[name];
                    }
                    else
                    {
                        var table = new Table(name);
                        currentTable = table;
                        currentTopic.Tables.Add(name, table);
                    }
                }
                else if (line.IndexOf("OBJE") > -1)
                {
                    if (currentTable.Name != "Metadata" || currentTable.Objects.Count == 0)
                    {
                        var shortLine = line.Replace("OBJE ", "");
                        var obje = new Object(shortLine.Substring(shortLine.IndexOf(" ")));
                        currentObject = obje;
                        currentTable.Objects.Add(obje);
                    }
                }
                else if (currentTopic != null && currentTable != null && currentObject != null)
                {
                    currentObject.Data.Add(line);
                }
            }
        }
    }
}

Source

2015-09-30 Chris

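So where exactly does the profiler say ReadAllLines slows down? Also, your bottleneck is most likely 'string.IndexOf'. Hint: invest in writing a proper lexer/parser. – leppie

I wonder whether the number of string allocations (every one of those '.Replace' calls creates a new string) is the culprit. A real profiler could tell, but I wonder whether treating the whole file as a stream and reading characters, without modifying or manipulating the lines, would be a better solution. –

The code sample shows the 'BufferedStream' version. I also have a 'ReadAllLines' one. In the 32-bit build the profiler does show that the 'Replace' and 'IndexOf' methods consume a lot of time. Still, I would like to know why the 64-bit version is so much slower. – Chris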
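The concern about 'string.IndexOf' and '.Replace' raised in these comments can be illustrated with a small sketch. On the .NET Framework, 'IndexOf(string)' performs a culture-sensitive search, which is typically slower than an ordinal one, and 'Replace' allocates a new string even when only a prefix has to be dropped. The helper below is hypothetical (the name 'TryStripTag' is not from the thread), and it assumes that every tag sits at the start of its line and is followed by a single space, which the original code does not require.

```csharp
using System;

public static class ItfLine
{
    // Hypothetical helper: returns true when 'line' starts with 'tag' followed
    // by one space, and hands back the remainder of the line in 'rest'.
    // Assumption: tags always appear at the beginning of a line.
    public static bool TryStripTag(string line, string tag, out string rest)
    {
        if (line.Length > tag.Length
            && line[tag.Length] == ' '
            && line.StartsWith(tag, StringComparison.Ordinal)) // ordinal: no culture rules
        {
            // A single allocation for the payload, instead of IndexOf plus Replace.
            rest = line.Substring(tag.Length + 1);
            return true;
        }

        rest = null;
        return false;
    }

    public static void Main()
    {
        string rest;
        bool matched = TryStripTag("TOPI Buildings", "TOPI", out rest);
        Console.WriteLine(matched + " " + rest);
    }
}
```

Whether this actually removes the bottleneck would have to be confirmed with a profiler on the real data; it only narrows the work done per line.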

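Some tips:

Why use File.Open, then BufferedStream, then StreamReader, when StreamReader alone can do the job and is already buffered?
You should reorder your conditions so that the ones that match most often are tested first.
Consider reading all the lines and then using Parallel.ForEach.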
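As a rough sketch of the first tip: StreamReader can open the file itself and keeps its own buffer, so the FileStream and BufferedStream layers are not needed. The 64 KB buffer size and the 'CountLines' helper are illustrative assumptions, not values recommended in the thread.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineCounter
{
    // Reads a file line by line through a single StreamReader.
    // The 64 KB buffer size is an assumption chosen for illustration.
    public static int CountLines(string path)
    {
        int count = 0;
        using (var reader = new StreamReader(path, Encoding.Default, true, 64 * 1024))
        {
            while (reader.ReadLine() != null)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        // Write a small throwaway file so the sketch runs on its own.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "TOPI A", "TABL B", "ETAB", "ETOP" }, Encoding.Default);
        Console.WriteLine(CountLines(path));
        File.Delete(path);
    }
}
```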

Source

2015-09-30 08:50:54

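Thanks for your tips, I implemented them. Parallelism does not work in my case, though: because of the way the content is modelled, I have to parse the files sequentially. – Chris

I was able to solve the problem. There seems to be a bug in the .NET compiler. Unchecking the "Optimize code" checkbox in VS2015 gave a large performance boost. It now runs about as fast as the 32-bit version. My final version includes a few optimizations: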

private static void readData(ref Dictionary<string, Topic> topics)
{
    Regex rgxOBJE = new Regex("OBJE [0-9]+ ", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    Regex rgxTABL = new Regex("TABL ", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    Regex rgxTOPI = new Regex("TOPI ", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    Regex rgxMTID = new Regex("MTID ", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    Regex rgxMODL = new Regex("MODL ", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    foreach (string file in Directory.EnumerateFiles(Path, "*.itf"))
    {
        if (file.IndexOf("itf_merger_result") == -1)
        {
            Topic currentTopic = null;
            Table currentTable = null;
            Object currentObject = null;
            using (var sr = new StreamReader(file, Encoding.Default))
            {
                Stopwatch sw = new Stopwatch();
                sw.Start();
                Console.WriteLine(file + " read, parsing ...");
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    if (line.IndexOf("OBJE") > -1)
                    {
                        if (currentTable.Name != "Metadata" || currentTable.Objects.Count == 0)
                        {
                            var obje = new Object(rgxOBJE.Replace(line, ""));
                            currentObject = obje;
                            currentTable.Objects.Add(obje);
                        }
                    }
                    else if (line.IndexOf("TABL") > -1)
                    {
                        var name = rgxTABL.Replace(line, "");
                        if (currentTopic.Tables.ContainsKey(name))
                        {
                            currentTable = currentTopic.Tables[name];
                        }
                        else
                        {
                            var table = new Table(name);
                            currentTable = table;
                            currentTopic.Tables.Add(name, table);
                        }
                    }
                    else if (line.IndexOf("TOPI") > -1)
                    {
                        var name = rgxTOPI.Replace(line, "");
                        if (topics.ContainsKey(name))
                        {
                            currentTopic = topics[name];
                        }
                        else
                        {
                            var topic = new Topic(name);
                            currentTopic = topic;
                            topics.Add(name, topic);
                        }
                    }
                    else if (line.IndexOf("ETOP") > -1)
                    {
                        currentTopic = null;
                    }
                    else if (line.IndexOf("ETAB") > -1)
                    {
                        currentTable = null;
                    }
                    else if (line.IndexOf("ELIN") > -1)
                    {
                        currentObject = null;
                    }
                    else if (currentTopic != null && currentTable != null && currentObject != null)
                    {
                        currentObject.Data.Add(line);
                    }
                    else if (line.IndexOf("MTID") > -1)
                    {
                        MTID = rgxMTID.Replace(line, "");
                    }
                    else if (line.IndexOf("MODL") > -1)
                    {
                        MODL = rgxMODL.Replace(line, "");
                    }
                }
                sw.Stop();
                Console.WriteLine(file + " parsed in {0}s", sw.ElapsedMilliseconds/1000.0);
            }
        }
    }
}

Source

2015-09-30 12:23:03 Chris

Another RyuJIT bug? – leppie

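The biggest problem with your program is that, when you let it run in 64-bit mode, it can read a lot more files. That is nice: a 64-bit process has a thousand times more address space than a 32-bit process, so running out of it is very unlikely.

But you do not get any more RAM.

This is the universal principle of "there is no free lunch" at work. Having enough RAM matters a great deal in a program like this. First and foremost, it is used by the file system cache, the magical operating system feature that makes reading files from disk look very cheap. It is not cheap at all; it is one of the slowest things you can do in a program, but the cache is very good at hiding that. You benefit from it whenever you run your program more than once: on the second and later runs you do not read from the disk at all. This is a rather dangerous feature and very hard to avoid when you test your program, so you end up with very unrealistic assumptions about how efficient it is.

The problem with a 64-bit process is that it easily makes the file system cache ineffective. Because you can read many more files, you overwhelm the cache and push the old file data out of it. Now the second run of your program is no longer fast. The files you read are no longer in the cache and must be read from disk. You are now seeing the real perf of your program, the way it will behave in production. That is a good thing, even if you do not like it :)

The secondary problem with RAM is the lesser one: if you allocate a lot of memory to store the file data, you force the operating system to find the RAM to hold it. That can cause a lot of hard page faults, incurred when it has to unmap memory used by another process, or by yours, to free up the RAM you need. This is a generic problem known as "thrashing". Page faults are something you can see in Task Manager; use View > Select Columns to add the column.

Given that the file system cache is the most likely source of the slowdown, a simple test you can do is to reboot your machine, which ensures that the cache holds none of the file data, and then run the 32-bit version. The prediction is that it will be slow as well, with BufferedStream and ReadAllLines as the bottlenecks. Just as they should be.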
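A lighter-weight way to see the cache effect described above, short of rebooting, is to time two consecutive reads of the same file. The sketch below is only an illustration: the 'CacheProbe' class and the 64 KB buffer are assumptions, and the numbers it prints depend on the operating system, the disk, and whether the file was already cached before the first read.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class CacheProbe
{
    // Reads the whole file once and returns the elapsed time in milliseconds.
    public static long TimeReadMs(string path)
    {
        var buffer = new byte[64 * 1024]; // buffer size is an arbitrary choice
        var sw = Stopwatch.StartNew();
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            while (fs.Read(buffer, 0, buffer.Length) > 0)
            {
                // Discard the data; only the I/O time is of interest.
            }
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: CacheProbe <file>");
            return;
        }

        // If the file fits in the file system cache, the second read is
        // usually served from RAM and tends to be much faster than the first.
        Console.WriteLine("first read:  {0} ms", TimeReadMs(args[0]));
        Console.WriteLine("second read: {0} ms", TimeReadMs(args[0]));
    }
}
```

If both reads of a large input set take about the same time, the data is probably not staying in the cache between reads, which is consistent with the explanation given in this answer.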

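One final note: even though your program does not match the pattern, you cannot yet make strong assumptions about .NET 4.6 perf problems. Not until this very nasty bug gets fixed.

Source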

2015-09-30 12:44:50

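Unchecking the code optimization checkbox should normally make performance worse, not better. There may be a problem in the VS 2015 product. Please provide a standalone repro case for your program, with an input set that demonstrates the performance problem, and report it at: http://connect.microsoft.com/

Source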

2015-11-07 01:08:01
