2015-10-31 70 views

OutOfMemory exception with streams and BsonWriter in Json.Net

I am having a problem using Json.NET to create a large BSON file. I have the following test code:

Imports System.IO 
Imports Newtonsoft.Json 

Public Class Region 
    Public Property Id As Integer 
    Public Property Name As String 
    Public Property FDS_Id As String 
End Class 

Public Class Regions 
    Inherits List(Of Region) 

    Public Sub New(capacity As Integer) 
     MyBase.New(capacity) 
    End Sub 
End Class 

Module Module1 
    Sub Main() 
     Dim writeElapsed2 = CreateFileBson_Stream(GetRegionList(5000000)) 
     GC.Collect(0) 
    End Sub 

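    ' Returns Regions (not List(Of Region)) so the result can be passed to CreateFileBson_Stream without a narrowing conversion.
    Public Function GetRegionList(count As Integer) As Regions 
     Dim regions As New Regions(count) 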
     For lp = 0 To count - 1 
      regions.Add(New Region With {.Id = lp, .Name = lp.ToString, .FDS_Id = lp.ToString}) 
     Next 
     Return regions 
    End Function 

    Public Function CreateFileBson_Stream(regions As Regions) As Long 
     Dim sw As New Stopwatch 
     sw.Start() 
     Dim lp = 0 

     Using stream = New StreamWriter("c:\atlas\regionsStream.bson") 
      Using writer = New Bson.BsonWriter(stream.BaseStream) 
       writer.WriteStartArray() 

       For Each item In regions 
        writer.WriteStartObject() 
        writer.WritePropertyName("Id") 
        writer.WriteValue(item.Id) 
        writer.WritePropertyName("Name") 
        writer.WriteValue(item.Name) 
        writer.WritePropertyName("FDS_Id") 
        writer.WriteValue(item.FDS_Id) 
        writer.WriteEndObject() 

        lp += 1 
        If lp Mod 1000000 = 0 Then 
         writer.Flush() 
         stream.Flush() 
         stream.BaseStream.Flush() 
        End If 
       Next 

       writer.WriteEndArray() 
      End Using 
     End Using 

     sw.Stop() 
     Return sw.ElapsedMilliseconds 
    End Function 
End Module 

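I have also tried a FileStream instead of a StreamWriter in the first Using statement, and it makes no difference.

CreateFileBson_Stream fails with an OutOfMemory exception at just over 3 million records. The memory profiler in Visual Studio shows memory continuing to climb, even though I am flushing everything I can.

The list of 5 million regions takes about 468 MB in memory.

Interestingly, if I produce JSON with the following code instead, it works and memory stays steady at around 500 MB: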

Public Function CreateFileJson_Stream(regions As Regions) As Long 
     Dim sw As New Stopwatch 
     sw.Start() 
     Using stream = New StreamWriter("c:\atlas\regionsStream.json") 
      Using writer = New JsonTextWriter(stream) 
       writer.WriteStartArray() 

       For Each item In regions 
        writer.WriteStartObject() 
        writer.WritePropertyName("Id") 
        writer.WriteValue(item.Id) 
        writer.WritePropertyName("Name") 
        writer.WriteValue(item.Name) 
        writer.WritePropertyName("FDS_Id") 
        writer.WriteValue(item.FDS_Id) 
        writer.WriteEndObject() 
       Next 

       writer.WriteEndArray() 
      End Using 
     End Using 
     sw.Stop() 
     Return sw.ElapsedMilliseconds 
    End Function 

I'm fairly sure this is a problem with BsonWriter, but I can't see what else I can do. Any ideas?

Answers

-1

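Found it: BsonWriter is trying to be 'smart'. Because I am generating the output as a single array of regions, it appears to keep the whole array in memory no matter how much flushing you do.

To prove this, I took out the start-array and end-array writes and ran the routine again: memory usage stayed at 500 MB and the program ran to completion.

My guess is that this got fixed in JsonWriter but not in BsonWriter.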

2

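According to the BSON specification, every object or array (called a document in the standard) must begin with a count of the total number of bytes comprising the document: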

document ::=  int32 e_list "\x00"  BSON Document. int32 is the total number of bytes comprising the document. 
e_list  ::=  element e_list 
    | "" 
element  ::=  "\x01" e_name double 64-bit binary floating point 
    | "\x02" e_name string UTF-8 string 
    | "\x03" e_name document Embedded document 
    | "\x04" e_name document Array 
    | ... 

Thus, when writing the root object or array, the total number of bytes to be written to the file must be calculated in advance.
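
To make the length prefix concrete, here is a minimal sketch that builds a one-element BSON document by hand using only the .NET base class library. The class and method names (BsonFramingSketch, BuildInt32Document) are made up for illustration and are not part of Json.NET. Note that the element list has to be buffered before the first byte of output can be written, which is the same constraint BsonWriter faces on a much larger scale.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of Json.NET.
public static class BsonFramingSketch
{
    // Builds the BSON document { name: value } containing a single int32 element.
    public static byte[] BuildInt32Document(string name, int value)
    {
        // Buffer the element list first: its size is needed for the prefix.
        var elements = new MemoryStream();
        var body = new BinaryWriter(elements);
        body.Write((byte)0x10);                   // element type: 32-bit integer
        body.Write(Encoding.UTF8.GetBytes(name)); // e_name as a cstring...
        body.Write((byte)0);                      // ...terminated by 0x00
        body.Write(value);                        // int32, little-endian
        body.Flush();

        var output = new MemoryStream();
        var writer = new BinaryWriter(output);
        // Total size = 4 (the prefix itself) + element list + 1 (trailing 0x00).
        writer.Write(checked((int)(4 + elements.Length + 1)));
        writer.Write(elements.ToArray());
        writer.Write((byte)0);
        writer.Flush();
        return output.ToArray();
    }
}
```

For example, BitConverter.ToString(BsonFramingSketch.BuildInt32Document("Id", 1)) yields 0D-00-00-00-10-49-64-00-01-00-00-00-00: the first four bytes are the total length, 13.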

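Json.NET's BsonWriter and its underlying BsonBinaryWriter implement this by caching all tokens to be written in a tree; once the contents of the root token have been finalized, they recursively calculate the sizes before writing the tree out. (The alternatives would be to make the application, i.e. your code, somehow precalculate this information, which is practically impossible, or to seek back and forth in the output stream to write it, which is possible only for streams where Stream.CanSeek == true.)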
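
The seek-back alternative can be sketched in a few lines with the base class library alone. This is an illustration under stated assumptions, not how Json.NET works: SeekBackSketch and WriteInt32Sequence are hypothetical names, and the values are restricted to int32 to keep the example short. A placeholder length is written first, the elements are streamed out one by one without buffering, and the real length is patched in at the end.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of Json.NET.
public static class SeekBackSketch
{
    // Streams int32 values into one BSON document keyed "0", "1", ...
    // (the layout BSON uses for array contents), patching the length at the end.
    public static void WriteInt32Sequence(Stream stream, IEnumerable<int> values)
    {
        if (stream == null) throw new ArgumentNullException("stream");
        if (values == null) throw new ArgumentNullException("values");
        if (!stream.CanSeek || !stream.CanWrite)
            throw new ArgumentException("The stream must be seekable and writable.", "stream");

        long start = stream.Position;
        var writer = new BinaryWriter(stream); // not disposed: the caller owns the stream
        writer.Write(0);                       // placeholder for the int32 length

        long index = 0;
        foreach (int value in values)
        {
            writer.Write((byte)0x10);          // element type: 32-bit integer
            writer.Write(Encoding.ASCII.GetBytes(index.ToString(CultureInfo.InvariantCulture)));
            writer.Write((byte)0);             // end of the cstring key
            writer.Write(value);
            index++;
        }
        writer.Write((byte)0);                 // document terminator
        writer.Flush();

        long end = stream.Position;
        stream.Position = start;               // seek back to the placeholder
        writer.Write(checked((int)(end - start)));
        writer.Flush();
        stream.Position = end;                 // leave the stream positioned after the document
    }
}
```

Memory use stays constant regardless of how many values are written, at the cost of requiring a seekable stream and being limited to documents under 2 GB by the int32 prefix.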

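In your initial implementation, the array is the root BSON document, so Json.NET must cache the entire array contents in order to calculate their size. In your second implementation (the one without the start-array and end-array writes), you are in effect writing multiple root BSON documents to the file. This avoids the need to calculate an overall byte count, but may not be considered valid BSON; some BSON readers will only load the first document, see Insert multiple BSonDocuments from file into MongoDB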
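
If you do go with multiple root documents, the length prefix at least makes the file easy to split again. The following is a minimal sketch, assuming the file contains nothing but well-formed documents written back to back; BsonSplitSketch and ReadRootDocuments are hypothetical names, and the code does not parse the document contents.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration only; not part of Json.NET.
public static class BsonSplitSketch
{
    // Lazily yields the raw bytes of each root document in a stream of concatenated documents.
    public static IEnumerable<byte[]> ReadRootDocuments(Stream stream)
    {
        var reader = new BinaryReader(stream); // not disposed: the caller owns the stream
        while (true)
        {
            byte[] prefix = reader.ReadBytes(4);
            if (prefix.Length == 0) yield break; // clean end of stream
            if (prefix.Length < 4) throw new EndOfStreamException("Truncated length prefix.");

            // Decode the little-endian int32 by hand so the result does not depend on CPU endianness.
            int length = prefix[0] | (prefix[1] << 8) | (prefix[2] << 16) | (prefix[3] << 24);
            if (length < 5) throw new InvalidDataException("Invalid BSON document length.");

            byte[] rest = reader.ReadBytes(length - 4);
            if (rest.Length != length - 4) throw new EndOfStreamException("Truncated document.");

            var document = new byte[length];
            Buffer.BlockCopy(prefix, 0, document, 0, 4);
            Buffer.BlockCopy(rest, 0, document, 4, rest.Length);
            yield return document;
        }
    }
}
```

Each yielded byte array is a complete document that could then be handed to a BSON reader on its own, so only one record needs to be in memory at a time.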

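Update

Based on BsonBinaryWriter, I have created a helper method that incrementally serializes an enumerable to a stream for which Stream.CanSeek == true. It does not need to cache the entire BSON document in memory; instead it seeks back to the beginning of the stream to write the final byte count. Since Json.NET is written in C#, and my primary language is C#, this is also in C#. If you need it converted to VB.NET, let me know and I can try.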

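using System; 
using System.Collections.Generic; 
using System.Globalization; 
using System.IO; 
using System.Text; 
using Newtonsoft.Json; 
using Newtonsoft.Json.Bson; 
using Newtonsoft.Json.Serialization; 

public static class BsonExtensions 
{ 
    public static void SerializeEnumerable<T>(IEnumerable<T> enumerable, Stream stream, JsonSerializerSettings settings = null) 
    { 
     // Adapted from https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Bson/BsonBinaryWriter.cs 
     if (enumerable == null) 
      throw new ArgumentNullException("enumerable"); 
     if (stream == null) 
      throw new ArgumentNullException("stream"); 
     // The byte count is patched in afterwards, so the stream must support seeking. 
     if (!stream.CanSeek || !stream.CanWrite) 
      throw new ArgumentException("The stream must be seekable and writable.", "stream"); 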

     var serializer = JsonSerializer.CreateDefault(settings); 
     var contract = serializer.ContractResolver.ResolveContract(typeof(T)); 
     BsonType rootType; 
     if (contract is JsonObjectContract) 
      rootType = BsonType.Object; 
     else if (contract is JsonArrayContract) 
      rootType = BsonType.Array; 
     else 
      throw new ArgumentException(string.Format("\"{0}\" maps to neither a BSON object nor a BSON array", typeof(T).FullName)); 

     stream.Flush(); // Just in case. 
     var initialPosition = stream.Position; 
     var writer = new BinaryWriter(stream); // Do NOT dispose, leave the incoming Stream open for the caller to dispose if desired. 

     writer.Write((int)0); // CALCULATED SIZE TO BE CALCULATED LATER. 

     ulong index = 0; 
     var buffer = new byte[256]; 
     foreach (var item in enumerable) 
     { 
      writer.Write((sbyte)rootType); 
      WriteString(writer, index.ToString(CultureInfo.InvariantCulture), buffer); 
      using (var bsonWriter = new BsonWriter(writer) { CloseOutput = false }) 
      { 
       serializer.Serialize(bsonWriter, item); 
      } 
      index++; 
     } 

     writer.Write((byte)0); 
     writer.Flush(); 

     var finalPosition = stream.Position; 
     stream.Position = initialPosition; 
     writer.Write(checked((int)(finalPosition - initialPosition))); 
     stream.Position = finalPosition; 
    } 

    private static readonly Encoding Encoding = new UTF8Encoding(false); 

    private static void WriteString(BinaryWriter writer, string s, byte[] buffer) 
    { 
     if (s != null) 
     { 
      if (s.Length < buffer.Length/Encoding.GetMaxByteCount(1)) 
      { 
       var byteCount = Encoding.GetBytes(s, 0, s.Length, buffer, 0); 
       writer.Write(buffer, 0, byteCount); 
      } 
      else 
      { 
       byte[] bytes = Encoding.GetBytes(s); 
       writer.Write(bytes); 
      } 
     } 

     writer.Write((byte)0); 
    } 
} 

internal enum BsonType : sbyte 
{ 
    // Taken from https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Bson/BsonType.cs 
    Number = 1, 
    String = 2, 
    Object = 3, 
    Array = 4, 
    Binary = 5, 
    Undefined = 6, 
    Oid = 7, 
    Boolean = 8, 
    Date = 9, 
    Null = 10, 
    Regex = 11, 
    Reference = 12, 
    Code = 13, 
    Symbol = 14, 
    CodeWScope = 15, 
    Integer = 16, 
    TimeStamp = 17, 
    Long = 18, 
    MinKey = -1, 
    MaxKey = 127 
} 

You can use this to serialize to a local FileStream or MemoryStream, but not to, say, a DeflateStream, which cannot be repositioned.
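
If the final destination is not seekable, one possible workaround is to stage the output in a temporary file and copy it across afterwards. The sketch below is an assumption-laden illustration rather than part of the helper above: SeekableTargetSketch and WriteViaSeekable are hypothetical names, and the temporary file costs extra disk I/O in exchange for keeping memory use flat.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of Json.NET.
public static class SeekableTargetSketch
{
    // Runs a writer that requires a seekable stream against any writable target.
    public static void WriteViaSeekable(Stream target, Action<Stream> writeSeekable)
    {
        if (target == null) throw new ArgumentNullException("target");
        if (writeSeekable == null) throw new ArgumentNullException("writeSeekable");

        if (target.CanSeek)
        {
            writeSeekable(target); // e.g. a FileStream or MemoryStream: write directly
            return;
        }

        // Non-seekable target (e.g. a DeflateStream): stage on disk, then copy.
        string tempPath = Path.GetTempFileName();
        try
        {
            using (var temp = new FileStream(tempPath, FileMode.Create, FileAccess.ReadWrite))
            {
                writeSeekable(temp);
                temp.Position = 0;
                temp.CopyTo(target);
            }
        }
        finally
        {
            File.Delete(tempPath);
        }
    }
}
```

With the helper from this answer, a call might look like SeekableTargetSketch.WriteViaSeekable(deflateStream, s => BsonExtensions.SerializeEnumerable(regions, s)), where deflateStream and regions are whatever stream and collection you already have.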


@Liam - answer updated with a possible solution. – dbc