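Problem parsing a 1GB JSON file with JSON.NET
I have been handed an application whose input has grown from 50K location records to 1.1 million location records. This has caused serious problems, because the entire file was previously deserialized into a single object. For a file with 1.1 million records, that object is ~1GB in size. Because of large-object GC issues, I want to keep the deserialized objects under 85K.
I am trying to parse out one location object at a time and deserialize it, so that I can control how many objects are deserialized and, in turn, the size of each object. I am using the Json.NET library to do this.
Below is a sample of the JSON file that my application receives as a stream.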
{
"Locations": [{
"LocationId": "",
"ParentLocationId": "",
"DisplayFlag": "Y",
"DisplayOptions": "",
"DisplayName": "",
"Address": "",
"SecondaryAddress": "",
"City": "",
"State": "",
"PostalCode": "",
"Country": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"LatLonQuality": 99,
"BusinessLogoUrl": "",
"BusinessUrl": "",
"DisplayText": "",
"PhoneNumber": "",
"VenueGroup": 7,
"VenueType": 0,
"SubVenue": 0,
"IndoorFlag": "",
"OperatorDefined": "",
"AccessPoints": [{
"AccessPointId": "",
"MACAddress": "",
"DisplayFlag": "",
"DisplayOptions": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"Status": "Up",
"OperatorDefined": "",
"RoamingGroups": [{
"GroupName": ""
},
{
"GroupName": ""
}],
"Radios": [{
"RadioId": "",
"RadioFrequency": "",
"RadioProtocols": [{
"Protocol": ""
}],
"WifiConnections": [{
"BSSID": "",
"ServiceSets": [{
"SSID": "",
"SSID_Broadcasted": ""
}]
}]
}]
}]
},
{
"LocationId": "",
"ParentLocationId": "",
"DisplayFlag": "Y",
"DisplayOptions": "",
"DisplayName": "",
"Address": "",
"SecondaryAddress": "",
"City": "",
"State": "",
"PostalCode": "",
"Country": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"LatLonQuality": 99,
"BusinessLogoUrl": "",
"BusinessUrl": "",
"DisplayText": "",
"PhoneNumber": "",
"VenueGroup": 7,
"VenueType": 0,
"SubVenue": 0,
"IndoorFlag": "",
"OperatorDefined": "",
"AccessPoints": [{
"AccessPointId": "",
"MACAddress": "",
"DisplayFlag": "",
"DisplayOptions": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"Status": "Up",
"OperatorDefined": "",
"RoamingGroups": [{
"GroupName": ""
},
{
"GroupName": ""
}],
"Radios": [{
"RadioId": "",
"RadioFrequency": "",
"RadioProtocols": [{
"Protocol": ""
}],
"WifiConnections": [{
"BSSID": "",
"ServiceSets": [{
"SSID": "",
"SSID_Broadcasted": ""
}]
}]
}]
}]
}]
}
I need to be able to pull out the individual location objects, so that I would be looking at the following:
{
"LocationId": "",
"ParentLocationId": "",
"DisplayFlag": "Y",
"DisplayOptions": "",
"DisplayName": "",
"Address": "",
"SecondaryAddress": "",
"City": "",
"State": "",
"PostalCode": "",
"Country": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"LatLonQuality": 99,
"BusinessLogoUrl": "",
"BusinessUrl": "",
"DisplayText": "",
"PhoneNumber": "",
"VenueGroup": 7,
"VenueType": 0,
"SubVenue": 0,
"IndoorFlag": "",
"OperatorDefined": "",
"AccessPoints": [{
"AccessPointId": "",
"MACAddress": "",
"DisplayFlag": "",
"DisplayOptions": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"Status": "Up",
"OperatorDefined": "",
"RoamingGroups": [{
"GroupName": ""
},
{
"GroupName": ""
}],
"Radios": [{
"RadioId": "",
"RadioFrequency": "",
"RadioProtocols": [{
"Protocol": ""
}],
"WifiConnections": [{
"BSSID": "",
"ServiceSets": [{
"SSID": "",
"SSID_Broadcasted": ""
}]
}]
}]
}]
}
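I tried to do this with the Json.NET JsonTextReader, but I cannot get the reader to hold an entire location in its buffer. Because of the size of the records in the stream, the reader initially gets down as far as "RadioProtocols", which is midway through the object. By the time the stream reaches the end of the object, the reader has discarded the start of the object.
The code I am using to try to get this working is: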
var ser = new JsonSerializer();
using (var reader = new JsonTextReader(new StreamReader(stream)))
{
    reader.SupportMultipleContent = true;
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
        {
            do
            {
                reader.Read();
            } while (reader.TokenType != JsonToken.EndObject && reader.Depth == 2);
            var singleLocation = ser.Deserialize<Locations>(reader);
        }
    }
}
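Any alternatives to this, or any information on how to do it, would be greatly appreciated. As a side note, the way our customer sends the information cannot be changed at present.
It sounds like you are going to have to roll your own JSON serializer, because the smallest unit that Json.NET can reasonably deserialize is going to cause you an 'OutOfMemoryException'. That said, I think this is entirely the wrong approach. I would address the larger problem, which is clearly either your bulky data source or insufficient hardware. – evanmcdonnal
Unfortunately we cannot change the approach right now. We have basically been told to patch only, or more accurately, "just make it work without changing too much". – polydegmon
I tried running your code, but I found a problem. Assuming the 'Locations' type corresponds to one entry in the 'Locations' array, the code throws an exception because the reader is incorrectly positioned at the '"LocationId"' property. Is the idea to enumerate through each entry in the 'Locations' array, loading each entry one at a time? – dbc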
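For illustration, here is a minimal sketch of the enumeration the last comment describes: advance the reader to the "Locations" array, then hand the reader to the serializer each time it sits on the StartObject token of an entry, so only one entry is materialized at a time. The names LocationStreamer, ReadLocations and the cut-down Location class are hypothetical and not from the original post; the real class would carry the full set of properties. This is a sketch under those assumptions, not a tested fix for the 1GB file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Newtonsoft.Json;

// Hypothetical, cut-down model of a single entry in the "Locations" array.
// Properties missing from the class are ignored by Json.NET by default.
public class Location
{
    public string LocationId { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

public static class LocationStreamer
{
    // Lazily yields one deserialized entry at a time from the top-level
    // "Locations" array, so the whole document is never held in memory.
    public static IEnumerable<T> ReadLocations<T>(Stream stream)
    {
        var serializer = new JsonSerializer();
        using (var streamReader = new StreamReader(stream))
        using (var reader = new JsonTextReader(streamReader))
        {
            while (reader.Read())
            {
                bool isLocationsProperty =
                    reader.TokenType == JsonToken.PropertyName &&
                    (string)reader.Value == "Locations";
                if (!isLocationsProperty)
                {
                    continue;
                }

                // Move from the property name to its value; stop if it is not an array.
                if (!reader.Read() || reader.TokenType != JsonToken.StartArray)
                {
                    yield break;
                }

                // Walk the array. The reader must be positioned on StartObject
                // when it is passed to Deserialize; afterwards it rests on the
                // matching EndObject, so the next Read() moves to the next entry.
                while (reader.Read() && reader.TokenType != JsonToken.EndArray)
                {
                    if (reader.TokenType == JsonToken.StartObject)
                    {
                        yield return serializer.Deserialize<T>(reader);
                    }
                }
                yield break;
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Tiny stand-in for the real stream received from the customer.
        const string json =
            @"{ ""Locations"": [ { ""LocationId"": ""A"", ""Latitude"": 40.59485 },
                                 { ""LocationId"": ""B"", ""Latitude"": 40.59485 } ] }";

        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            foreach (var location in LocationStreamer.ReadLocations<Location>(stream))
            {
                Console.WriteLine(location.LocationId);
            }
        }
    }
}
```

Compared with the code in the question, the inner do/while loop is gone: reading forward before calling Deserialize is what leaves the reader in the middle of the object. Because the method is an iterator, the caller decides how many entries are alive at once, for example by processing each one and letting it go out of scope before pulling the next.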