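I am trying to use OpenRefine to analyze a dataset of messy JSON strings (40k lines), but because JSON objects are unordered, the keys in some of the JSON lines were shuffled when they were returned and logged to the file. Sticking strictly to JSON, how can I reorder the keys and values to match a specific JSON schema for OpenRefine?
Some objects are missing keys, and some have their keys in a different order. For example: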
1 {"about":"foo", "category":"bar", "id":"123", "cat_list": ["category1":"foo2"]}
2 {"id":"22","about":"barFoo", "category":"NotABar"}
3 {"about":"barbar", "category":"website", "id":"3333", "cat_list": ["category1":"foo22"]}
....
....
....
40,000 {"about":"bar123", "category":"publish", "id":"3323", "cat_list": ""}
ISSUE:
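When importing the data into OpenRefine, the program asks for a specific schema to compare against as it reads the file. It then reads the supplied file, compares the JSON object on each line with the schema, and imports or discards it depending on how well it matches. As a result, many entries are left out!
Ideal situation:
Using Python, I would like to reorder the JSON objects to match a specific schema that I specify.
Example:
Specified schema: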
{"about":"", "category":"", "id":"", "cat_list": ""}
Each line's JSON object and its key-value pairs would then be rearranged into this specific format:
1 {"about": ....
2 {"about": ....
3 {"about": ....
....
....
....
40,000 {"about": ....
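I am not entirely sure how to do this efficiently.
EDIT:
I decided to write a script to organize this. I removed some of the more complex fields and now have a complete .JSON file: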
{"name":"Carstar Bridgewater",
"category":"Automotive",
"about":"We are Bridgewaters largest professional collision centre and are committed to being there for customer cars and communities when they need us.",
"country":"Canada",
"state":"NS",
"city":"Bridgewater
"},
{"name":"Febreze",
"category":"Product/Service
",
"about":"Freshness that eliminates odorsso you can breathe happy.",
"country":"Added Nothing",
"state":"Added Nothing",
"city":"Added Nothing"},
{"name":"Custom Wood & Acrylic Turnings",
"category":"Professional Services",
"about":"Hand crafted item turned on a wood lath pen pencil bottle stopper cork screw bottle opener perfume applicator or other custom turnings",
"country":"Canada",
"state":"NS
",
"city":"Middle Sackville"},
{"name":"The Hunger Games",
"category":"Movie
",
"about":"THE HUNGER GAMES: MOCKINGJAY - PART 1 - In theatres November 2 2014. www.hungergamesmovie.ca",
"country":"Added Nothing",
"state":"Added Nothing",
"city":"Added Nothing"},
However, Google Refine still refuses to accept my file. What am I doing wrong?
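One possible cause, offered as a guess rather than a diagnosis: the file as shown is a run of objects separated by commas with no enclosing array, and several string values contain raw line breaks, neither of which is valid in a single JSON document. The sketch below assumes the whole file has been read into a string; `load_records` and the `raw` sample are hypothetical names for illustration. It wraps the text in brackets, lets Python's `json` module tolerate control characters in strings via `strict=False`, and strips the stray whitespace from each value.

```python
import json

def load_records(raw):
    """Parse comma-separated JSON objects that may contain raw newlines in strings."""
    # Drop surrounding whitespace and the trailing comma, then wrap in an array.
    body = raw.strip().rstrip(",")
    # strict=False allows control characters (such as newlines) inside strings.
    records = json.loads("[" + body + "]", strict=False)
    # Remove stray leading/trailing whitespace from string values.
    return [
        {key: value.strip() if isinstance(value, str) else value
         for key, value in record.items()}
        for record in records
    ]

# Hypothetical sample shaped like the file in the question.
raw = (
    '{"name":"Febreze",\n"category":"Product/Service\n",\n"city":"Added Nothing"},\n'
    '{"name":"The Hunger Games",\n"category":"Movie\n",\n"city":"Added Nothing"},\n'
)

records = load_records(raw)
# Write the cleaned records back out as one valid JSON array.
print(json.dumps(records, indent=2))
```

Writing the result with `json.dumps` produces a single valid JSON array, which may be easier for an importer to accept than the original text.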
Objects in JSON have no inherent order; only arrays do. – Barmar
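Your 'cat_list' value is not valid JSON. An array cannot contain 'key:value' pairs like that. On line 40,000 the value is a string rather than an array, which probably violates the schema. I think the problems you are running into are related to these issues, not to the order of the elements in the objects. – Barmar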
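To see which lines fail to parse at all, a minimal sketch along these lines may help. It assumes each line of the file holds one bare JSON object (without the row numbers shown in the question), and `find_invalid_lines` is a hypothetical helper name, not part of any library.

```python
import json

def find_invalid_lines(lines):
    """Return (line_number, error_message) for every line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
    return problems

# Sample rows copied from the question (row numbers removed).
sample = [
    '{"about":"foo", "category":"bar", "id":"123", "cat_list": ["category1":"foo2"]}',
    '{"id":"22","about":"barFoo", "category":"NotABar"}',
    '{"about":"bar123", "category":"publish", "id":"3323", "cat_list": ""}',
]

for number, message in find_invalid_lines(sample):
    print(number, message)
```

Only the first sample row is reported, because of the 'key:value' pair inside the array. The other two rows parse, although they may still differ from the schema (a missing key, or a string where an array is expected).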
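As @Barmar said, your problem is probably not related to ordering. ... But if you are using the plain 'json' module, it simply writes the keys in whatever order dict.items() / dict.iteritems() provides, unless you tell it to sort them. You can use a collections.OrderedDict to remember insertion order, or write a dict wrapper that returns the keys in the order you want. – Wuggy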
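The OrderedDict suggestion could be sketched roughly as follows. The key order is taken from the schema in the question; `SCHEMA_KEYS` and `normalize` are hypothetical names, and filling missing keys with an empty string is an assumption based on that schema. Note that this sketch drops any key that is not in the schema.

```python
import json
from collections import OrderedDict

# Key order taken from the schema in the question.
SCHEMA_KEYS = ["about", "category", "id", "cat_list"]

def normalize(line, keys=SCHEMA_KEYS, default=""):
    """Re-emit one JSON object with its keys in schema order.

    Missing keys are filled with `default` (an assumption); keys that are
    not in the schema are dropped.
    """
    record = json.loads(line)
    ordered = OrderedDict((key, record.get(key, default)) for key in keys)
    return json.dumps(ordered)

print(normalize('{"id":"22","about":"barFoo", "category":"NotABar"}'))
# → {"about": "barFoo", "category": "NotABar", "id": "22", "cat_list": ""}
```

Applying `normalize` to each line and writing the results to a new file would give every row the same keys in the same order. Lines that are not valid JSON will still raise an error and need to be repaired first.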