2014-11-14

Python: sort a list of JSON objects by value

I have a file made up of JSON objects, one per line, and I want to sort the file by update_time, newest first.

Sample JSON file:

{ "page": { "url": "url1", "update_time": "1415387875"}, "other_key": {} } 
{ "page": { "url": "url2", "update_time": "1415381963"}, "other_key": {} } 
{ "page": { "url": "url3", "update_time": "1415384938"}, "other_key": {} } 

Desired output:

{ "page": { "url": "url1", "update_time": "1415387875"}, "other_key": {} } 
{ "page": { "url": "url3", "update_time": "1415384938"}, "other_key": {} } 
{ "page": { "url": "url2", "update_time": "1415381963"}, "other_key": {} } 

My code:

#!/usr/bin/env python
# coding: utf8

import sys
import json

# load one JSON object per line from stdin
lines = []
for raw in sys.stdin:
    raw = raw.strip()
    if not raw:
        continue  # skip blank lines
    lines.append(json.loads(raw))

# sort json, newest update_time first
lines = sorted(lines, key=lambda k: k['page']['update_time'], reverse=True)

# output result as JSON, one object per line
for line in lines:
    print(json.dumps(line))

The code works fine on the sample JSON file, but if an object has no 'update_time', it raises a KeyError exception. Is there a way to do this without an exception?

Answers


The answer should be obvious: write a function that uses try...except to handle the KeyError, then pass it as the key argument instead of the lambda.

def extract_time(record):
    try:
        # Also convert to int, since update_time is stored as a string. When
        # comparing strings, "10" is smaller than "2".
        return int(record['page']['update_time'])
    except KeyError:
        # 'page' or 'update_time' is missing: sort these records last
        return 0

# lines.sort() sorts in place, which is more efficient than lines = sorted(lines)
lines.sort(key=extract_time, reverse=True)
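Putting the answer's extract_time together with the question's stdin loop gives a minimal end-to-end sketch. It assumes Python 3 and one JSON object per input line; sort_json_lines is a hypothetical helper name, and the records are re-serialized with json.dumps so the output stays one JSON object per line.

```python
import json


def extract_time(record):
    # EAFP: a missing 'page' or 'update_time' key falls through to 0,
    # which places the record last when sorting newest-first.
    try:
        return int(record['page']['update_time'])
    except KeyError:
        return 0


def sort_json_lines(text_lines):
    """Hypothetical helper: parse JSON lines and return them newest-first."""
    # json.loads tolerates the trailing newline; blank lines are skipped.
    records = [json.loads(raw) for raw in text_lines if raw.strip()]
    records.sort(key=extract_time, reverse=True)
    return [json.dumps(record) for record in records]
```

To use it as a filter, something like `for out in sort_json_lines(sys.stdin): print(out)` would read stdin and write the sorted lines. Because Python's sort is stable, records that all fall back to 0 keep their original relative order.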

You can use dict.get() with a default value:

lines = sorted(lines, key=lambda k: int(k['page'].get('update_time', 0)), reverse=True)
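Why wrapping the lookup in int() matters: in Python 3, comparing a str with an int raises TypeError, so an int default such as 0 mixed with string timestamps cannot be sorted directly (Python 2 allowed that comparison). A small sketch, assuming Python 3; time_key is a hypothetical name:

```python
records = [
    {"page": {"url": "url1", "update_time": "1415387875"}},
    {"page": {"url": "url2"}},
    {"page": {"url": "url3", "update_time": "1415384938"}},
]

# Default 0 is an int while the present values are str: Python 3 refuses to compare them.
try:
    sorted(records, key=lambda k: k['page'].get('update_time', 0), reverse=True)
    mixed_types_failed = False
except TypeError:
    mixed_types_failed = True


def time_key(record):
    # Convert every key to int so all keys share one type and compare numerically.
    return int(record['page'].get('update_time', 0))


ordered = sorted(records, key=time_key, reverse=True)
print(mixed_types_failed)                    # True
print([r['page']['url'] for r in ordered])   # ['url1', 'url3', 'url2']
```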

Example:

>>> lines = [
...     {"page": {"url": "url1", "update_time": "1415387875"}, "other_key": {}},
...     {"page": {"url": "url2", "update_time": "1415381963"}, "other_key": {}},
...     {"page": {"url": "url3", "update_time": "1415384938"}, "other_key": {}},
...     {"page": {"url": "url4"}, "other_key": {}},
...     {"page": {"url": "url5"}, "other_key": {}}
... ]
>>> lines = sorted(lines, key=lambda k: int(k['page'].get('update_time', 0)), reverse=True)
>>> for line in lines:
...     print(line)
...
{'page': {'url': 'url1', 'update_time': '1415387875'}, 'other_key': {}}
{'page': {'url': 'url3', 'update_time': '1415384938'}, 'other_key': {}}
{'page': {'url': 'url2', 'update_time': '1415381963'}, 'other_key': {}}
{'page': {'url': 'url4'}, 'other_key': {}}
{'page': {'url': 'url5'}, 'other_key': {}}

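That said, I would still follow the EAFP principle, as Ferdinand suggested: that way you also handle the case where the page key itself is missing. It is easier to let the lookup fail and handle the failure than to check for every corner case.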
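To make the difference concrete, here is a sketch, assuming Python 3, with a record that has no 'page' key at all. A single-level dict.get() lambda like the one above raises KeyError on it, while a try/except key in the style of extract_time does not. A chained get() with an empty-dict default is shown as a no-exception alternative. The names eafp_key and chained_get_key are hypothetical.

```python
records = [
    {"page": {"url": "url1", "update_time": "1415387875"}},
    {"other_key": {}},                      # no 'page' key at all
    {"page": {"url": "url3", "update_time": "1415384938"}},
]


def eafp_key(record):
    # EAFP: just try the lookup and treat any missing key as timestamp 0.
    try:
        return int(record['page']['update_time'])
    except KeyError:
        return 0


def chained_get_key(record):
    # No-exception alternative: every level of the lookup gets a default.
    return int(record.get('page', {}).get('update_time', 0))


# The single-level .get() version still fails on the record without 'page'.
try:
    sorted(records, key=lambda k: int(k['page'].get('update_time', 0)))
    get_version_failed = False
except KeyError:
    get_version_failed = True

by_eafp = [r.get('page', {}).get('url') for r in sorted(records, key=eafp_key, reverse=True)]
by_chain = [r.get('page', {}).get('url') for r in sorted(records, key=chained_get_key, reverse=True)]
print(get_version_failed)  # True
print(by_eafp)             # ['url1', 'url3', None]
print(by_chain)            # ['url1', 'url3', None]
```

The chained get() works here, but it only covers the failure modes you thought to give defaults for; the try/except version keeps one place to decide what a broken record means.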


How can I load the JSON file into lines dynamically? If I have a million lines, it won't load properly; that's why I'm asking.
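One possible approach to the memory concern, sketched under assumptions: this swaps in an external merge sort, which nobody in the thread proposed. Each chunk of raw lines is sorted and written to a temporary file, then heapq.merge lazily merges the sorted runs, so only one chunk of lines is held in memory at a time. It assumes Python 3.5+ (for the key and reverse arguments of heapq.merge); update_time_of, external_sort and chunk_size are hypothetical names.

```python
import heapq
import itertools
import json
import tempfile


def update_time_of(raw_line):
    # Parse one JSON line; a missing 'page' or 'update_time' counts as 0.
    try:
        return int(json.loads(raw_line)['page']['update_time'])
    except KeyError:
        return 0


def external_sort(infile, outfile, chunk_size=100000):
    """Write the JSON lines of infile to outfile, newest update_time first."""
    runs = []
    try:
        while True:
            # Read at most chunk_size raw lines; an empty result means EOF.
            raw = list(itertools.islice(infile, chunk_size))
            if not raw:
                break
            # Drop blank lines and make sure every line ends with a newline.
            chunk = [line if line.endswith('\n') else line + '\n'
                     for line in raw if line.strip()]
            chunk.sort(key=update_time_of, reverse=True)
            run = tempfile.TemporaryFile(mode='w+')
            run.writelines(chunk)
            run.seek(0)
            runs.append(run)
        # Each run is sorted newest-first, so merge them with the same key and order.
        outfile.writelines(heapq.merge(*runs, key=update_time_of, reverse=True))
    finally:
        for run in runs:
            run.close()
```

Called as `external_sort(sys.stdin, sys.stdout)`, it acts as a filter. For a file that still fits in RAM, keeping the raw strings and sorting them with a key that parses each line may well be enough; the temporary-file runs only pay off when the input is larger than memory.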

# sort json (int() avoids comparing str with int in Python 3)
lines = sorted(lines, key=lambda k: int(k['page'].get('update_time', 0)), reverse=True)