2016-08-17 19 views

I am trying to parse PDF metadata as shown below, and I want a dictionary to keep its initialized value when no matching key exists:

fields = ["Author", "Year", "Journal", "Title", "Publisher",
          "Page", "Address", "Annote", "Booktitle", "Chapter",
          "Crossred", "Edition", "Editor", "HowPublished",
          "Institution", "Month", "Note", "Number",
          "Organization", "Pages", "School",
          "Series", "Type", "Volume", "Doi", "File"]
op = pexif.get_json(filename)
new_op = {"Author": "Unknown"}
print(new_op)
new_op = {
    field: str(value) for field in fields
    for key, value in op[0].items() if field.lower() in key.lower()
}
print(new_op)
id_auth=new_op["Author"].split()[-1]
id_tit = (new_op["Title"].split()[:2])

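In a few cases the Author tag is absent, so I initialized it with "Unknown", hoping that the value would persist if no Author tag was found. However, the second assignment, new_op = {...}, overwrites the old data. As a result, the two print(new_op) calls yield: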

{'Author': 'Unknown'} 
{'File': '/home/rudra/Downloads', 'Title': 'Formation of bcc non-equilibrium La, Gd and Dy alloys and the magnetic structure of Mg-stabilized [beta] Gd and [beta] Dy', 'Type': 'pdf', 'Page': '140'} 

and the id_auth line raises a KeyError:

id_auth=new_op["Author"].split()[-1] 
KeyError: 'Author' 

I want to keep Author = Unknown if no Author key exists in op. How can I do that?

For reference, here is the exiftool output:

ExifTool Version Number   : 10.20 
File Name      : Formation of bcc non-equilibrium La Gd and Dy alloys and the mag.pdf 
Directory      : /home/rudra/Downloads 
File Size      : 2.2 MB 
File Modification Date/Time  : 2016:07:20 15:30:48+02:00 
File Access Date/Time   : 2016:08:16 19:20:21+02:00 
File Inode Change Date/Time  : 2016:08:16 18:13:30+02:00 
File Permissions    : rw-rw-r-- 
File Type      : PDF 
File Type Extension    : pdf 
MIME Type      : application/pdf 
PDF Version      : 1.7 
Linearized      : No 
XMP Toolkit      : Adobe XMP Core 5.2-c001 63.143651, 2012/04/05-09:01:49 
Modify Date      : 2015:09:18 07:48:48-07:00 
Create Date      : 2015:09:18 07:48:48-07:00 
Metadata Date     : 2015:09:18 07:48:48-07:00 
Creator Tool     : Appligent AppendPDF Pro 5.5 
Document ID      : uuid:f06a868b-a105-11b2-0a00-782dad000000 
Instance ID      : uuid:f06aec42-a105-11b2-0a00-400080adfd7f 
Format       : application/pdf 
Title       : Formation of bcc non-equilibrium La, Gd and Dy alloys and the magnetic structure of Mg-stabilized [beta] Gd and [beta] Dy 
Producer      : Prince 9.0 rev 5 (www.princexml.com) 
Appligent      : AppendPDF Pro 5.5 Linux Kernel 2.6 64bit Oct 2 2014 Library 10.1.0 
Page Count      : 140 
Creator       : Appligent AppendPDF Pro 5.5 

Answers


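There are a number of ways to do this, but the simplest is to drop the initial version of the dictionary and instead check afterwards whether Author is present: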

new_op = { 
    field: str(value) for field in fields 
    for key, value in op[0].items() if field.lower() in key.lower() 
} 
if 'Author' not in new_op: 
    new_op['Author'] = 'Unknown' 
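
Another way to get the effect the question was after is to start from the defaults and merge the matched fields into the same dictionary with dict.update, rather than rebinding the name. This is only a minimal sketch: the op data and the shortened fields list below are made-up stand-ins for the real exiftool output, not values from the question.

```python
# Hypothetical stand-in for op = pexif.get_json(filename);
# the real exiftool output has many more keys.
op = [{"Title": "Formation of bcc non-equilibrium La, Gd and Dy alloys",
       "Page Count": 140}]
fields = ["Author", "Title", "Page"]  # shortened list, for illustration only

# Start from the defaults, then merge the matches into the same dict.
new_op = {"Author": "Unknown"}
new_op.update({
    field: str(value)
    for field in fields
    for key, value in op[0].items()
    if field.lower() in key.lower()
})

print(new_op)
```

Because update only overwrites keys that the comprehension actually produces, "Author" keeps its default whenever no matching tag is found.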

You are reassigning the new_op dictionary. Instead, after the following assignment:

new_op = { 
    field: str(value) for field in fields 
    for key, value in op[0].items() if field.lower() in key.lower() 
} 

do this:

# dict.has_key() exists only in Python 2; the "in" operator works in both 2 and 3.
if 'Author' not in new_op:
    new_op['Author'] = 'Unknown'
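
The same check can probably be collapsed into a single call with dict.setdefault, which writes the value only when the key is missing. The helper name with_default_author and the sample dictionaries are hypothetical, introduced here just for illustration.

```python
def with_default_author(metadata, default="Unknown"):
    """Return a copy of metadata that is guaranteed to have an 'Author' key.

    with_default_author is a hypothetical helper, not part of any library.
    """
    result = dict(metadata)                # copy, so the caller's dict is untouched
    result.setdefault("Author", default)   # no-op if 'Author' already exists
    return result


missing = with_default_author({"Title": "Some title"})
present = with_default_author({"Author": "Jane Doe", "Title": "Some title"})
print(missing["Author"], present["Author"])
```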

Alternatively, catch the KeyError at the point where the value is used:

try:
    id_auth = new_op["Author"].split()[-1]
except KeyError:
    id_auth = "Unknown"
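
If only the lookup needs a fallback, dict.get with a default value should give the same result without a try/except. The new_op contents below are example data chosen to lack an Author key; note that, unlike the approaches above, this leaves new_op itself unchanged.

```python
# Example data without an 'Author' key (an assumption for illustration).
new_op = {"Title": "Formation of bcc non-equilibrium La, Gd and Dy alloys"}

# get() returns its second argument when the key is absent,
# and does not modify new_op.
id_auth = new_op.get("Author", "Unknown").split()[-1]
id_tit = new_op.get("Title", "").split()[:2]

print(id_auth, id_tit)
```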