2016-09-06

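How to parse XML elements and get their values in Python 2.7

API response: http://iss.ndl.go.jp/api/opensearch?isbn=9784334770051

Hello, and thank you for the help yesterday. However, when I try to get values from the elements, I always get an empty result. I was pointed to a link, but I am not sure I understood it. What am I doing wrong that gives me empty results?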

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import codecs
import sys
import urllib
import urllib2
import re, pprint
from xml.etree.ElementTree import *
import csv
from xml.dom import minidom
import xml.etree.ElementTree as ET
import shelve
import subprocess

errorCheck = "0"
isbn = raw_input("Enter ISBN Number Please ")
isIsbn = len(isbn)

# ElementTree requires namespace definition to work with XML with namespaces correctly
# It is hardcoded at this point, but this should be constructed from response.
namespaces = {
    'dc': 'http://purl.org/dc/elements/1.1/',
    'dcndl': 'http://ndl.go.jp/dcndl/terms/',
}

# for prefix, uri in namespaces.iteritems():
#     ElementTree.register_namespace(prefix, uri)

if isIsbn == 10 or isIsbn == 13:
    errorCheck = 1
    url = "http://iss.ndl.go.jp/api/opensearch?isbn=%s" % isbn
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    tree = ET.parse(response)
    root = tree.getroot()
    # root = ET.fromstring(XmlData)
    print root.findall('dc:title', namespaces)
    print root.findall('dc:title')
    print root.findall('dc:identifier', namespaces)
    print root.findall('dc:identifier')
    print root.findall('identifier')

if errorCheck == "0":
    print "It is not ISBN"

    # print(root.tag,root.attrib)

    # for child in root.find('.//item'):
    #     print child.text

Answer

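Your code needs only a small change: add .// to the expressions in your findall calls. The root node is the rss node, and dc:title is a descendant of the rss node, not a direct child, so you need to search through the whole document: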

import xml.etree.ElementTree as ET
import requests

url = "http://iss.ndl.go.jp/api/opensearch?isbn=9784334770051"
# Parse the response body; the root element is <rss>.
tree = ET.fromstring(requests.get(url).content)
namespaces = {
    'dc': 'http://purl.org/dc/elements/1.1/',
    'dcndl': 'http://ndl.go.jp/dcndl/terms/',
}
# './/' searches all descendants of the root, not just its direct children.
print([t.text for t in tree.findall('.//dc:title', namespaces)])
print([i.text for i in tree.findall('.//dc:identifier', namespaces)])
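
To see why the original calls came back empty without hitting the live API, here is a minimal sketch run against an inline sample. The SAMPLE string is a made-up, cut-down stand-in for the real response (an assumption, not the actual feed); only the element nesting and the dc namespace URI follow the answer above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down stand-in for the API response (not the real feed).
SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <dc:title>Sample Title</dc:title>
      <dc:identifier>1111111111</dc:identifier>
      <dc:identifier>22222222</dc:identifier>
    </item>
  </channel>
</rss>"""

namespaces = {'dc': 'http://purl.org/dc/elements/1.1/'}
root = ET.fromstring(SAMPLE)

# A bare 'dc:title' only matches direct children of <rss>, so nothing is found.
direct = root.findall('dc:title', namespaces)

# './/' searches all descendants, so the nested elements are found.
titles = [t.text for t in root.findall('.//dc:title', namespaces)]
identifiers = [i.text for i in root.findall('.//dc:identifier', namespaces)]

print(direct)
print(titles)
print(identifiers)
```

The first list is empty for the same reason the question's calls were: the only direct child of rss is channel.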

You can do this more easily with lxml, because it can read the namespace mapping from the source for you:

In [1]: import lxml.etree as et 

In [2]: url = "http://iss.ndl.go.jp/api/opensearch?isbn=9784334770051" 

In [3]: tree = et.parse(url) 

In [4]: nsmap = tree.getroot().nsmap 

In [5]: print(tree.xpath("//dc:title/text()", namespaces=nsmap)) 
[u'\u9244\u8155\u30a2\u30c8\u30e0'] 

In [6]: print(tree.xpath("//dc:identifier/text()", namespaces=nsmap)) 
['4334770053', '95078560'] 
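
If installing lxml is not an option, the standard library can probably get you most of the way: ET.iterparse reports each namespace declaration as a 'start-ns' event, so the prefix map can be built from the document instead of being hardcoded. This is only a sketch. parse_with_nsmap is a helper name I made up, and SAMPLE is an invented stand-in for the real response.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, cut-down stand-in for the API response (not the real feed).
SAMPLE = b"""<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:dcndl="http://ndl.go.jp/dcndl/terms/">
  <channel>
    <item>
      <dc:title>Sample Title</dc:title>
      <dc:identifier>1111111111</dc:identifier>
    </item>
  </channel>
</rss>"""


def parse_with_nsmap(source):
    """Parse XML from a file-like object; return (root, prefix-to-URI dict)."""
    nsmap = {}
    root = None
    for event, payload in ET.iterparse(source, events=('start', 'start-ns')):
        if event == 'start-ns':
            # payload is a (prefix, uri) tuple for each xmlns declaration.
            prefix, uri = payload
            nsmap[prefix] = uri
        elif root is None:
            # The first 'start' event belongs to the root element.
            root = payload
    return root, nsmap


root, nsmap = parse_with_nsmap(io.BytesIO(SAMPLE))
titles = [t.text for t in root.findall('.//dc:title', nsmap)]

print(nsmap)
print(titles)
```

In the question's script, the response object returned by urllib2.urlopen could presumably be passed as the source, since it is file-like.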

You can see the path to dc:title:

In [55]: tree 
Out[55]: <Element 'rss' at 0x7f996e8b66d0> # root 

In [56]: tree.findall('channel') # child of root so don't need .// 
Out[56]: [<Element 'channel' at 0x7f996e131990>] 

In [57]: tree.findall('channel/item/dc:title', namespaces) # item is a descendant of rss, item is parent of the dc:title 
Out[57]: [<Element '{http://purl.org/dc/elements/1.1/}title' at 0x7f996e131910>] 

The same goes for the identifiers:

In [58]: tree.findall('channel//item//dc:identifier', namespaces) 
Out[58]: 
[<Element '{http://purl.org/dc/elements/1.1/}identifier' at 0x7f996e131c50>, 
<Element '{http://purl.org/dc/elements/1.1/}identifier' at 0x7f996e131250>] 
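
Once the explicit path is clear, it can also be used to keep each item's values together instead of collecting one flat list. A rough sketch, assuming the same rss/channel/item nesting as above: items_from_feed is a hypothetical helper, and SAMPLE is invented data, not the real response.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a response with two items (not the real feed).
SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <dc:title>First Title</dc:title>
      <dc:identifier>1111111111</dc:identifier>
      <dc:identifier>22222222</dc:identifier>
    </item>
    <item>
      <dc:title>Second Title</dc:title>
      <dc:identifier>3333333333</dc:identifier>
    </item>
  </channel>
</rss>"""

namespaces = {'dc': 'http://purl.org/dc/elements/1.1/'}


def items_from_feed(xml_text):
    """Return one dict per <item> holding its title and identifiers."""
    root = ET.fromstring(xml_text)
    records = []
    # 'channel/item' is an explicit path from the <rss> root.
    for item in root.findall('channel/item'):
        records.append({
            # dc:title and dc:identifier are direct children of <item>,
            # so no './/' is needed when searching from the item itself.
            'title': item.findtext('dc:title', namespaces=namespaces),
            'identifiers': [i.text for i in
                            item.findall('dc:identifier', namespaces)],
        })
    return records


records = items_from_feed(SAMPLE)
for record in records:
    print(record)
```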

Thank you, that really helped me.