2011-03-16

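How do I validate XML with multiple namespaces in Python?

I'm trying to write some unit tests in Python 2.7 to validate some extensions I've made to the OAI-PMH schema: http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd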

The problem I'm running into involves multiple nested namespaces, and it comes from this part of the XSD mentioned above:

<complexType name="metadataType"> 
    <annotation> 
     <documentation>Metadata must be expressed in XML that complies 
     with another XML Schema (namespace=#other). Metadata must be 
     explicitly qualified in the response.</documentation> 
    </annotation> 
    <sequence> 
     <any namespace="##other" processContents="strict"/> 
    </sequence> 
</complexType> 

Here is a snippet of the code I'm using:

from lxml import etree
import urllib2

query = "http://localhost:8080/OAI-PMH?verb=GetRecord&by_doc_ID=false&metadataPrefix=nsdl_dc&identifier=http://www.purplemath.com/modules/ratio.htm"
xml_headers = {"Accept": "application/xml"}  # assumed: not defined in the original snippet

schema_doc = etree.parse("../schemas/OAI/2.0/OAI-PMH.xsd")
oaischema = etree.XMLSchema(schema_doc)

request = urllib2.Request(query, headers=xml_headers)
response = urllib2.urlopen(request)
body = response.read()
response_doc = etree.fromstring(body)

try:
    oaischema.assertValid(response_doc)
except etree.DocumentInvalid as e:
    # Dump the response with line numbers so the reported line can be found
    for line, text in enumerate(body.split("\n"), 1):
        print "{0}\t{1}".format(line, text)
    print e

I end up with the following error:

AssertionError: http://localhost:8080/OAI-PMH?verb=GetRecord&by_doc_ID=false&metadataPrefix=nsdl_dc&identifier=http://www.purplemath.com/modules/ratio.htm 
Element '{http://www.openarchives.org/OAI/2.0/oai_dc/}oai_dc': No matching global element declaration available, but demanded by the strict wildcard., line 22 

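I understand the error: the schema requires strict validation of the child element of metadata, and the sample XML has exactly such a child.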
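A minimal, self-contained sketch of what seems to be going on, with two made-up schemas standing in for OAI-PMH.xsd and oai_dc.xsd (the urn:example:* namespaces and file names are hypothetical; Python 3 syntax, though the lxml calls are the same under 2.7). As far as I can tell, libxml2 (which lxml wraps) only knows the declarations in the schema set you compile, so a strict xs:any wildcard fails unless that set also declares the nested namespace. A wrapper schema that does nothing but xs:import both schemas is a common workaround:

```python
import os
import tempfile

from lxml import etree

# Stands in for OAI-PMH.xsd: <metadata> holds one element from any *other*
# namespace, and processContents="strict" demands a declaration for it.
OUTER_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:outer" elementFormDefault="qualified">
  <xs:element name="metadata">
    <xs:complexType>
      <xs:sequence>
        <xs:any namespace="##other" processContents="strict"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# Stands in for oai_dc.xsd: declares the nested element globally.
INNER_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:inner" elementFormDefault="qualified">
  <xs:element name="dc" type="xs:string"/>
</xs:schema>
"""

# Wrapper: no declarations of its own, it only pulls both schemas into one set.
WRAPPER_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:import namespace="urn:example:outer" schemaLocation="outer.xsd"/>
  <xs:import namespace="urn:example:inner" schemaLocation="inner.xsd"/>
</xs:schema>
"""

DOC = b"""<metadata xmlns="urn:example:outer">
  <dc xmlns="urn:example:inner">hello</dc>
</metadata>"""


def validate(schema_path, doc_bytes):
    """Compile the schema at schema_path and validate doc_bytes against it."""
    schema = etree.XMLSchema(etree.parse(schema_path))
    ok = schema.validate(etree.fromstring(doc_bytes))
    return ok, str(schema.error_log)


with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("outer.xsd", OUTER_XSD), ("inner.xsd", INNER_XSD),
                       ("wrapper.xsd", WRAPPER_XSD)]:
        with open(os.path.join(tmp, name), "w") as f:
            f.write(text)
    # Relative schemaLocations in wrapper.xsd resolve against its own path.
    outer_ok, outer_log = validate(os.path.join(tmp, "outer.xsd"), DOC)
    wrapper_ok, wrapper_log = validate(os.path.join(tmp, "wrapper.xsd"), DOC)

print("outer schema only:", outer_ok)
print(outer_log)
print("wrapper schema:", wrapper_ok)
```

With only the outer schema compiled, the log should show the same "No matching global element declaration available, but demanded by the strict wildcard" message as above; with the wrapper, the document validates. For the real case the wrapper would import OAI-PMH.xsd plus whichever schema declares the element inside metadata.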

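I've already written a validator in Java that works, but a Python version would be more useful, since the rest of the solution I'm building is Python-based. To get the Java version working I had to make my DocumentFactory namespace-aware; otherwise I got the same error. I haven't found any working example in Python that performs this validation correctly.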

Does anyone have an idea how I can get an XML document with multiple nested namespaces, like my sample document, to validate with Python?

Here is the sample XML document I'm trying to validate:

<?xml version="1.0" encoding="UTF-8"?> 
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ 
    http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"> 
    <responseDate>2002-02-08T08:55:46Z</responseDate> 
    <request verb="GetRecord" identifier="oai:arXiv.org:cs/0112017" 
     metadataPrefix="oai_dc">http://arXiv.org/oai2</request> 
    <GetRecord> 
    <record> 
    <header> 
     <identifier>oai:arXiv.org:cs/0112017</identifier> 
     <datestamp>2001-12-14</datestamp> 
     <setSpec>cs</setSpec> 
     <setSpec>math</setSpec> 
    </header> 
    <metadata> 
     <oai_dc:dc 
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" 
    xmlns:dc="http://purl.org/dc/elements/1.1/" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ 
    http://www.openarchives.org/OAI/2.0/oai_dc.xsd"> 
    <dc:title>Using Structural Metadata to Localize Experience of 
      Digital Content</dc:title> 
    <dc:creator>Dushay, Naomi</dc:creator> 
    <dc:subject>Digital Libraries</dc:subject> 
    <dc:description>With the increasing technical sophistication of 
     both information consumers and providers, there is 
     increasing demand for more meaningful experiences of digital 
     information. We present a framework that separates digital 
     object experience, or rendering, from digital object storage 
     and manipulation, so the rendering can be tailored to 
     particular communities of users. 
    </dc:description> 
    <dc:description>Comment: 23 pages including 2 appendices, 
     8 figures</dc:description> 
    <dc:date>2001-12-14</dc:date> 
     </oai_dc:dc> 
    </metadata> 
    </record> 
</GetRecord> 
</OAI-PMH> 
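The sample names its schemas in xsi:schemaLocation, both on the root and on the nested oai_dc:dc. As far as I know, libxml2 does not fetch those hints when you validate against a schema you have already compiled, so one option is to read them yourself and decide what has to be imported. A stdlib-only sketch (Python 3; the function name schema_locations is hypothetical):

```python
import xml.etree.ElementTree as ET

XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"


def schema_locations(xml_bytes):
    """Map each namespace to the schema URL named for it in xsi:schemaLocation.

    Every element is checked, so hints on nested elements (such as
    oai_dc:dc inside <metadata>) are found as well as the root's.
    """
    locations = {}
    for elem in ET.fromstring(xml_bytes).iter():
        hint = elem.get(XSI_SCHEMA_LOCATION)
        if hint:
            # The attribute is a whitespace-separated list of namespace/URL pairs.
            tokens = hint.split()
            for namespace, url in zip(tokens[0::2], tokens[1::2]):
                locations.setdefault(namespace, url)
    return locations


sample = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
    http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
        http://www.openarchives.org/OAI/2.0/oai_dc.xsd"/>
  </metadata>
</OAI-PMH>"""

print(schema_locations(sample))
```

The resulting pairs could feed a schema that simply xs:imports each one, although pointing the locations at local copies is probably more reliable in unit tests than fetching over the network.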

As far as I can tell at this point, this seems to be a bug in libxml2 that prevents validation of nested namespaces. – Jim 2011-03-28 23:50:34

Answer

Found this in lxml's documentation on validation:

>>> schema_root = etree.XML('''\ 
... <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> 
...  <xsd:element name="a" type="xsd:integer"/> 
... </xsd:schema> 
... ''') 
>>> schema = etree.XMLSchema(schema_root) 

>>> parser = etree.XMLParser(schema = schema) 
>>> root = etree.fromstring("<a>5</a>", parser) 
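The same pattern also shows what happens with an invalid document: with a schema attached, the parser validates while parsing and raises lxml's XMLSyntaxError instead of returning a tree. A small self-contained sketch (Python 3 syntax):

```python
from lxml import etree

schema_root = etree.XML('''\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="a" type="xsd:integer"/>
</xsd:schema>
''')
schema = etree.XMLSchema(schema_root)
parser = etree.XMLParser(schema=schema)

root = etree.fromstring("<a>5</a>", parser)  # valid: a normal tree comes back
print(root.text)

rejected = False
try:
    etree.fromstring("<a>no int</a>", parser)  # invalid: fails during parsing
except etree.XMLSyntaxError as err:
    rejected = True
    print("rejected:", err)
```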

So maybe this is what you need? (See the last two lines.)

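from lxml import etree
import urllib2

# query and xml_headers as in the question
schema_doc = etree.parse("../schemas/OAI/2.0/OAI-PMH.xsd")
oaischema = etree.XMLSchema(schema_doc)

request = urllib2.Request(query, headers=xml_headers)
response = urllib2.urlopen(request)
body = response.read()
# Attaching the schema to the parser makes it validate while parsing
parser = etree.XMLParser(schema=oaischema)
response_doc = etree.fromstring(body, parser)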
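One caveat: this still compiles only OAI-PMH.xsd, so the strict wildcard inside metadata is likely to need a declaration for the nested namespace either way; the failure would just surface at parse time. A stdlib-only sketch that builds a wrapper XSD importing several namespaces from a namespace-to-location mapping (build_wrapper_xsd and the local oai_dc.xsd path are hypothetical; Python 3):

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize with the usual xs: prefix


def build_wrapper_xsd(locations):
    """Return an XSD (as bytes) that only xs:imports each namespace -> location.

    Compiling this wrapper gives the validator global declarations for every
    listed namespace, which is what a strict xs:any wildcard needs.
    """
    schema = ET.Element("{%s}schema" % XS)
    for namespace, location in locations.items():
        ET.SubElement(schema, "{%s}import" % XS,
                      namespace=namespace, schemaLocation=location)
    return ET.tostring(schema)


wrapper = build_wrapper_xsd({
    "http://www.openarchives.org/OAI/2.0/": "../schemas/OAI/2.0/OAI-PMH.xsd",
    # hypothetical local copy of the metadata-format schema
    "http://www.openarchives.org/OAI/2.0/oai_dc/": "../schemas/OAI/2.0/oai_dc.xsd",
})
print(wrapper.decode())
```

The resulting bytes could then be compiled with lxml, for example etree.XMLSchema(etree.XML(wrapper, base_url=...)); relative schemaLocation values are resolved against that base URL, so absolute paths are the safer choice. oai_dc.xsd itself imports the Dublin Core element schema from a remote URL, so an offline test setup may need a local copy of that file too.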