Retrieving data from XML tags in Python

I'm trying to use the following code to retrieve the slide number between the 'a:t' tags of the field with type="slidenum", but something isn't working. I should get 1.

Here's the XML:

<a:p><a:fld id="{55FBEE69-CA5C-45C8-BA74-481781281731}" type="slidenum"> 
<a:rPr lang="en-US" sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/> 
</a:solidFill></a:rPr><a:pPr/><a:t>1</a:t></a:fld><a:endParaRPr lang="en-US" 
sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/></a:solidFill> 
</a:endParaRPr></a:p></p:txBody></p:sp>

Here is my code:

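import re
import zipfile
import xml.etree.ElementTree as ET

# pptx_filename: path to the .pptx file
z = zipfile.ZipFile(pptx_filename)
for name in z.namelist():
    m = re.match(r'ppt/notesSlides/notesSlide\d+\.xml', name)
    if m is not None:
        f = z.open(name)
        tree = ET.parse(f)
        f.close()
        root = tree.getroot()
        # Find the slide number.
        # (p.txBody, a.p, a.fld and a.t are not defined in this snippet;
        # presumably they hold namespace-qualified tag names.)
        slide_num = None
        for fld in root.findall('/'.join(['.', '', p.txBody, a.p, a.fld])):
            if fld.get('type', '') == 'slidenum':
                slide_num = int(fld.find(a.t).text)
                print(slide_num)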


2015-06-30 eleanor massy

<a:fld id="{55FBEE69-CA5C-45C8-BA74-481781281731}" type="slidenum"> –

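Could you edit the question to include the XML? I think that would help us a lot :) It's hard to read in a comment. – Jerfov2

The 'a:' prefix means that these elements are all in an XML namespace. You will probably need to include the namespace when searching for these tags. If you're not sure how to do that, check this answer: http://stackoverflow.com/a/14853417/849425 –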
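To illustrate the comment above: ElementTree expands a prefixed tag such as a:t into {namespace-uri}t, so a search has to use either that expanded form or a prefix-to-URI map passed as the namespaces argument. A minimal sketch, assuming that a: is bound to the DrawingML URI that PowerPoint normally uses; the short XML string is a made-up, well-formed stand-in for the fragment in the question.

```python
import xml.etree.ElementTree as ET

# Assumption: a: is bound to the DrawingML namespace, as in PowerPoint files.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"

xml = (
    '<a:p xmlns:a="%s">'
    '<a:fld type="slidenum"><a:t>1</a:t></a:fld>'
    '</a:p>' % A_NS
)
root = ET.fromstring(xml)

# The parser expands the a: prefix into {uri}localname.
print(root.tag)  # {http://schemas.openxmlformats.org/drawingml/2006/main}p

# Option 1: spell out the expanded {uri}tag names in the path.
t1 = root.find("{%s}fld[@type='slidenum']/{%s}t" % (A_NS, A_NS))

# Option 2: pass a prefix -> URI map; the prefix in the path need not match the file's.
t2 = root.find("a:fld[@type='slidenum']/a:t", {"a": A_NS})

# A bare "fld/t" finds nothing: no element has the un-namespaced tag "fld".
missing = root.find("fld/t")

print(t1.text, t2.text, missing)  # 1 1 None
```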


# In Python 3, xml.etree.ElementTree uses its C accelerator automatically
# (the separate cElementTree module was removed in Python 3.9).
import xml.etree.ElementTree as etree

# Our test XML
xml = '''
<a:p xmlns:a="http://example.com"><a:fld id="{55FBEE69-CA5C-45C8-BA74-481781281731}" type="slidenum">
<a:rPr lang="en-US" sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/>
</a:solidFill></a:rPr><a:pPr/><a:t>1</a:t></a:fld><a:endParaRPr lang="en-US"
sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/></a:solidFill>
</a:endParaRPr></a:p>
'''

# Manually specify the namespace. The prefix letter ("a") is arbitrary.
namespaces = {"a": "http://example.com"}

# Parse the XML string
tree = etree.fromstring(xml)

# Breaking down the search expression below:
#   a:fld              - find the fld element prefixed with namespace identifier a:
#   [@type='slidenum'] - match on an attribute type with a value of 'slidenum'
#   /a:t               - find the child element t prefixed with namespace identifier a:
slidenums = tree.findall("a:fld[@type='slidenum']/a:t", namespaces)
for slidenum in slidenums:
    print(slidenum.text)

Below is the same example using an external file, with the namespace provided by the OP:

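import xml.etree.ElementTree as etree

tree = etree.parse("my_xml_file.xml")
# In PowerPoint XML the a: prefix is normally bound to the DrawingML namespace;
# check the xmlns:a declaration in your own file.
namespaces = {"a": "http://schemas.openxmlformats.org/drawingml/2006/main"}
# .// searches the whole document, not just the root's direct children.
slidenums = tree.findall(".//a:fld[@type='slidenum']/a:t", namespaces)
for slidenum in slidenums:
    print(slidenum.text)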
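The namespace URL in the example above has to match the one declared in the file. Rather than copying it by hand from the xmlns:a attribute, the declarations can also be collected while parsing, using the "start-ns" events of ElementTree.iterparse. A minimal sketch; collect_namespaces is a hypothetical helper, and the in-memory bytes stand in for the OP's file (iterparse also accepts a filename).

```python
import io
import xml.etree.ElementTree as ET

def collect_namespaces(source):
    """Hypothetical helper: return {prefix: uri} for the xmlns declarations in source."""
    namespaces = {}
    # Each "start-ns" event carries a (prefix, uri) tuple; the default
    # namespace has the prefix "". A prefix declared twice keeps its last URI.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        namespaces[prefix] = uri
    return namespaces

# Stand-in for the OP's file.
sample = io.BytesIO(
    b'<a:p xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">'
    b'<a:fld type="slidenum"><a:t>1</a:t></a:fld></a:p>'
)
ns = collect_namespaces(sample)
print(ns)  # {'a': 'http://schemas.openxmlformats.org/drawingml/2006/main'}

# Reuse the collected map for the same search as in the answer.
sample.seek(0)
root = ET.parse(sample).getroot()
print(root.find("a:fld[@type='slidenum']/a:t", ns).text)  # 1
```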


2015-06-30 02:47:31

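Hey Mike! Thanks for your reply! The XML I'm using is just a snippet, and when I use the whole file the code doesn't work. How can I use your code after parsing the file with 'tree = parse(file)'? –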

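@eleanormassy I put in a fake namespace URL because the real one wasn't obvious from the XML sample you gave. You will probably need to change that URL to the one in your XML file (you'll see it defined as an attribute, 'xmlns:a=""'). –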

Yes, I got that part, and I changed the URL to the one in my file! How do I use it with 'tree = parse(file)'? Thanks –
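For the follow-up question about whole files: in a complete notes slide the a:fld element sits several levels below the root, so the search needs a .// prefix, and the XML can be read straight out of the .pptx archive as in the question's code. A minimal sketch, assuming the usual PresentationML/DrawingML namespace URIs and the ppt/notesSlides/notesSlideN.xml part names matched in the question; notes_slide_numbers is a hypothetical helper, and the in-memory archive is a stand-in so the example runs without a real deck.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# Assumed namespace URIs (check the xmlns:p / xmlns:a declarations in your file).
NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}

def notes_slide_numbers(pptx_file):
    """Hypothetical helper: map each notes-slide part name to its slide number."""
    result = {}
    with zipfile.ZipFile(pptx_file) as z:
        for name in z.namelist():
            if not re.match(r"ppt/notesSlides/notesSlide\d+\.xml$", name):
                continue
            with z.open(name) as f:
                root = ET.parse(f).getroot()
            # .// searches at any depth below the root, not just direct children.
            t = root.find(".//a:fld[@type='slidenum']/a:t", NS)
            if t is not None:
                result[name] = int(t.text)
    return result

# Build a minimal fake .pptx in memory so the example runs without a real file.
notes_xml = (
    '<p:notes xmlns:p="{p}" xmlns:a="{a}"><p:cSld><p:spTree><p:sp><p:txBody>'
    '<a:p><a:fld type="slidenum"><a:t>1</a:t></a:fld></a:p>'
    '</p:txBody></p:sp></p:spTree></p:cSld></p:notes>'
).format(**NS)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("ppt/notesSlides/notesSlide1.xml", notes_xml)
buf.seek(0)

print(notes_slide_numbers(buf))  # {'ppt/notesSlides/notesSlide1.xml': 1}
```

With a real deck, pass its path instead, e.g. notes_slide_numbers("deck.pptx") (the filename is a placeholder); zipfile.ZipFile accepts either a path or a file-like object.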

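Before parsing, I would strip the namespace prefixes from your XML tags. Then use the XPath fld[@type='slidenum']/t to find every fld node whose type attribute is 'slidenum' and get its child node t. Here's an example to show how this works:

(The first answer above was modified from this one by maxymoo to use namespaces instead of removing them.)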

from lxml import etree

xml = """
<a:p><a:fld id="{55FBEE69-CA5C-45C8-BA74-481781281731}" type="slidenum">
<a:rPr lang="en-US" sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/>
</a:solidFill></a:rPr><a:pPr/><a:t>1</a:t></a:fld><a:endParaRPr lang="en-US"
sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/></a:solidFill>
</a:endParaRPr></a:p>
"""

# Strip the "a:" prefixes; without this, lxml rejects the undeclared prefix.
# Note: a plain text replacement also alters any "a:" inside attribute values or text.
tree = etree.fromstring(xml.replace('a:', ''))
slidenum = tree.find("fld[@type='slidenum']/t").text
print(slidenum)  # prints: 1


2015-06-30 02:30:02 maxymoo

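XML namespaces are usually defined to disambiguate element names. Depending on the structure of the document, removing them could have unintended consequences. I assume the XML the OP showed is part of a larger document, partly because it isn't well-formed (which suggests to me it was copied and pasted incorrectly), and partly because it looks like PowerPoint's XML slide-deck format. (Microsoft Office's XML formats are very verbose.) –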
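If the goal is only to avoid hard-coding the namespace URL, rather than discarding namespaces from the document as the comment above cautions against, ElementTree in Python 3.8 and later accepts a {*} wildcard in place of the namespace part of a tag in find/findall paths. A minimal sketch, assuming Python 3.8+ and a made-up namespace URL; note that the wildcard also matches fld or t elements from any other namespace, which is exactly the kind of ambiguity namespaces exist to prevent.

```python
import xml.etree.ElementTree as ET  # {*} wildcards need Python 3.8+

xml = (
    '<a:p xmlns:a="http://example.com/a">'
    '<a:fld type="slidenum"><a:t>1</a:t></a:fld>'
    '</a:p>'
)
root = ET.fromstring(xml)

# {*}tag matches the local name "tag" in any namespace (or in no namespace).
nums = [t.text for t in root.findall("{*}fld[@type='slidenum']/{*}t")]
print(nums)  # ['1']
```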
