How to use ElementTree

I am trying to parse an XML string, which I get from a YouTube video feed, using Python 3.3.1, to find specific elements in the XML. Here is the code:

import re 
import sys 
import urllib.request 
import urllib.parse 
import xml.etree.ElementTree as element_tree 

def get_video_id(video_url): 
    return re.search(r'watch\?v=.*', video_url).group(0)[8:] 

def get_video_feed(video_url): 
    video_feed = "http://gdata.youtube.com/feeds/api/videos/" + get_video_id(video_url) 
    return urllib.request.urlopen(video_feed).read() 

def get_media_info(video_url): 
    content = get_video_feed(video_url) 
    content = str(content, 'ascii') 
    media = {} 
    e = element_tree.XML(content)

    print ("CONTENT: \n" + content) 

    print ("\n\nELEMENTS : \n") 
    for i in list(e):
        print (i)

    media['title'] = e.findall('title')  # NOTE THIS!
    return media 


def main(): 
    video_url = 'http://youtube.com/watch?v=q5sOLzEerwA' 

    print (get_media_info(video_url)) 

if __name__ == '__main__': 
    main()

I don't know why the for loop in get_media_info() prints the elements like this:

<Element '{http://www.w3.org/2005/Atom}title' at 0x0000000002BF7D18>

rather than like this:

<Element 'title' at 0x0000000002BF7D18>

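Frankly, I don't care what it prints. All I care about is that I want to pass 'title' to findall() and get back a list of the matching element(s). But it returns an empty list, even though the XML contains an element named title.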

So I tried this:

media['title'] = e.findall('{http://www.w3.org/2005/Atom}title')

And it did return a list containing one element. I believe this is not the right way to do it, though, and I feel I am missing something.

How do I fix this?

This is the output of the code above:

CONTENT:

<?xml version='1.0' encoding='UTF-8'?> 
<entry xmlns='http://www.w3.org/2005/Atom' xmlns:media='http://search.yahoo.com/mrss/' xmlns:gd='http://schemas.google.com/g/2005' xmlns:yt='http://gdata.youtube.com/schemas/2007'> 
    <id>http://gdata.youtube.com/feeds/api/videos/q5sOLzEerwA</id> 
    <published>2011-12-01T18:18:36.000Z</published> 
    <updated>2013-05-07T03:20:04.000Z</updated> 
    <category scheme='http://schemas.google.com/g/2005#kind' term='http://gdata.youtube.com/schemas/2007#video'/> 
    <category scheme='http://gdata.youtube.com/schemas/2007/categories.cat' term='Music' label='Music'/> 
    <title type='text'>Kala Bazaar - Khoya Khoya Chand Khula Aasman - Mohd Rafi.flv</title> 
    <content type='text'>tanhayi me akele me khoya khoya chand.........</content> 
    <link rel='alternate' type='text/html' href='http://www.youtube.com/watch?v=q5sOLzEerwA&amp;feature=youtube_gdata'/> 
    <link rel='http://gdata.youtube.com/schemas/2007#video.responses' type='application/atom+xml' href='http://gdata.youtube.com/feeds/api/videos/q5sOLzEerwA/responses'/> 
    <link rel='http://gdata.youtube.com/schemas/2007#video.related' type='application/atom+xml' href='http://gdata.youtube.com/feeds/api/videos/q5sOLzEerwA/related'/> 
    <link rel='http://gdata.youtube.com/schemas/2007#mobile' type='text/html' href='http://m.youtube.com/details?v=q5sOLzEerwA'/> 
    <link rel='self' type='application/atom+xml' href='http://gdata.youtube.com/feeds/api/videos/q5sOLzEerwA'/> 
    <author> 
    <name>a1a2a3a4a786</name> 
    <uri>http://gdata.youtube.com/feeds/api/users/a1a2a3a4a786</uri> 
    </author> 
    <gd:comments> 
    <gd:feedLink rel='http://gdata.youtube.com/schemas/2007#comments' href='http://gdata.youtube.com/feeds/api/videos/q5sOLzEerwA/comments' countHint='6'/> 
    </gd:comments> 
    <media:group> 
    <media:category label='Music' scheme='http://gdata.youtube.com/schemas/2007/categories.cat'>Music</media:category> 
    <media:content url='http://www.youtube.com/v/q5sOLzEerwA?version=3&amp;f=videos&amp;app=youtube_gdata' type='application/x-shockwave-flash' medium='video' isDefault='true' expression='full' duration='293' yt:format='5'/> 
    <media:content url='rtsp://v6.cache3.c.youtube.com/CiILENy73wIaGQkArx4xLw6bqxMYDSANFEgGUgZ2aWRlb3MM/0/0/0/video.3gp' type='video/3gpp' medium='video' expression='full' duration='293' yt:format='1'/> 
    <media:content url='rtsp://v6.cache3.c.youtube.com/CiILENy73wIaGQkArx4xLw6bqxMYESARFEgGUgZ2aWRlb3MM/0/0/0/video.3gp' type='video/3gpp' medium='video' expression='full' duration='293' yt:format='6'/> 
    <media:description type='plain'>tanhayi me akele me khoya khoya chand.........</media:description> 
    <media:keywords/> 
    <media:player url='http://www.youtube.com/watch?v=q5sOLzEerwA&amp;feature=youtube_gdata_player'/> 
    <media:thumbnail url='http://i.ytimg.com/vi/q5sOLzEerwA/0.jpg' height='360' width='480' time='00:02:26.500'/> 
    <media:thumbnail url='http://i.ytimg.com/vi/q5sOLzEerwA/1.jpg' height='90' width='120' time='00:01:13.250'/> 
    <media:thumbnail url='http://i.ytimg.com/vi/q5sOLzEerwA/2.jpg' height='90' width='120' time='00:02:26.500'/> 
    <media:thumbnail url='http://i.ytimg.com/vi/q5sOLzEerwA/3.jpg' height='90' width='120' time='00:03:39.750'/> 
    <media:title type='plain'>Kala Bazaar - Khoya Khoya Chand Khula Aasman - Mohd Rafi.flv</media:title> 
    <yt:duration seconds='293'/> 
    </media:group> 
    <gd:rating average='4.733333' max='5' min='1' numRaters='30' rel='http://schemas.google.com/g/2005#overall'/> 
    <yt:statistics favoriteCount='0' viewCount='8140'/> 
</entry>

ELEMENTS :

<Element '{http://www.w3.org/2005/Atom}id' at 0x0000000002BF79F8> 
<Element '{http://www.w3.org/2005/Atom}published' at 0x0000000002BF7B88> 
<Element '{http://www.w3.org/2005/Atom}updated' at 0x0000000002BF7A48> 
<Element '{http://www.w3.org/2005/Atom}category' at 0x0000000002BF7C78> 
<Element '{http://www.w3.org/2005/Atom}category' at 0x0000000002BF7CC8> 
<Element '{http://www.w3.org/2005/Atom}title' at 0x0000000002BF7D18> 
<Element '{http://www.w3.org/2005/Atom}content' at 0x0000000002BF7D68> 
<Element '{http://www.w3.org/2005/Atom}link' at 0x0000000002BF7DB8> 
<Element '{http://www.w3.org/2005/Atom}link' at 0x0000000002BF7E08> 
<Element '{http://www.w3.org/2005/Atom}link' at 0x0000000002BF7E58> 
<Element '{http://www.w3.org/2005/Atom}link' at 0x0000000002BF7EA8> 
<Element '{http://www.w3.org/2005/Atom}link' at 0x0000000002BF7EF8> 
<Element '{http://www.w3.org/2005/Atom}author' at 0x0000000002BF7F48> 
<Element '{http://schemas.google.com/g/2005}comments' at 0x0000000002C0B0E8> 
<Element '{http://search.yahoo.com/mrss/}group' at 0x0000000002C0B1D8> 
<Element '{http://schemas.google.com/g/2005}rating' at 0x0000000002C0B778> 
<Element '{http://gdata.youtube.com/schemas/2007}statistics' at 0x0000000002C0B7C8> 
{'title': []}


2013-05-07 Nawaz

Actually, I tried it with xml.dom.minidom in the following way, in case it helps you.

#!/usr/bin/python 

from xml.dom.minidom import parseString 
import re 
import urllib.request  # Python 3: urlopen lives in urllib.request

def get_video_id(video_url):
    return re.search(r'watch\?v=.*', video_url).group(0)[8:]

def get_video_feed(video_url):
    video_feed = "http://gdata.youtube.com/feeds/api/videos/" + get_video_id(video_url)
    print(video_feed)
    return urllib.request.urlopen(video_feed).read()

def get_media_info(video_url): 
    content = get_video_feed(video_url) 
    dom = parseString(content) 
    media = {} 

    media['title'] = dom.getElementsByTagName('title')[0].firstChild.nodeValue 
    return media 

def main(): 
    video_url = 'http://youtube.com/watch?v=q5sOLzEerwA' 

    print (get_media_info(video_url)) 

if __name__ == '__main__': 
    main()
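If the namespace matters with minidom as well, a namespace-aware lookup is available through getElementsByTagNameNS. The following is only a sketch: SAMPLE is a trimmed-down stand-in for the feed in the question, and first_text is a hypothetical helper name, not part of minidom.

```python
from xml.dom.minidom import parseString

ATOM_NS = "http://www.w3.org/2005/Atom"
MEDIA_NS = "http://search.yahoo.com/mrss/"

# Assumption: a shortened stand-in for the real feed, so no network is needed.
SAMPLE = (
    "<entry xmlns='http://www.w3.org/2005/Atom' "
    "xmlns:media='http://search.yahoo.com/mrss/'>"
    "<title type='text'>Atom title</title>"
    "<media:group>"
    "<media:title type='plain'>Media title</media:title>"
    "</media:group>"
    "</entry>"
)

def first_text(dom, namespace, local_name):
    """Return the text of the first element matching namespace and local name."""
    nodes = dom.getElementsByTagNameNS(namespace, local_name)
    return nodes[0].firstChild.nodeValue if nodes else None

dom = parseString(SAMPLE)
atom_title = first_text(dom, ATOM_NS, "title")
media_title = first_text(dom, MEDIA_NS, "title")
print(atom_title)
print(media_title)
```

This keeps the Atom title and the media:title apart, which matters here because the feed contains both.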


2013-05-07 15:10:44 gsmaker

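Namespaces matter in XML documents. ElementTree requires fully qualified tags to find the right element. Here is an example of three elements with the same tag name in different namespaces: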

data = '''\
<root xmlns="xyz" xmlns:name="abc"> 
    <object name="one" /> 
    <name:object name="two" /> 
    <object xmlns="def" name="three" /> 
</root> 
'''

Here are the elements as ElementTree sees them:

>>> from xml.etree import ElementTree as et
>>> tree = et.fromstring(data)
>>> print(tree.findall('.//*'))
[<Element '{xyz}object' at 0x0000000003B07BD8>,
<Element '{abc}object' at 0x0000000003B07C28>,
<Element '{def}object' at 0x0000000003B07C78>]

So you have to be precise. Because of the default namespace definition:

<entry xmlns='http://www.w3.org/2005/Atom' ...

to access the 'title' tag, which uses the default namespace, use:

media['title'] = e.findall('{http://www.w3.org/2005/Atom}title')

To access the 'media:group' tag, look at the media namespace definition:

<entry ... xmlns:media='http://search.yahoo.com/mrss/' ...

and use:

e.findall('{http://search.yahoo.com/mrss/}group')
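As an alternative to spelling out the full '{uri}tag' form each time, find() and findall() also accept a prefix-to-URI mapping as a second argument. The sketch below assumes a trimmed-down SAMPLE in place of the real feed; the prefixes 'atom' and 'media' are chosen for this example and only need to match the keys of the mapping, not the prefixes used in the document.

```python
import xml.etree.ElementTree as element_tree

# Assumption: a shortened stand-in for the feed shown in the question.
SAMPLE = """\
<entry xmlns='http://www.w3.org/2005/Atom' xmlns:media='http://search.yahoo.com/mrss/'>
    <title type='text'>Kala Bazaar</title>
    <media:group>
        <media:title type='plain'>Kala Bazaar (media)</media:title>
    </media:group>
</entry>
"""

# Prefixes are local to this mapping; the URIs are what actually get matched.
NAMESPACES = {
    "atom": "http://www.w3.org/2005/Atom",
    "media": "http://search.yahoo.com/mrss/",
}

e = element_tree.XML(SAMPLE)
titles = e.findall("atom:title", NAMESPACES)
media_title = e.find("media:group/media:title", NAMESPACES)

print([t.text for t in titles])
print(media_title.text)
```

The lookup is still fully qualified under the hood; the mapping just keeps the long URIs in one place.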


Note the different ways a namespace can be specified:

<root xmlns="xyz" xmlns:name="abc"> # default namespace and 
             # 'abc' namespace with id 'name'. 
    <object name="one" />    # Uses default namespace 'xyz'. 
    <name:object name="two" />   # uses 'abc' namespace (specified by id). 
    <object xmlns="def" name="three" /> # change the default namespace to 'def'. 
</root>

To read a specific tag from a specific namespace:

>>> print(tree.find('{abc}object').attrib['name'])
two
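For the opposite case, where the namespace does not matter, newer Python versions (3.8 and later, so not the 3.3.1 from the question) accept a '{*}' wildcard in place of the namespace, and '{uri}*' for any tag within one namespace. A small sketch using the same three-element sample:

```python
from xml.etree import ElementTree as et

data = '''\
<root xmlns="xyz" xmlns:name="abc">
    <object name="one" />
    <name:object name="two" />
    <object xmlns="def" name="three" />
</root>
'''

tree = et.fromstring(data)

# '{*}object' matches 'object' in any namespace (requires Python 3.8+).
any_ns = [el.attrib['name'] for el in tree.findall('{*}object')]

# '{xyz}*' matches any child tag that lives in the 'xyz' namespace.
only_xyz = [el.attrib['name'] for el in tree.findall('{xyz}*')]

print(any_ns)
print(only_xyz)
```

On an older interpreter these patterns are not understood as wildcards, so the fully qualified form remains the portable choice.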

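Note that namespace IDs are just shortcuts. Here is what happens when you dump the parsed XML tree. ElementTree doesn't bother to preserve the original namespace IDs and generates its own in the format ns#: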

>>> et.dump(tree) 
<ns0:root xmlns:ns0="xyz" xmlns:ns1="abc" xmlns:ns2="def"> 
    <ns0:object name="one" /> 
    <ns1:object name="two" /> 
    <ns2:object name="three" /> 
</ns0:root>

If you want to define specific shortcuts, use 'register_namespace':

>>> et.register_namespace('','xyz') # default namespace 
>>> et.register_namespace('name','abc') 
>>> et.register_namespace('custom','def') 
>>> et.dump(tree) 
<root xmlns="xyz" xmlns:custom="def" xmlns:name="abc"> 
    <object name="one" /> 
    <name:object name="two" /> 
    <custom:object name="three" /> 
</root>


2013-05-08 05:03:02

I tried 'element_tree.register_namespace('', 'http://www.w3.org/2005/Atom')' and then 'media['title'] = e.findall('title')'. It still returns an empty list. Can we somehow make it work without passing the namespace to 'findall()'? – Nawaz 2013-05-08 05:57:31

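No, 'register_namespace' is only for output. If you use 'ElementTree', you have to pass the fully qualified name. Just write a helper function to work around it. You could also look at the third-party 'lxml'. It has an ElementTree-like interface and may have the feature you want. I'm not very familiar with it. – 2013-05-08 13:38:33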
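The helper function mentioned in the comment above could look roughly like the sketch below. The names qualified and find_all are hypothetical, SAMPLE is a shortened stand-in for the feed, and the 'atom' prefix is chosen only for this illustration. The sketch also shows that register_namespace changes the serialized output but not what findall('title') matches.

```python
import xml.etree.ElementTree as element_tree

ATOM_NS = "http://www.w3.org/2005/Atom"

# Assumption: a shortened stand-in for the feed shown in the question.
SAMPLE = (
    "<entry xmlns='http://www.w3.org/2005/Atom'>"
    "<title type='text'>Kala Bazaar</title>"
    "</entry>"
)

def qualified(tag, namespace=ATOM_NS):
    """Build the '{uri}tag' form that ElementTree uses internally."""
    return "{%s}%s" % (namespace, tag)

def find_all(element, tag, namespace=ATOM_NS):
    """findall() for a plain tag name inside the given namespace."""
    return element.findall(qualified(tag, namespace))

e = element_tree.XML(SAMPLE)

# Registering a prefix only influences how the tree is written out.
element_tree.register_namespace("atom", ATOM_NS)
serialized = element_tree.tostring(e, encoding="unicode")

plain = e.findall("title")      # still no match: the tag is namespaced
titles = find_all(e, "title")   # matches via the fully qualified name

print(serialized)
print(plain)
print([t.text for t in titles])
```

With such a helper, the call site in get_media_info() can keep passing the bare name 'title' while the namespace lives in one constant.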
