2013-12-11 87 views
1

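Retrieving an XML attribute with ElementTree in Python

I want to get an element's attribute, but I get either None or an empty list, depending on how I try to retrieve it. Also, if anyone knows a better way to get the elements with a specific tag, I would appreciate it. Here is the code; the part set apart by blank lines is what should return the URL, but doesn't.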

import xml.etree.ElementTree as ET 
import webbrowser,time,urllib.request 
import tkinter as tk 
import urllib 

# webbrowser.get('windows-default').open_new('http://www.reddit.com/'+'r/blender') 
main = tk.Tk() 
class Application(tk.Frame): 



    def __init__(self, master=None): 
     tk.Frame.__init__(self, master) 
     self.pack() 
     self.createWidgets() 
     self.initial() 

    def createWidgets(self): 
     # print('Went to createWidgets()') 
     self.send_entry = tk.Entry(self) 
     self.send_entry.grid(row=0,column=0) 
     self.change_sub = tk.Button(self,text='Change Subreddit', command=lambda :self.getXML(self.send_entry.get())).grid(row=0 , column=2) 
     self.lb_scrollY = tk.Scrollbar(self,orient=tk.VERTICAL) 
     self.lb_scrollY.grid(row=1,column=1,sticky=tk.NS) 
     self.thread_lb = tk.Listbox(self,yscrollcommand=self.lb_scrollY.set) 
     self.lb_scrollY['command']=self.thread_lb.yview 
     self.thread_lb.grid(row=1,column=0) 
     self.QUIT = tk.Button(self, text="QUIT", fg="red", command=main.destroy).grid(row=2) 




    def descStripper(self,desc): 
     x1=int(desc.find('alt="')) 
     if x1 != -1: 
      x2Start = x1+5 
      x2=int(desc.find('"',x2Start)) 
      desc = desc[x1+5:x2] 
      return desc 
     else: 
      desc = "There is no description. Maybe it's a link" 
      return desc 

    def lbPopulator(self,title,pub,link): 
     # print('Went to lbPopulator()') 
     self.thread_lb.delete(0,tk.END) 
     for item in title: 
      self.thread_lb.insert(tk.END,item) 

    def getXmlData(self): 
     counter = 0 
     self.threadPubDateList = [] 
     self.threadTitleList = [] 
     self.threadLinkList = [] 
     self.threadDescList = [] 
     self.threadThumbNail = [] 
     tree=ET.parse('rss.xml') 
     root=tree.getroot() 
     for channel in root: 
      for SubChannel in channel: 
       if SubChannel.tag == 'item': 
        for threadInfo in SubChannel: 
         # print(SubChannel.getchildren()) 
         if threadInfo.tag == 'title': 
          self.threadTitleList.append(threadInfo.text) 
         if threadInfo.tag == 'pubDate': 
          self.threadPubDateList.append(threadInfo.text[:-6]) 
         if threadInfo.tag == 'link': 
          self.threadLinkList.append(threadInfo.text) 
         if threadInfo.tag == 'description': 
          self.threadDescList.append(self.descStripper(threadInfo.text)) 









         if threadInfo.tag == '{http://search.yahoo.com/mrss/}title': 
          print(threadInfo.tag) 
          print(threadInfo.attrib) 
          print(threadInfo.get('url')) 











     self.lbPopulator(self.threadTitleList,self.threadPubDateList,self.threadLinkList) 
     # print(self.threadTitleList) 
     # print(self.threadPubDateList) 
     # print(self.threadLinkList) 
     # print(self.threadDescList) 
    def getXML(self,subreddit): 
     try: 
      url = 'http://www.reddit.com'+subreddit+'.rss' 
      source = urllib.request.urlretrieve(url,'rss.xml') 
      self.getXmlData() 
     except urllib.error.HTTPError as err: 
      print('Too many requests-Try again') 
    def initial(self): 
     try: 
      source = urllib.request.urlretrieve('http://www.reddit.com/.rss','rss.xml') 
      self.getXmlData() 
     except urllib.error.HTTPError as err: 
      print('Too many requests-Trying again 3') 
      time.sleep(3) 
      self.__init__() 


# main.geometry("250x150") 

app = Application(master=main) 
app.mainloop() 

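This is a piece of code that, when passed an XML file, should return the thumbnail URLs. The problem is entirely in the last 'if' statement; everything else works fine.
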
+0

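You should try posting the specific piece of your code that exhibits the problem, or create an illustrative example. Seeing sample input, expected output, and actual output is also helpful. – msnider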

+0

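That's better. It makes it easier to see what you tried and what went wrong. It would also be good to see an example of the XML element you are parsing... In the final if block, what is the output of the 3 print statements? – msnider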

+0

If you just run the program, it will download the XML. You can find it at [reddit.com/.rss](http://reddit.com/.rss). The output is '{http://search.yahoo.com/mrss/}title' '{}' 'None' – ddaniels

Answers

1

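The only tag that has an attribute called url is the media:thumbnail tag. As you pointed out, media is declared at the top as xmlns:media="http://search.yahoo.com/mrss/". This leads me to believe that your last if statement should be: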

if threadInfo.tag == '{http://search.yahoo.com/mrss/}thumbnail': 
    print(threadInfo.tag) 
    print(threadInfo.attrib) 
    print(threadInfo.get('url')) 

This should produce the output:

{http://search.yahoo.com/mrss/}thumbnail
{'url': 'http://a.thumbs.redditmedia.com/cozEqqG9muj-tT3Z.jpg'}
http://a.thumbs.redditmedia.com/cozEqqG9muj-tT3Z.jpg
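
As for the other part of the question (a better way to get the elements with a specific tag), here is a minimal sketch. It assumes a feed shaped like the reddit RSS, with `item` elements under `channel` that may contain a `media:thumbnail`. The sample XML string and the helper name `thumbnail_urls` are invented for illustration; `find` and `findall` with a prefix-to-URI mapping are part of the standard `xml.etree.ElementTree` API.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, trimmed to the parts that matter here.
SAMPLE_RSS = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>First thread</title>
      <media:title>First thread</media:title>
      <media:thumbnail url="http://a.thumbs.redditmedia.com/cozEqqG9muj-tT3Z.jpg" />
    </item>
    <item>
      <title>Second thread (no thumbnail)</title>
    </item>
  </channel>
</rss>"""

# Maps the prefix used in the search paths to the namespace URI.
NS = {'media': 'http://search.yahoo.com/mrss/'}

def thumbnail_urls(root):
    """Return one entry per <item>: the thumbnail URL, or None if absent."""
    urls = []
    for item in root.findall('./channel/item'):
        thumb = item.find('media:thumbnail', NS)
        # Compare against None explicitly: an element without children is falsy.
        urls.append(thumb.get('url') if thumb is not None else None)
    return urls

root = ET.fromstring(SAMPLE_RSS)
print(thumbnail_urls(root))
```

Searching by path avoids the nested loops and the chain of tag comparisons, and keeping one entry per item means the thumbnail list stays aligned with the title and link lists.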
+1

At the beginning of the XML, media is declared as {http://search.yahoo.com/mrss/}. If you change it to 'media:thumbnail', it will never find the tag. – ddaniels
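
The point made in this comment can be checked with a small sketch (the one-element document below is made up for the example): ElementTree stores the tag with the namespace URI expanded in braces, so a comparison against the prefixed spelling does not match.

```python
import xml.etree.ElementTree as ET

# Made-up one-element document using the same namespace as the feed.
xml_text = ('<media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" '
            'url="http://a.thumbs.redditmedia.com/cozEqqG9muj-tT3Z.jpg" />')
elem = ET.fromstring(xml_text)

print(elem.tag)                       # the expanded {uri}localname form
print(elem.tag == 'media:thumbnail')  # False: the prefix is not kept in .tag
print(elem.tag == '{http://search.yahoo.com/mrss/}thumbnail')  # True
```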

+0

Good point, I think I see your problem. Try my updated answer. – msnider

+0

Yes, that worked. I forgot to change it back to thumbnail. Thanks! – ddaniels