2013-04-16 74 views
0

I'm trying to parse a web page and pull certain data out into a list. Here is the XML:

http://www-01.ibm.com/software/support/lifecycle/rss/PLCWeeklyXMLDownload.xml

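There are multiple records, and for each one I need to pull out the software name, version number, release number, ModLevelNumber, and end-of-service date (if there is one), and put them into a list.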

I have some Python code, but I'm new to XML. Any help is appreciated.

import urllib.request
import xml.etree.ElementTree as et

def myDownload():
    response = urllib.request.urlopen("http://www-01.ibm.com/software/support/lifecycle/rss/PLCWeeklyXMLDownload.xml")
    tree = et.parse(response)
    root = tree.getroot()
    aList = []

    # Each child of the root is one software record
    for child in root:
        for node in child.findall("SWTitle"):
            title = node.text
            aList.append(title)
        # Walk down Versions > Version > Release_Mods > Release_Mod
        for nodes in child.findall("Versions"):
            for version in nodes.findall("Version"):
                for release in version.findall("Release_Mods"):
                    for mod in release.findall("Release_Mod"):
                        rNum = mod.find("releaseNumber")
                        rNumber = rNum.text
                        nNum = mod.find("modLevelNumber")
                        nNumber = nNum.text
                        aList.append(rNumber)
                        aList.append(nNumber)  # was misspelled as nNumer

    return aList

Can anyone help me adjust this code? It doesn't seem to work.


What problem are you having? – Blender


Look up Python's XML libraries. Then, if you know where the nodes sit in the XML tree, you can tell the parser to look there. – Patashu


@Blender Could you check my code? – BAI

Answers

0

You can use the lxml library like this:

import requests
from lxml import etree

r = requests.get('http://www-01.ibm.com/software/support/lifecycle/rss/PLCWeeklyXMLDownload.xml')
xml = r.content
xml_dom = etree.fromstring(xml)

# Iterate over <SWTitleRecord>
for record_node in xml_dom:
    data = {}
    for attr_node in record_node:
        if attr_node.tag == 'SWTitle':
            data['title'] = attr_node.text
        elif attr_node.tag == 'Versions':
            # parse versions
            ...
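
The `# parse versions` step is left open above. One way it might be filled in is sketched below, assuming the nesting used in the question's code (`Versions` > `Version` > `Release_Mods` > `Release_Mod`, with `releaseNumber` and `modLevelNumber` children). The sample XML, the root tag `Records`, and the helper name `parse_records` are made up for illustration. The standard-library `xml.etree.ElementTree` is used so the sketch runs without a network call or extra installs; `lxml.etree` offers the same `fromstring`, `findall`, and `findtext` calls.

```python
import xml.etree.ElementTree as ET

# Made-up sample that mimics the nesting assumed in the question's code.
SAMPLE = """
<Records>
  <SWTitleRecord>
    <SWTitle>Example Product</SWTitle>
    <Versions>
      <Version>
        <Release_Mods>
          <Release_Mod>
            <releaseNumber>1</releaseNumber>
            <modLevelNumber>0</modLevelNumber>
          </Release_Mod>
          <Release_Mod>
            <releaseNumber>2</releaseNumber>
          </Release_Mod>
        </Release_Mods>
      </Version>
    </Versions>
  </SWTitleRecord>
</Records>
"""


def parse_records(xml_text):
    """Return one dict per Release_Mod, carrying the owning title along."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root:
        title = record.findtext("SWTitle")
        # A relative path reaches the nested elements in one call.
        for mod in record.findall("Versions/Version/Release_Mods/Release_Mod"):
            rows.append({
                "title": title,
                "release": mod.findtext("releaseNumber"),
                "mod_level": mod.findtext("modLevelNumber"),
            })
    return rows


rows = parse_records(SAMPLE)
for row in rows:
    print(row)
```

`findtext` returns `None` when a child element is absent, which avoids the `AttributeError` that `mod.find(...).text` raises for a missing element. That may be useful for optional fields such as the end-of-service date.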

Could you check my code? – BAI

1

Use the lxml library to parse the XML. ElementTree doesn't work well for more deeply nested tags.
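
For comparison, the standard-library ElementTree can also reach deeply nested tags, for example through `Element.iter()` or a `.//` path. A minimal sketch on a made-up fragment, where only the tag names come from the question:

```python
import xml.etree.ElementTree as ET

# Made-up fragment; only the tag names are taken from the question.
doc = ET.fromstring(
    "<Version><Release_Mods>"
    "<Release_Mod><releaseNumber>3</releaseNumber></Release_Mod>"
    "<Release_Mod><releaseNumber>4</releaseNumber></Release_Mod>"
    "</Release_Mods></Version>"
)

# iter() walks the whole subtree, however deep the tag sits.
via_iter = [el.text for el in doc.iter("releaseNumber")]

# ".//" in a path matches descendants at any depth below the start element.
via_path = [el.text for el in doc.findall(".//releaseNumber")]

print(via_iter, via_path)
```

Both calls exist in `lxml.etree` as well, so the choice between the two libraries need not change the traversal code.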