Parsing an XML file with Python: multiple hierarchies

Hi, I want to parse the XML file below using Python. My "folder" variable is always set to the 8-digit number at the end of the <link> tag; in this case, it is 11199709.

Python

for folder in folderList:
    ...  # look up the <eq:seconds> value for this folder

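What I want is: when "folder" equals the last 8 digits of the <link> tag, give me the value of <eq:seconds>. I tried playing with the ElementTree code from the Python docs, but I ran into trouble because there are so many levels of hierarchy: root[0][1].text does not retrieve the values under the <item> tag. Thanks for any help.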

XML

<rss xmlns:georss="http://www.georss.org/georss/" xmlns:eq="http://earthquake.usgs.gov/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" version="2.0">
  <channel>
    <title>USGS Earthquake ShakeMaps</title>
    <description>List of ShakeMaps for events in the last 30 days</description>
    <link>http://earthquake.usgs.gov/</link>
    <dc:publisher>U.S. Geological Survey</dc:publisher>
    <pubDate>Thu, 27 Mar 2014 15:33:05 +0000</pubDate>
    <item>
      <title>4.11 - 79.3 miles NNW of Kotzebue</title>
      <description>
        <![CDATA[<img src="http://earthquake.usgs.gov/eqcenter/shakemap/thumbs/shakemap_ak_11199709.jpg" width="100" align="left" hspace="10"/><p>Date: Thu, 27 Mar 2014 07:28:31 UTC<br/>Lat/Lon: 67.9858/-163.494<br/>Depth: 15.9122</p>]]></description>
      <link>http://earthquake.usgs.gov/eqcenter/shakemap/ak/shake/11199709/</link>
      <pubDate>Thu, 27 Mar 2014 07:53:33 +0000</pubDate>
      <geo:lat>67.9858</geo:lat>
      <geo:long>-163.494</geo:long>
      <dc:subject>4</dc:subject>
      <eq:seconds>1395905311</eq:seconds>
      <eq:depth>15.9122</eq:depth>
      <eq:region>ak</eq:region>
    </item>
    <item>
      ...similar to above item
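Why root[0][1].text misses the item data: root[0] is <channel>, so root[0][1] is the channel's own <description>, not anything inside an <item>. A minimal sketch with the standard library's xml.etree.ElementTree follows, assuming a trimmed one-item copy of the feed above and assuming the folder ID is always the last path segment of <link>. The helper name seconds_for_folder is hypothetical. The key detail is that the "eq:" prefix has to be mapped to its namespace URI before ElementTree can find <eq:seconds>.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question (one <item> only).
SAMPLE = """<rss xmlns:eq="http://earthquake.usgs.gov/rss/1.0/" version="2.0">
  <channel>
    <title>USGS Earthquake ShakeMaps</title>
    <description>List of ShakeMaps for events in the last 30 days</description>
    <item>
      <link>http://earthquake.usgs.gov/eqcenter/shakemap/ak/shake/11199709/</link>
      <eq:seconds>1395905311</eq:seconds>
    </item>
  </channel>
</rss>"""

# The "eq:" prefix is only a shorthand; ElementTree needs the full namespace URI.
NS = {"eq": "http://earthquake.usgs.gov/rss/1.0/"}


def seconds_for_folder(root, folder):
    """Return the <eq:seconds> text of the <item> whose <link> ends in folder (hypothetical helper)."""
    for item in root.iterfind("channel/item"):  # every <item> directly under <channel>
        link = item.findtext("link", default="")
        # Assumption: the folder ID is the last path segment of the link URL.
        if link.rstrip("/").rsplit("/", 1)[-1] == folder:
            return item.findtext("eq:seconds", namespaces=NS)
    return None  # no <item> matched this folder


root = ET.fromstring(SAMPLE)  # for a file: ET.parse("filename.xml").getroot()
for folder in ["11199709"]:  # stand-in for the question's folderList
    print(folder, seconds_for_folder(root, folder))
```

Comparing the last path segment for equality, rather than testing whether the folder string appears anywhere in the URL, avoids accidental matches on other digits in the link.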


2014-03-28 Andrew

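Use BeautifulSoup, which can parse HTML and XML (the latter with an external module such as lxml); it is much easier to use than the parser included with Python.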

This code should do what you want:

from bs4 import BeautifulSoup

# Load your XML file with the "xml" parser (requires lxml; see comments below).
# The default HTML parser treats <link> as an empty HTML tag, so its text is lost.
# You can also load the XML from a URL by using "urllib" or "Python-Requests".
with open("filename.xml") as f:
    xml = BeautifulSoup(f, "xml")

for folder in folderList:  # your list of 8-digit folder IDs (as strings)
    for item in xml.find_all("item"):  # iterate through all <item> elements
        if folder in item.link.text:  # if the folder's ID is in the <link> element
            print(item.find("eq:seconds").text)  # print the <eq:seconds> element


2014-03-28 14:32:47

Thanks! I'm not sure I can install BeautifulSoup on our server machine, but I'll look into it. – Andrew

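@Andrew If you have shell access (e.g. via SSH), you can easily install it with pip: 'pip install beautifulsoup4'. If you don't have pip, you can extract the BeautifulSoup archive into your script's directory. – 2014-03-28 14:47:16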

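BeautifulSoup is not an XML parser, but it can use 'lxml' to parse XML. For that, however, you need to pass the ["xml" argument](http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser) to the constructor. Otherwise it parses the content with an HTML parser instead of as XML! – mata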

If speed is a concern, I'd recommend lxml. It is an extra dependency, but it is usually much faster than BeautifulSoup.
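A sketch of the lxml route, assuming lxml is installed; if it is not, the import falls back to the standard library's ElementTree, since both support the calls used here. Instead of rescanning every <item> for each folder, it builds a dictionary from folder ID to <eq:seconds> once, so each folder in the list becomes a single lookup. The sample data and the build_index helper are hypothetical, and the folder ID is again assumed to be the last path segment of <link>.

```python
try:
    from lxml import etree  # third-party parser, usually faster
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback supporting the same calls

EQ = "http://earthquake.usgs.gov/rss/1.0/"  # namespace URI bound to the "eq:" prefix

# Trimmed copy of the feed from the question (one <item> only).
SAMPLE = b"""<rss xmlns:eq="http://earthquake.usgs.gov/rss/1.0/" version="2.0">
  <channel>
    <item>
      <link>http://earthquake.usgs.gov/eqcenter/shakemap/ak/shake/11199709/</link>
      <eq:seconds>1395905311</eq:seconds>
    </item>
  </channel>
</rss>"""


def build_index(root):
    """Map each folder ID (last segment of <link>) to its <eq:seconds> text (hypothetical helper)."""
    index = {}
    for item in root.iter("item"):  # every <item>, at any depth
        link = item.findtext("link", default="")
        folder = link.rstrip("/").rsplit("/", 1)[-1]
        # "{uri}local" (Clark notation) names a namespaced tag in both lxml and ElementTree.
        index[folder] = item.findtext("{%s}seconds" % EQ)
    return index


root = etree.fromstring(SAMPLE)  # for a file: etree.parse("filename.xml").getroot()
index = build_index(root)
for folder in ["11199709", "12345678"]:  # hypothetical folderList
    print(folder, index.get(folder))  # None when no <item> has that folder ID
```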


2014-03-28 14:48:36 Midnighter
