ElementTree XML parsing just returns sitemap.org?
I have been looking for a simple explanation of where I am going wrong, but could not find one. Code excerpt below:
import time, threading, urllib2, os
import xml.etree.ElementTree as ET

save_path = '/Users/sampeka/Desktop/Programming/SilkySpider/Data'
bloomberg_site_map = urllib2.urlopen('http://www.bloomberg.com/sitemap_news.xml').read()
reuters_site_map = urllib2.urlopen('http://www.reuters.com/sitemap_news_index.xml').read()

def saveXmlFile(data, name):
    # Write the downloaded XML under save_path; "with" closes the file
    # even if the write fails (and avoids a NameError if open() itself fails).
    abs_path = os.path.abspath(save_path)
    with open(os.path.join(abs_path, name), 'w') as open_file:
        open_file.write(data)

class ParseXML:
    def __init__(self, xml_file):
        self.xml_file = xml_file

    def printStuff(self):
        tree = ET.parse(self.xml_file)
        root = tree.getroot()
        # Iterating over root visits only its direct children.
        for child in root:
            print child.tag, child.attrib

saveXmlFile(bloomberg_site_map, 'Bloomberg Site Map.xml')
ParseXML(save_path + '/Bloomberg Site Map.xml').printStuff()
This prints the following, several times over:
{http://www.sitemaps.org/schemas/sitemap/0.9}url
{http://www.sitemaps.org/schemas/sitemap/0.9}url
{http://www.sitemaps.org/schemas/sitemap/0.9}url
{http://www.sitemaps.org/schemas/sitemap/0.9}url
{http://www.sitemaps.org/schemas/sitemap/0.9}url
The XML is saved correctly, so I must be missing something simple. Can someone explain why this gets stuck here? Many thanks for your help.
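A note on the output above: ElementTree writes namespace-qualified tags as `{namespace}tag`, so each printed line is a `url` element in the sitemaps.org namespace, and looping over the root only reaches those direct children. The sketch below is one way to go a level deeper. It assumes a tiny made-up sitemap (`SAMPLE`) in place of the downloaded file, the helper names `local_name` and `describe` are hypothetical, and it uses Python 3 `print` syntax rather than the Python 2 of the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature sitemap, standing in for the downloaded file.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/a</loc></url>
  <url><loc>http://example.com/b</loc></url>
</urlset>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit('}', 1)[-1]


def describe(xml_text):
    """Return (tag, text) pairs for the children of every <url> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for url in root:        # direct children of the root: the <url> elements
        for field in url:   # one level deeper: <loc> and its siblings
            rows.append((local_name(field.tag), (field.text or '').strip()))
    return rows


for name, text in describe(SAMPLE):
    print(name, text)
```

With a file on disk, `ET.parse(path).getroot()` could replace the `ET.fromstring` call; the nested loop stays the same.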
That raises AttributeError: 'Element' object has no attribute 'xpath' –
@samp: Sorry, I was using `lxml.etree` rather than `xml.etree.ElementTree`. I have updated the answer accordingly. – isedev
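For anyone who hits the same AttributeError: the standard-library `Element` has no `xpath` method, but `findall` accepts a limited XPath subset together with a prefix-to-namespace map. The following is a minimal sketch under stated assumptions, not the code from the answer: the sample document, the `sm` prefix, and the function name `extract_locs` are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# The prefix "sm" is arbitrary; only the namespace URI has to match the document.
NS = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}

# Hypothetical miniature sitemap used instead of the real download.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/a</loc></url>
  <url><loc>http://example.com/b</loc></url>
</urlset>"""


def extract_locs(xml_text):
    """Collect the text of every <loc> nested inside a <url> element."""
    root = ET.fromstring(xml_text)
    # findall() takes a path relative to root plus the namespace map.
    return [loc.text for loc in root.findall('sm:url/sm:loc', NS)]


print(extract_locs(SAMPLE))
```

With `lxml.etree`, a comparable query would go through `root.xpath(...)` with a `namespaces=` argument, which is why the method name differs between the two libraries.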