2011-12-23 71 views

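Get text from individual elements of an XML etree

The code below works, but is there a more Pythonic way to get the same functionality? I just want to parse the XML and get the text of a few elements (name, name_status, url).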

from lxml import etree
from urllib2 import urlopen

def ask_CoL(url):
    tree = etree.parse(urlopen(url))
    tn = [el.get('total_number_of_results') for el in tree.iter('results')]
    try:
        nr = int(tn[0])
    except ValueError:
        nr = 0
    if nr == 1:
        newstr = str([el.text for el in tree.getiterator(tag='name')])\
                 .strip("[]'") + ','\
                 + str([el.text for el in tree.getiterator(tag='name_status')])\
                 .strip("[]'") + ','\
                 + str([el.text for el in tree.getiterator(tag='url')])\
                 .strip("[]'") + '\n'
    else:
        newstr = 'NA\n'
    return newstr

Example XML:

<results id="" name="Theragra chalcogramma" total_number_of_results="1" number_of_results_returned="1" start="0" error_message="" version="1.6 rev 1152">
    <result>
        <id>9037795</id>
        <name>Theragra chalcogramma</name>
        <rank>Species</rank>
        <name_status>accepted name</name_status>
        <online_resource>http://www.fishbase.org/Summary/SpeciesSummary.php?ID=318</online_resource>
        <source_database>FishBase</source_database>
        <source_database_url>http://www.fishbase.org</source_database_url>
        <name_html><i>Theragra chalcogramma</i> (Pallas, 1814)</name_html>
        <url>http://www.catalogueoflife.org/col/details/species/id/9037795</url>
    </result>
</results>

Do you have some sample XML? It would help anyone who wants to test an answer before posting it. – FakeRainBrigand 2011-12-23 05:37:54


Maybe also post what you expect the output to look like. – 2011-12-23 08:34:26

Answers


You can simplify both the interface and the implementation:

import urllib2
from xml.etree import cElementTree as etree

def f(url):
    tree = etree.parse(urllib2.urlopen(url))
    el = tree.find('results')
    if el is not None:
        lst = [el.findtext(tag) or '' for tag in "name name_status url".split()]
        return ','.join(lst)

Thanks! In el = tree.find('results'), baseUrl = "http://www.catalogueoflife.org/col/webservice?name=" – 2011-12-23 21:46:40


Replace 'results' with 'result'; the XML source is available. – 2011-12-23 23:11:52
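
The correction in the comment above can be checked offline against the sample XML from the question. What follows is only a minimal sketch, not the answerer's code: it assumes Python 3 and the standard-library `xml.etree.ElementTree` (rather than `urllib2` and `cElementTree`), parses a trimmed copy of the sample from a string instead of fetching a URL, and uses the hypothetical names `SAMPLE`, `TAGS` and `summarize`. It also keeps the question's `total_number_of_results` check and returns `'NA'` without the trailing newline.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample XML from the question (hypothetical constant name).
SAMPLE = """<results id="" name="Theragra chalcogramma" total_number_of_results="1" number_of_results_returned="1" start="0" error_message="" version="1.6 rev 1152">
    <result>
        <id>9037795</id>
        <name>Theragra chalcogramma</name>
        <rank>Species</rank>
        <name_status>accepted name</name_status>
        <url>http://www.catalogueoflife.org/col/details/species/id/9037795</url>
    </result>
</results>"""

TAGS = ("name", "name_status", "url")


def summarize(xml_text):
    """Return 'name,name_status,url' for a single-result response, else 'NA'."""
    # The root element is <results> itself, and find() only looks at children,
    # so root.find('results') would return None for this document.
    root = ET.fromstring(xml_text)
    try:
        count = int(root.get("total_number_of_results", "0"))
    except ValueError:
        count = 0
    if count != 1:
        return "NA"
    result = root.find("result")
    if result is None:
        return "NA"
    # findtext() returns None for a missing child, hence the `or ""` fallback.
    return ",".join(result.findtext(tag) or "" for tag in TAGS)


print(summarize(SAMPLE))
```

Reading the fields from the single `<result>` child, rather than iterating over the whole tree, avoids the `str(list).strip("[]'")` round trip used in the question.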