2012-01-23 33 views
1

lxml parsing HTML gives the wrong result: why?

Here is something strange about my code:

import lxml.html

myxml = '''
<cooperate>
    <job DecreaseHour="1" table="tpa_radio_sum">
    </job>

    <job DecreaseHour="2" table="tpa_radio_sum">
    </job>

    <job DecreaseHour="3" table="tpa_radio_sum">
    </job>
</cooperate>
'''

# Parse with the HTML parser, then select <job> elements by attribute.
root = lxml.html.fromstring(myxml)
nodes1 = root.xpath('//job[@DecreaseHour="1"]')
nodes2 = root.xpath('//job[@table="tpa_radio_sum"]')
print("nodes1=", nodes1)
print("nodes2=", nodes2)

What I get is:
nodes1=[]

nodes2=[<Element job at 0x1241240>,
<Element job at 0x1362690>,
<Element job at 0x13626c0>]

Why is nodes1 an empty list when nodes2 finds all three elements? This is very strange.

Answer

5

Because you are using the HTML parser, all attribute names are converted to lowercase:

>>> root.xpath("//job")[0].attrib 
{'table': 'tpa_radio_sum', 'decreasehour': '1'} 
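
If you have to stay with the HTML parser, one possible workaround is to write the attribute name in lowercase in the XPath expression, since that is how the parsed tree stores it. The snippet below is only a sketch of that idea; it reuses a shortened copy of the markup from the question.

```python
import lxml.html

myxml = '''
<cooperate>
    <job DecreaseHour="1" table="tpa_radio_sum"></job>
    <job DecreaseHour="2" table="tpa_radio_sum"></job>
</cooperate>
'''

root = lxml.html.fromstring(myxml)

# The HTML parser stored the attribute as "decreasehour",
# so the mixed-case name from the source no longer matches.
missed = root.xpath('//job[@DecreaseHour="1"]')

# Querying the lowercased name finds the element.
nodes = root.xpath('//job[@decreasehour="1"]')

print(len(missed))
print(len(nodes))
print(nodes[0].get('decreasehour'))
```

This keeps the lenient HTML parsing, at the cost of the XPath no longer matching the spelling used in the source document.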

You can use a real XML parser instead, which preserves the case of attribute names:

>>> import lxml.etree 
>>> root = lxml.etree.fromstring(myxml) 
>>> root.xpath('job[@DecreaseHour="1"]') 
[<Element job at 0x293daa8>]
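
For completeness, here is a self-contained version of the XML-parser approach that can be run as a script. It is a minimal sketch using the markup from the question; the variable `names` is introduced here only to show that the attribute names keep their case.

```python
import lxml.etree

myxml = '''
<cooperate>
    <job DecreaseHour="1" table="tpa_radio_sum"></job>
    <job DecreaseHour="2" table="tpa_radio_sum"></job>
    <job DecreaseHour="3" table="tpa_radio_sum"></job>
</cooperate>
'''

root = lxml.etree.fromstring(myxml)

# Under the XML parser, attribute names are kept exactly as written.
names = sorted(root[0].attrib.keys())

# Both queries from the question now behave as expected.
nodes1 = root.xpath('//job[@DecreaseHour="1"]')
nodes2 = root.xpath('//job[@table="tpa_radio_sum"]')

print(names)
print(len(nodes1), len(nodes2))
```

Keep in mind that the XML parser is strict: input that is not well-formed raises lxml.etree.XMLSyntaxError, whereas the HTML parser tries to recover.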