XPath selection returns no match

source1 = ''' <tr> <td bgcolor="#ffffff">Gemara</td> <td bgcolor="#ffffff">Kiddushin</td> <td bgcolor="#ffffff">Morning</td> <td bgcolor="#ffffff">12-04-2104</td> <td colspan=2 bgcolor="#ffffff" nowrap="nowrap"> <a href="#" onClick="listen('05-115-08-2104-12-04.mp3')"><img src="images/play_audio.gif" border="0"></a> <a href="#" onClick="mydownload('05-115-08-2104-12-04.mp3')"><img src="images/download.gif" border="0"></a> </td>  </tr> '''

from lxml import html
source2 = html.fromstring(str(source1))
Category = source2.xpath('//tr[1]//td[@bgcolor="#ffffff"][1]//text()')
Book = source2.xpath('//tr[1]//td[@bgcolor="#ffffff"][2]//text()')
Section = source2.xpath('//tr[1]//td[@bgcolor="#ffffff"][3]//text()')
Date = source2.xpath('//tr[1]//td[@bgcolor="#ffffff"][4]//text()')
Mp3filename = source2.xpath('//tr[1]//td[@colspan=2]//a[1]//@onClick')
print Category, Book, Section, Date, Mp3filename

It looks like lxml.html converts attribute names to lower-case (tested in Python 2.7, with the HTML copy-pasted from the question, unchanged):

raw= '''<tr> 
            <td bgcolor="#ffffff"><font face="Tahoma" size="2">Gemara</font></td> 
            <td bgcolor="#ffffff"><font face="Tahoma" size="2">Kiddushin</font></td> 
            <td bgcolor="#ffffff"><font face="Tahoma" size="2">Morning</font></td> 

            <td bgcolor="#ffffff"><font face="Tahoma" size="2">12-04-2104</font></td> 

            <td colspan=2 bgcolor="#ffffff" nowrap="nowrap"><font face="Tahoma" size="2"> 
            <a href="#" onClick="listen('05-115-08-2104-12-04.mp3')"><img src="images/play_audio.gif" border="0"></a> 
            <a href="#" onClick="mydownload('05-115-08-2104-12-04.mp3')"><img src="images/download.gif" border="0"></a> 
            </td> 
            <!-- <td bgcolor="#ffffff" nowrap="nowrap"><font face="Tahoma" Size="2"> 

            <a href="http://mgr.uvault.com/yadavraham/media//05-115-08-2104-12-04.mp3">Download</a> 
            </td> 
            --> 
            </tr>''' 

from lxml import html 
source2 = html.fromstring(raw) 

Mp3filename = source2.xpath('//tr[1]//td[@colspan=2]//a[1]') 
print html.tostring(Mp3filename[0]) 
# output : 
# <a href="#" onclick="listen('05-115-08-2104-12-04.mp3')"><img src="images/play_audio.gif" border="0"></a> 
#    ^notice that the attribute name changed to lower-case

So I suggest trying the lower-case @onclick in your XPath:

Mp3filename = source2.xpath('//tr[1]//td[@colspan=2]//a[1]/@onclick')
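
To see the effect in isolation, here is a minimal sketch (Python 3 syntax, assuming lxml is installed). The `snippet` fragment is a shortened stand-in for the row in the question, not the original page; it only compares the mixed-case and lower-case attribute names in the same XPath.

```python
from lxml import html

# Shortened, illustrative version of the row from the question.
snippet = '''<table><tr>
<td colspan=2><a href="#" onClick="listen('05-115-08-2104-12-04.mp3')">play</a></td>
</tr></table>'''

doc = html.fromstring(snippet)

# The HTML parser stores the attribute as "onclick", so the mixed-case
# name matches nothing while the lower-case name matches.
mixed_case = doc.xpath('//td[@colspan=2]//a[1]/@onClick')
lower_case = doc.xpath('//td[@colspan=2]//a[1]/@onclick')

print(mixed_case)
print(lower_case)
```

XPath attribute names are case-sensitive, which is why the two queries behave differently once the parser has normalized the name.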

Source

2016-03-22 05:53:37 har07

Awesome, man. I never knew that lxml converts attribute names to lower-case. That is what you call an experienced eye for detail. This is what I ended up with: Mp3filename = source2.xpath('//tr[1]//td[@colspan=2]//font[@face="Tahoma"]//a[1][@href="#"]//@onclick'); Mp3filename = str(Mp3filename[0]).replace("listen('", '').replace("')", '').strip()
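
As an alternative to the chained replace() calls in the comment above, a regular expression can pull the quoted argument out of the onclick value. This is only a sketch: `extract_js_argument` is a hypothetical helper name, and it assumes the value always has the form name('argument') with single quotes, as in the question's HTML.

```python
import re


def extract_js_argument(onclick_value):
    """Return the single-quoted argument of a call like listen('file.mp3').

    Hypothetical helper for illustration; returns None when the value
    does not contain a ('...') argument.
    """
    match = re.search(r"\('([^']*)'\)", onclick_value)
    return match.group(1) if match else None


# Works for both handlers used in the question's markup.
from_listen = extract_js_argument("listen('05-115-08-2104-12-04.mp3')")
from_download = extract_js_argument("mydownload('05-115-08-2104-12-04.mp3')")
print(from_listen)
print(from_download)
```

Unlike the replace() chain, this does not depend on the function name being listen, so the same call handles the mydownload link too.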

First, fix your HTML so that it is valid XML.

You are missing the closing tag for the last <td>. As a result, XPath will not find anything beneath it as valid XML.

Source

2016-03-22 05:55:46 wotanii

I am using html.fromstring(source), so there is no need to convert the HTML into any XML form. Thanks for the suggestion anyway.
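
A rough way to check the point made in this comment (Python 3 syntax, assuming lxml is installed): the fragment below is made up for illustration, with an unclosed <font> tag and an unquoted colspan value. It is parsed once with the lenient HTML parser and once with the strict XML parser.

```python
from lxml import etree, html

# Illustrative fragment: <font> is never closed and colspan is unquoted.
broken = (
    '<table><tr><td colspan=2><font face="Tahoma">'
    '<a href="#" onclick="listen(\'05-115-08-2104-12-04.mp3\')">play</a>'
    '</td></tr></table>'
)

# The HTML parser recovers from the sloppy markup, so XPath still matches.
lenient = html.fromstring(broken)
found = lenient.xpath('//td[@colspan=2]//a[1]/@onclick')
print(found)

# The strict XML parser rejects the same input.
try:
    etree.fromstring(broken)
    strict_parse_ok = True
except etree.XMLSyntaxError:
    strict_parse_ok = False
print(strict_parse_ok)
```

So well-formed XML matters when the input goes through an XML parser, while html.fromstring is built to tolerate this kind of markup.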
