from lxml import html

def parse_header(table):
    ths = table.xpath('//tr/th')
    if not ths:
        ths = table.xpath('//tr[1]/td')  # here is the problem: this finds tr[1]/td in the whole HTML file instead of only in this table
    # ... something else

doc = html.fromstring(html_string)
table = doc.xpath("//div[@id='divGridData']/div[2]/table")[0]
parse_header(table)
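I want to find all tr[1]/td inside my table, but table.xpath("//tr[1]/td") still matches across the whole HTML file. How can I search only within this element rather than the entire document? In other words, how do I find all tr in a table element with XPath?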
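For reference, here is a minimal sketch of the difference between an absolute and a relative path in lxml. In XPath, an expression starting with `//` is evaluated from the document root no matter which element `xpath()` is called on, while `.//` starts at the context element. The `html_string` below is a made-up sample (the real page is not shown in the question), and returning the cell text from `parse_header` is an assumption made only so the result can be inspected.

```python
from lxml import html

# Hypothetical sample page: one table outside the target div, one inside it.
html_string = """
<html><body>
  <table><tr><td>outside</td></tr></table>
  <div id="divGridData">
    <div>first</div>
    <div>
      <table>
        <tr><td>name</td><td>age</td></tr>
        <tr><td>roger</td><td>30</td></tr>
      </table>
    </div>
  </div>
</body></html>
"""

def parse_header(table):
    # './/' is relative to `table`; a bare '//' would start at the document root.
    ths = table.xpath('.//tr/th')
    if not ths:
        ths = table.xpath('.//tr[1]/td')
    return [cell.text_content().strip() for cell in ths]

doc = html.fromstring(html_string)
table = doc.xpath("//div[@id='divGridData']/div[2]/table")[0]

header = parse_header(table)             # cells of this table's first row only
absolute_hits = table.xpath('//tr[1]/td')  # also matches the table outside the div
```

With this sample, `header` contains only the two cells of the target table's first row, whereas `absolute_hits` also picks up the cell from the unrelated table.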
Edit:
from lxml import etree

content = '''
<root>
  <table id="table-one">
    <tr>
      <td>content from table 1</td>
    </tr>
    <table>
      <tr>
        <!-- this is content I do not want to get -->
        <td>content from embedded table</td>
      </tr>
    </table>
  </table>
</root>'''

root = etree.fromstring(content)
table_one = root.xpath('table[@id="table-one"]')[0]  # xpath() returns a list, so take the first match
all_td_elements = table_one.xpath('//td')  # so this gives me too much!!!
Now I don't want the content of the embedded table. What should I do?
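One possible way to skip the embedded table, sketched under the assumption that the markup looks like the (well-formed) sample above: either select only the direct `tr` children of the table, or keep just those `td` elements whose nearest `table` ancestor is the table you started from. The second variant relies on lxml handing back the same Python object for an element while a reference to it is held, which is why the `is` comparison works here.

```python
from lxml import etree

content = '''
<root>
  <table id="table-one">
    <tr>
      <td>content from table 1</td>
    </tr>
    <table>
      <tr>
        <td>content from embedded table</td>
      </tr>
    </table>
  </table>
</root>'''

root = etree.fromstring(content)
table_one = root.xpath('table[@id="table-one"]')[0]

# Option 1: follow only direct children, so nested tables are never entered.
direct = table_one.xpath('./tr/td')

# Option 2: search all descendants, then keep the cells whose nearest
# table ancestor is table_one itself (works for deeper row nesting too).
nearest = [
    td for td in table_one.xpath('.//td')
    if td.xpath('ancestor::table[1]')[0] is table_one
]
```

Option 1 is the simplest if rows always sit directly under the table. Option 2 is more tolerant of wrappers such as `tbody` between the table and its rows.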
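I have one more question, which I've added to my question as an update: how can I ignore the embedded table? – roger
I don't understand the update? – gtlambert
I don't want to use table_one.xpath('//td') – roger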