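XPATH - html with many children
Consider the HTML in the variable page.
How can I access the td elements?
I would like to access them with something like xpath("/table/tr/td/text()").
I don't want to spell out the other tr levels.
Unfortunately, the expression xpath('.//table/tr/tr/tr/td/text()')
does not work either.
Python code: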
import __future__
from lxml import html
import requests
from bs4 import BeautifulSoup
page = """
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>cv</title>
</head>
<body>
<table>
<tr>
<tr>
<tr>
<td>table1 td1</td>
<td>table1 td2</td>
</tr>
</tr>
</tr>
</table>
<table>
<tr>
<tr>
<tr>
<td>table2 td1</td>
<td>table2 td2</td>
</tr>
</tr>
</tr>
</table>
<table>
<tr>
<tr>
<tr>
<td>table3 td1</td>
<td>table3 td2</td>
</tr>
</tr>
</tr>
</table>
</body>
</html>
"""
soup = str(BeautifulSoup(page, 'html.parser'))
tree = html.fromstring(soup)
things = tree.xpath('.//table/tr/tr/tr/td/text()')
print(things)
for thing in things:
    print(thing)
print("That's all")
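One way to avoid naming every intermediate tr is the descendant axis (`//`), which matches td elements at any depth below a table. The following is a minimal sketch, not a definitive answer: it assumes lxml alone is used for parsing, and `sample_page` is a hypothetical, shortened stand-in for the `page` string above. A likely reason the fixed path fails is that HTML parsers may repair the invalid nesting of tr inside tr, so the parsed tree need not contain a tr/tr/tr chain at all.

```python
from lxml import html

# Hypothetical, shortened stand-in for the `page` string in the question.
sample_page = """
<html><body>
<table><tr><tr><tr><td>table1 td1</td><td>table1 td2</td></tr></tr></tr></table>
<table><tr><tr><tr><td>table2 td1</td><td>table2 td2</td></tr></tr></tr></table>
</body></html>
"""

tree = html.fromstring(sample_page)

# `//td` selects td elements at any depth under each table,
# so the number of enclosing tr levels does not matter.
all_cells = tree.xpath('//table//td/text()')
print(all_cells)
```

This returns the text of every td in document order, without grouping by table.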
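I want it from the root!
That doesn't help me; I want it to start from the root! The reason is that afterwards I will access the tds of each table by index, like: 'xpath("/table[1]/tr/td/text()")'
'xpath("/table[1]//td/text()")'
@hr_117 OK, if the output should be grouped per table, then we run the xpath on each table. See the extended answer.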
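The per-table idea from the last comment can be sketched as follows. This is an illustration under assumptions, not the extended answer itself: `sample_page` is a hypothetical, shortened stand-in for the question's `page`, and the absolute path assumes the tables sit directly under body. Note that a path starting at the root has to begin with /html/body, because /table[1] would look for a table as the document element.

```python
from lxml import html

# Hypothetical, shortened stand-in for the `page` string in the question.
sample_page = """
<html><body>
<table><tr><tr><tr><td>table1 td1</td><td>table1 td2</td></tr></tr></tr></table>
<table><tr><tr><tr><td>table2 td1</td><td>table2 td2</td></tr></tr></tr></table>
</body></html>
"""

tree = html.fromstring(sample_page)

# Grouped output: run a relative query ('.//td') once per table element.
cells_by_table = [table.xpath('.//td/text()') for table in tree.xpath('//table')]
for index, cells in enumerate(cells_by_table, start=1):
    print(index, cells)

# Indexed access from the root: XPath positions are 1-based.
first_table_cells = tree.xpath('/html/body/table[1]//td/text()')
print(first_table_cells)
```

The relative form keeps each table's cells together, while the absolute form is convenient when only one table is needed by position.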