Unable to get table header elements
In Python, I have a variable holding an HTML table element, obtained like this:
page = requests.get('http://www.myPage.com')
tree = html.fromstring(page.content)
table = tree.xpath('//table[@class="list"]')
The table variable has this content:
<table class="list">
<tr>
<th>Date(s)</th>
<th>Sport</th>
<th>Event</th>
<th>Location</th>
</tr>
<tr>
<td>Jan 18-31</td>
<td>Tennis</td>
<td><a href="tennis-grand-slam/australian-open/index.htm">Australia Open</a></td>
<td>Melbourne, Australia</td>
</tr>
</table>
I want to extract the headers like this:
rows = iter(table)
headers = [col.text for col in next(rows)]
print "headers are: ", headers
However, when I print the headers variable I get this:
headers are: ['\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n
', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n
', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n
', '\n ', '\n ']
How can I extract the headers correctly?
Can't reproduce the problem (https://gist.github.com/har07/c693eac57c79c2896881f9b6e2de2202). Can you post simple but complete code that reproduces it? – har07
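One possible explanation, assuming the code runs exactly as posted: xpath() in lxml returns a list of matching elements, so table is a list rather than a single element. next(iter(table)) then yields the table element itself, and the comprehension walks its tr children, whose .text is only the whitespace before the first cell. The sketch below is not a confirmed fix for the real page. It uses an inline copy of the sample table (the SAMPLE name is made up for illustration) instead of a network request, assumes Python 3, takes the first match from the list, and selects the th cells explicitly.

```python
from lxml import html

# Inline copy of the table from the question, so the sketch runs offline.
# SAMPLE is a hypothetical name; the real page may contain more markup.
SAMPLE = """
<table class="list">
  <tr>
    <th>Date(s)</th>
    <th>Sport</th>
    <th>Event</th>
    <th>Location</th>
  </tr>
  <tr>
    <td>Jan 18-31</td>
    <td>Tennis</td>
    <td><a href="tennis-grand-slam/australian-open/index.htm">Australia Open</a></td>
    <td>Melbourne, Australia</td>
  </tr>
</table>
"""

tree = html.fromstring(SAMPLE)

# xpath() returns a list of matching elements, not a single element.
tables = tree.xpath('//table[@class="list"]')
table = tables[0]  # assumes at least one matching table exists

# Select the <th> cells directly instead of relying on child order,
# and strip surrounding whitespace from each cell's text.
headers = [th.text_content().strip() for th in table.xpath('.//th')]
print("headers are:", headers)
```

If the real page wraps the header row differently (for example inside thead), the './/th' path should still find the cells, since it searches all descendants of the table.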