Unable to get table header elements

In Python, I have a variable holding an HTML table element, obtained like this:

import requests
from lxml import html

page = requests.get('http://www.myPage.com')
tree = html.fromstring(page.content)
table = tree.xpath('//table[@class="list"]')

The table variable has this content:

<table class="list"> 
     <tr> 
     <th>Date(s)</th> 
     <th>Sport</th> 
     <th>Event</th> 
     <th>Location</th> 
     </tr> 
     <tr> 
     <td>Jan 18-31</td> 
     <td>Tennis</td> 
     <td><a href="tennis-grand-slam/australian-open/index.htm">Australia Open</a></td> 
     <td>Melbourne, Australia</td> 
     </tr> 
</table>

I want to extract the headers like this:

rows = iter(table) 
headers = [col.text for col in next(rows)] 
print "headers are: ", headers

However, when I print the headers variable, I get this:

headers are: ['\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n 
     ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n 
', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n  ', '\n 
     ', '\n  ', '\n  ']

How can I extract the headers correctly?

Source

2016-04-26 octavian

Can't reproduce the problem (gist.github.com/har07/c693eac57c79c2896881f9b6e2de2202). Can you post minimal but complete code that reproduces it? – har07

Try this:

from lxml import html 

HTML_CODE = """<table class="list"> 
     <tr> 
     <th>Date(s)</th> 
     <th>Sport</th> 
     <th>Event</th> 
     <th>Location</th> 
     </tr> 
     <tr> 
     <td>Jan 18-31</td> 
     <td>Tennis</td> 
     <td><a href="tennis-grand-slam/australian-open/index.htm">Australia Open</a></td> 
     <td>Melbourne, Australia</td> 
     </tr> 
</table>""" 

tree = html.fromstring(HTML_CODE) 
headers = tree.xpath('//table[@class="list"]/tr/th/text()') 
print("Headers are: {}".format(', '.join(headers)))

Output:

Headers are: Date(s), Sport, Event, Location
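
For what it's worth, the whitespace in the question's output has a likely explanation: `tree.xpath(...)` returns a list of matching elements, so `iter(table)` walks that list, `next(rows)` is the `<table>` element itself, and looping over it yields its `<tr>` rows, whose `.text` is only the whitespace before the first cell. Below is a minimal sketch of that behaviour and of a fix that keeps the question's row-based approach. It assumes the page looks like a shortened version of the sample markup; the names `tables`, `row_texts` and `first_row` are made up for illustration.

```python
from lxml import html

# Shortened copy of the question's markup; the real page is assumed to be similar.
HTML_CODE = """<table class="list">
    <tr>
        <th>Date(s)</th>
        <th>Sport</th>
    </tr>
    <tr>
        <td>Jan 18-31</td>
        <td>Tennis</td>
    </tr>
</table>"""

tree = html.fromstring(HTML_CODE)

# xpath() returns a list of matching elements, not a single element.
tables = tree.xpath('//table[@class="list"]')

# What the question's loop effectively collects: the .text of every <tr>,
# which is only the whitespace between <tr> and its first cell.
row_texts = [child.text for child in tables[0]]

# Index into the list first, then take the first row and read its cells.
table = tables[0]
first_row = table[0]
headers = [cell.text for cell in first_row]
print(headers)  # ['Date(s)', 'Sport']
```

Note that `cell.text` only holds the text before any nested tag, so a header that wraps its label in a link would need `cell.text_content()` instead.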

Source

2016-04-26 13:14:42

Using table, assuming there is only one:

table[0].xpath("//th/text()")

Or, if you only want the headers from the table and have no other use for it, you just need:

Both will give you:


['Date(s)', 'Sport', 'Event', 'Location']
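
One caveat when the page has more than one table: an XPath expression that starts with `//` is evaluated from the document root even when it is called on `table[0]`, while `.//` is relative to that element. The sketch below shows the difference on a hypothetical page with a second table (the `other` class and the `Rank` header are invented for illustration). The last expression is one possible way to select the headers straight from the tree; it is an assumption, not necessarily the expression the answer had in mind.

```python
from lxml import html

# Hypothetical page with two tables, used only to show the difference.
HTML_CODE = """<div>
<table class="list"><tr><th>Date(s)</th><th>Sport</th></tr></table>
<table class="other"><tr><th>Rank</th></tr></table>
</div>"""

tree = html.fromstring(HTML_CODE)
table = tree.xpath('//table[@class="list"]')

# "//th" starts from the document root, even when called on table[0].
all_headers = table[0].xpath('//th/text()')

# ".//th" starts from table[0], so only its own headers are returned.
own_headers = table[0].xpath('.//th/text()')

# Selecting straight from the tree, without keeping the table element.
direct_headers = tree.xpath('//table[@class="list"]//th/text()')

print(all_headers)     # ['Date(s)', 'Sport', 'Rank']
print(own_headers)     # ['Date(s)', 'Sport']
print(direct_headers)  # ['Date(s)', 'Sport']
```

With a single table on the page, as in the question, all three expressions return the same list.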

Source

2016-04-26 14:33:13
