2012-05-17

How do I get the exact date when parsing with lxml?

I have a weird problem when parsing an HTML document. The document contains a span like this:

<span class="time">Thu May 17, 2012 12:20 pm</span> 

When I parse it (the span is inside a td):

row.xpath('string(./td/span/text())') 

I get the following:

Wed May 16, 2012 11:20 pm 

What could be the problem?

Answers


Perhaps ./td/span matches more than one element. When you wrap an XPath in string(), only the first result (in document order) is processed:

>>> from lxml import etree
>>> html = """<html>
...    <td><span class="time">Wed May 16, 2012 11:20 pm</span></td> 
...    <td><span class="time">Thu May 17, 2012 12:20 pm</span></td> 
...   </html>""" 
>>> t = etree.fromstring(html) 
>>> t.xpath('string(./td/span)') 
'Wed May 16, 2012 11:20 pm' 

You should either write a more specific XPath that selects the row you want, or loop over the matches:

>>> for row in t.xpath("./td/span"): 
...  print(row.xpath("string(.)")) 
...  
Wed May 16, 2012 11:20 pm 
Thu May 17, 2012 12:20 pm 
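
As a rough sketch of the "more specific XPath" option, a positional predicate can pin the match to one cell. The markup below is the same hypothetical two-cell sample used above, and the choice of the second td is only an assumption for illustration; the right predicate depends on the real table.

```python
from lxml import etree

# Hypothetical markup mirroring the sample above: two cells, each with a time span.
html = """<html>
  <td><span class="time">Wed May 16, 2012 11:20 pm</span></td>
  <td><span class="time">Thu May 17, 2012 12:20 pm</span></td>
</html>"""

t = etree.fromstring(html)

# string() over a node-set uses only the first node in document order.
first = t.xpath("string(./td/span)")

# A positional predicate on td narrows the match to the second cell (assumed target).
second = t.xpath("string(./td[2]/span[@class='time'])")

# Collecting every match makes it obvious when more than one span is present.
all_times = [span.xpath("string(.)") for span in t.xpath("./td/span")]

print(first)
print(second)
print(len(all_times))
```

If the list holds more than one entry, the unqualified expression was ambiguous and string() was silently picking the first match.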

(Note: I removed text() because it is not needed in this case, and text() might not do what you think it does.)
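
To illustrate the point about text(), here is a minimal sketch with a hypothetical span whose content is split by a nested element. That nesting is an assumption, not something shown in the question, but it is the case where text() and string(.) diverge.

```python
from lxml import etree

# Hypothetical span whose text is split by a nested <b> element.
span = etree.fromstring('<span class="time">Thu <b>May 17</b>, 2012 12:20 pm</span>')

# text() returns only the text nodes that are direct children of the span.
direct_text = span.xpath("./text()")

# string(text()) keeps just the first of those text nodes.
first_text = span.xpath("string(./text())")

# string(.) concatenates all descendant text, which is usually what is wanted.
full_text = span.xpath("string(.)")

print(direct_text)
print(repr(first_text))
print(full_text)
```

For a span that contains only plain text, as in the question, both forms give the same result, so dropping text() costs nothing.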


Yes, I am looping over the rows, and I also checked that the row is the same one – wasimbhalli


I even tried this at the Python prompt: rows[3].xpath('./td/span/text()') but it still shows the previous date – wasimbhalli


@wasimbhalli: Then you are probably using the wrong XPath expression to get the rows. How many items does 'rows[3].xpath("./td/span")' return? –