Parsing a date string out of HTML with lxml

s = """ 
     <tbody> 
     <tr> 
     <td style="border-bottom: none"> 
     <span class="graytext" style="font-weight: bold;"> Reply #3 - </span> 
     <span class="graytext" style="font-size: 11px"> 
     05/13/09 2:02am 
     <br> 
     </span> 
     </td> 
    </tr> 
    </tbody> 
"""

I need to extract the date string from the HTML string above.

I tried it this way:

import lxml.html  # a bare "import lxml" does not load the lxml.html submodule

doc = lxml.html.fromstring(s)
doc.xpath('//span[@class="graytext" and @style="font-size: 11px"]')

But this does not give me what I want. I only need the date string itself.


2012-06-14 Nava

Your query selects the span element; you still need to grab the text from it:

>>> doc.xpath('//span[@class="graytext" and @style="font-size: 11px"]') 
[<Element span at 1c9d4c8>]

Most queries return a sequence, so I usually use a helper function that returns the first item.
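The helper itself is not shown in this answer. A minimal sketch, assuming a hypothetical first(seq, default) that returns the first element of a sequence or a fallback value when the sequence is empty (the name and signature are inferred from how first(...) is called here; it is not part of lxml):

```python
def first(seq, default=None):
    """Return the first item of seq, or default if seq is empty.

    Hypothetical helper: inferred from how first(...) is called in this
    answer, not an lxml function.
    """
    for item in seq:
        # Return as soon as the first item is seen.
        return item
    # The loop body never ran, so seq was empty.
    return default
```

Passing '' as the default means .strip() can be chained safely even when the XPath query matches nothing.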

Then:

>>> doc.xpath('//span[@class="graytext" and @style="font-size: 11px"]') 
[<Element span at 1c9d4c8>] 
>>> doc.xpath('//span[@class="graytext" and @style="font-size: 11px"]/text()') 
['\n 05/13/09 2:02am\n '] 
>>> first(doc.xpath('//span[@class="graytext" and @style="font-size: 11px"]/text()'),'').strip() 
'05/13/09 2:02am'


2012-06-14 13:45:08 MattH

Try the following instead of your last line:

print(doc.xpath('//span[@class="graytext" and @style="font-size: 11px"]/text()')[0])

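The first part of your XPath expression is correct: //span[@class="graytext" and @style="font-size: 11px"] selects every matching span node. You then need to say what to select from those nodes. Here, text() selects the text nodes that are direct children of each span, and they come back as a list of strings with the surrounding whitespace intact.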
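Putting the pieces together, here is a self-contained sketch of the whole extraction. The variable names (html, texts, date_string) are illustrative assumptions; the XPath expression and the sample markup are the ones from the question. It guards against an empty result and strips the whitespace around the text node:

```python
import lxml.html

# Sample markup from the question.
html = """
<tbody>
<tr>
<td style="border-bottom: none">
<span class="graytext" style="font-weight: bold;"> Reply #3 - </span>
<span class="graytext" style="font-size: 11px">
05/13/09 2:02am
<br>
</span>
</td>
</tr>
</tbody>
"""

doc = lxml.html.fromstring(html)

# /text() returns the span's text nodes as a list of strings.
texts = doc.xpath('//span[@class="graytext" and @style="font-size: 11px"]/text()')

# Take the first text node if there is one, and trim the whitespace around it.
date_string = texts[0].strip() if texts else ''
print(date_string)
```

Note that matching on @style compares the attribute value exactly, so this query would likely stop matching if the page changed the spacing inside the style attribute.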


2012-06-14 13:44:23
