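Extract paragraph text including other elements' content using Scrapy Selectors
Using Scrapy 0.24 Selectors, I want to extract the content of a paragraph, including the content of the elements nested inside it (in the example below, that is the anchor <a>). How can I do this?
Code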
>>> from scrapy import Selector
>>> html = """
<html>
<head>
<title>Test</title>
</head>
<body>
<div>
<p>Hello, can I get this paragraph content with this <a href="http://google.com">Google link</a>?</p>
</div>
</body>
</html>
"""
>>> sel = Selector(text=html, type="html")
>>> sel.xpath('//p/text()').extract()
[u'Hello, can I get this paragraph content with this ', u'?']
Output
[u'Hello, can I get this paragraph content with this ', u'?']
Expected output
[u'Hello, can I get this paragraph content with this Google link?']
Hmm. You could first extract ' 2015-01-26 23:25:44