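Extract paragraph text including other elements' content using Scrapy Selectors

Using Scrapy 0.24 Selectors, I want to extract a paragraph's content including the content of the elements nested inside it (in the example below, that would be the text of the anchor <a>). How can I do this?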

Code

>>> from scrapy import Selector 
>>> html = """ 
     <html> 
      <head> 
       <title>Test</title> 
      </head> 
      <body> 
       <div> 
        <p>Hello, can I get this paragraph content without this <a href="http://google.com">Google link</a>? 
       </div> 
      </body> 
     </html> 
     """ 
>>> sel = Selector(text=html, type="html") 
>>> sel.xpath('//p/text()').extract() 
[u'Hello, can I get this paragraph content without this ', u'?']

Output

[u'Hello, can I get this paragraph content without this ', u'?']

Expected output

[u'Hello, can I get this paragraph content without this Google link?']

Source

2015-01-26 Doon

Hmm. You could first extract ' 2015-01-26 23:25:44

Documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/

Installation: pip install beautifulsoup4

For your case:

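# 'html' is the string you provided
from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, "html.parser")
# get_text() concatenates the text of each <p> and of everything nested inside it
res = [p.get_text().strip() for p in soup.find_all('p')]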

Result:

[u'Hello, can I get this paragraph content without this Google link?']

Source

2015-01-27 03:55:38 ZZY
