There are at least a couple of ways to do this.
Let's first build a test selector that mimics your response:
>>> response = scrapy.Selector(text="""<li>
... <a test="test" href="abc.html" id="11">Click Here</a>
... "for further reference"
... </li>""")
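The first option is a small modification to your CSS selector: look at all text descendants, not only the direct text children (note the space between li and the ::text pseudo-element):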
# this is your CSS selector,
# which only gives the direct child text nodes of your selected LI
>>> response.css("li::text").extract()
[u'\n ', u'\n "for further reference"\n']
# notice the extra space
# here
# |
# v
>>> response.css("li ::text").extract()
[u'\n ', u'Click Here', u'\n "for further reference"\n']
# using Python's join() to concatenate and build the full sentence
>>> ''.join(response.css("li ::text").extract())
u'\n Click Here\n "for further reference"\n'
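The difference between direct text children and all text descendants can also be sketched outside Scrapy. The snippet below is only an analogue using the standard library's xml.etree.ElementTree, not what Scrapy does internally; the variable names and the exact indentation inside the sample markup are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Same markup as the test selector; it happens to be well-formed XML,
# so the standard-library parser can read it (real-world HTML often is not).
li = ET.fromstring('''<li>
  <a test="test" href="abc.html" id="11">Click Here</a>
  "for further reference"
</li>''')

# Direct text children only: the text before the first child element,
# plus the "tail" text that follows each child. Comparable to "li::text".
candidates = [li.text] + [child.tail for child in li]
direct_text = [t for t in candidates if t]

# All descendant text nodes in document order. Comparable to "li ::text".
all_text = list(li.itertext())

print(direct_text)        # the link text is missing here
print(all_text)           # the link text is included here
print(''.join(all_text))  # full sentence, whitespace still untouched
```

The link text only shows up in the second list, which is the same effect the extra space in "li ::text" has.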
Another option is to chain your .css() call with a subsequent .xpath() call that uses XPath 1.0 string() or normalize-space():
>>> response.css("li").xpath('string()').extract()
[u'\n Click Here\n "for further reference"\n']
>>> response.css("li").xpath('normalize-space()').extract()
[u'Click Here "for further reference"']
# calling `.extract_first()` gives you a string directly, not a list of 1 string
>>> response.css("li").xpath('normalize-space()').extract_first()
u'Click Here "for further reference"'
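If you already have the raw string (for example from string() or from joining the text nodes), the whitespace cleanup that normalize-space() performs can be approximated in plain Python. This is a minimal sketch, assuming the XPath 1.0 definition of whitespace (space, tab, carriage return, line feed); the helper name normalize_space and the sample string's indentation are made up for illustration.

```python
import re

# XPath 1.0 whitespace characters: space, tab, carriage return, line feed.
_XPATH_WS = ' \t\r\n'


def normalize_space(value):
    """Collapse runs of whitespace to one space, then trim both ends."""
    collapsed = re.sub('[ \t\r\n]+', ' ', value)
    return collapsed.strip(_XPATH_WS)


# Hypothetical raw value, shaped like the string() output for the test <li>.
raw = '\n  Click Here\n  "for further reference"\n'
cleaned = normalize_space(raw)
print(cleaned)
```

Doing the cleanup inside the XPath expression keeps everything in one selector call; a Python helper like this is mainly useful when the text has already been extracted.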
You could try the XPath selector 'response.xpath('//文/div[@id="部分2"]/li//text()').extract()' – vold