Scrapy: XPath error: Invalid expression //media:content
I want to extract content from a news website's RSS feed, with items like the one below:
<item>
<title>BPS: Kartu Bansos Bantu Turunkan Angka Gini Ratio</title>
<media:content url="/image.jpg" expression="full" type="image/jpeg"/>
</item>
But an error is raised when I try to parse a namespaced tag such as media with the XPath '//media:content':
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/parsel/selector.py", line 183, in xpath
    six.reraise(ValueError, ValueError(msg), sys.exc_info()[2])
  File "/usr/local/lib/python2.7/site-packages/parsel/selector.py", line 179, in xpath
    smart_strings=self._lxml_smart_strings)
  File "src/lxml/lxml.etree.pyx", line 1587, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:57923)
  File "src/lxml/xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:167084)
  File "src/lxml/xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:166043)
ValueError: XPath error: Undefined namespace prefix in //media:content
Does anyone know how I should use XPath, as in item.xpath, to select media:content? Thanks :)
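The `.xpath()` method of Scrapy selectors does not accept a namespaces argument the way `lxml` does (though there is an [open PR](https://github.com/scrapy/parsel/pull/45) for that). You have to call [`.register_namespace(prefix, namespace)`](https://parsel.readthedocs.io/en/latest/usage.html#parsel.selector.Selector.register_namespace) on the selector beforehand.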
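@paultrmbrth thx, I didn't realize this isn't lxml's xpath(); I should have looked at the stack trace more closely... Thanks for the reference, I've corrected my answer. – mata
Thanks @mata, it works~~ – NGloom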