Is there a following-sibling count in Scrapy? I am trying to scrape the claim text from the HTML below:
<FONT COLOR=#5FA505><B>Claim:</B></FONT> Coed makes unintentionally risqué remark about professor's "little quizzies."
<BR><BR>
<CENTER><IMG SRC="/images/content-divider.gif"></CENTER>
I am using this code:
def parse_article(self, response):
    # Selects every text node that follows the "Claim:" font tag under the same parent
    for href in response.xpath('//font[b = "Claim:"]/following-sibling::text()'):
        print(href.extract())
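With it I successfully pull the correct Claim: value from the HTML above, but it also pulls text from the HTML below (other markup on the same page with a similar structure). My xpath() only targets the font tag that contains Claim:, so why does it also pull the Origins text? How can I fix this? I tried to get only the next following-sibling instead of all of them, but that did not work.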
<FONT COLOR=#5FA505 FACE=""><B>Origins:</B></FONT> Print references to the "little quizzies" tale date to 1962, but the tale itself has been around since the early 1950s. It continues to surface among college students to this day. Similar to a number of other college legends
'.extract()[0]' –
@JohnDene My output changed, but now it is just a bunch of empty space with a ',' showing up every once in a while – Rafa
I think that is because you are using a for loop. If I understand correctly, you only want to extract one value? –