My Python Scrapy spider can't scrape the "keyword" content

I can't scrape the "keyword" content. >_< I've tried many approaches, but it still fails.

I have successfully retrieved the other content, but I still can't get the "keyword" content.

Can anyone help me fix this? The keyword content is at the CSS selector "#keyword_table a", or at the XPath "//*[@id='keyword_table']/tbody/tr/td[2]/a".
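One frequent cause of "the XPath works in the browser but matches nothing in the spider" (offered as a likely explanation, not confirmed for this page) is the /tbody step: browsers insert a <tbody> when they build the DOM, so an XPath copied from devtools may contain a step that does not exist in the raw HTML Scrapy downloads. The CSS selector "#keyword_table a" uses a descendant combinator and does not depend on <tbody>, so this pitfall affects the XPath form. The sketch below uses the standard-library ElementTree on a hypothetical, well-formed snippet shaped like the XPath above; the markup and keyword text are assumptions for illustration, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw markup as a server might send it: note there is no <tbody>.
# A browser would add <tbody> to the DOM, so devtools would still show one.
RAW_HTML = (
    "<div>"
    "<table id='keyword_table'>"
    "<tr><td>關鍵字</td>"
    "<td><a href='#'>Sony</a> <a href='#'>影像感測器</a> <a href='#'>股票交易</a></td>"
    "</tr>"
    "</table>"
    "</div>"
)

root = ET.fromstring(RAW_HTML)

# Path copied from the browser (with tbody): matches nothing in the raw markup.
with_tbody = root.findall(".//*[@id='keyword_table']/tbody/tr/td[2]/a")

# The same path without the tbody step: matches the three keyword links.
without_tbody = root.findall(".//*[@id='keyword_table']/tr/td[2]/a")

print([a.text for a in with_tbody])     # []
print([a.text for a in without_tbody])  # ['Sony', '影像感測器', '股票交易']
```

Using a descendant step (//a) after the table, as the answer below does, sidesteps the question of whether <tbody> is present at all.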


My code:

import scrapy
from bs4 import BeautifulSoup
from digitimes.items import DigitimesItem


class digitimesCrawler(scrapy.Spider):
    name = 'digitimes'
    start_urls = ["http://www.digitimes.com.tw/tw/dt/n/shwnws.asp?id=435000"]

    def parse(self, response):
        soup = BeautifulSoup(response.body, 'html.parser')
        soupXml = BeautifulSoup(response.body, "lxml")
        simpleList = []

        item = DigitimesItem()

        timeSel = soup.select('.insubject .small')
        tmpTime = timeSel[0].text
        time = tmpTime[:10]  # keep only the first 10 characters
        item['time'] = time  # time handled
        print(time)

        titleSel = soup.select('title')
        title = titleSel[0].text
        item['title'] = title  # title handled
        print(title)

        #================== To Resolve ==================

        keywordOutput = ""
        for k in soupXml.select('#keyword_table a'):
            for key in k:
                keywordOutput = keywordOutput + key + " "
        item['keyword'] = keywordOutput
        print(keywordOutput)

        #================== To Resolve ==================

        categoryOutput = ""
        for m in soup.select('#sitemaptable tr td a'):
            for cate in m:
                if cate != "DIGITIMES":
                    categoryOutput = categoryOutput + cate + " "
        item['cate'] = categoryOutput
        print(categoryOutput)

        simpleList.append(item)
        return simpleList
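If you want the same space-separated keyword string without depending on how a particular parser handles <tbody>, a standard-library-only sketch can collect the text of every <a> nested anywhere inside the element whose id is "keyword_table". This swaps in Python's html.parser.HTMLParser in place of BeautifulSoup; KeywordTableParser is a hypothetical helper (not part of Scrapy or BeautifulSoup), and the sample HTML is invented for illustration. It tracks nesting only by the tag name of the matched element, which keeps it robust to void tags such as <br> inside the table.

```python
from html.parser import HTMLParser


class KeywordTableParser(HTMLParser):
    """Hypothetical helper: collect the text of every <a> inside the
    element with the given id (any depth, with or without <tbody>)."""

    def __init__(self, table_id="keyword_table"):
        super().__init__()
        self.table_id = table_id
        self.target_tag = None   # tag name of the matched element, while inside it
        self.depth = 0           # nesting count of that tag name
        self.in_link = False     # True while inside an <a> within the target
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if self.target_tag is None:
            if dict(attrs).get("id") == self.table_id:
                self.target_tag, self.depth = tag, 1
            return
        if tag == self.target_tag:
            self.depth += 1              # nested element with the same tag name
        elif tag == "a":
            self.in_link = True
            self.keywords.append("")     # start a new keyword

    def handle_endtag(self, tag):
        if self.target_tag is None:
            return
        if tag == "a":
            self.in_link = False
        elif tag == self.target_tag:
            self.depth -= 1
            if self.depth == 0:
                self.target_tag = None   # left the target element

    def handle_data(self, data):
        if self.in_link:
            self.keywords[-1] += data    # text may arrive in several chunks


html = """<table id="keyword_table"><tbody><tr><td>關鍵字</td>
<td><a href="#">Sony</a> <a href="#">影像感測器</a><br><a href="#">股票交易</a></td>
</tr></tbody></table><a href="#">not a keyword</a>"""

parser = KeywordTableParser()
parser.feed(html)
keywords = [k.strip() for k in parser.keywords if k.strip()]
print(" ".join(keywords))  # Sony 影像感測器 股票交易
```

Inside a Scrapy callback you would feed it response.text instead of the sample string.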


2016-07-26 Chiao Yun Chen

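Is there any particular reason you use BeautifulSoup instead of Scrapy's selectors? The response object your parse method receives already works as a Scrapy selector and supports both XPath and CSS selection.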

There appear to be 3 keywords in the table. You can select them with either an XPath or a CSS selector:

response.css("#keyword_table a::text").extract() 
# or with xpath 
response.xpath("//*[@id='keyword_table']//a/text()").extract() 
# both return:
# [u'Sony', u'\u5f71\u50cf\u611f\u6e2c\u5668', u'\u80a1\u7968\u4ea4\u6613']
# i.e. Sony, 影像感測器 (image sensor), 股票交易 (stock trading)
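The selectors above return a list, while the question's spider stores item['keyword'] as one space-separated string. A minimal sketch, assuming you want to keep that string format, is to join the list rather than concatenate inside nested loops (unlike the original loop, this leaves no trailing space). The list literal below is copied from the output above; in the spider the equivalent would be " ".join(response.css("#keyword_table a::text").extract()).

```python
# The list from the answer, as printed by Python 2 (u'' literals, \u escapes).
keywords = [u'Sony', u'\u5f71\u50cf\u611f\u6e2c\u5668', u'\u80a1\u7968\u4ea4\u6613']

# The \u escapes are only how repr() displays non-ASCII text;
# the strings themselves are ordinary Chinese text.
assert keywords[1] == '影像感測器'

# Join into the single space-separated string the item field expects.
keyword_field = " ".join(k.strip() for k in keywords)
print(keyword_field)  # Sony 影像感測器 股票交易
```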


2016-07-26 12:48:12 Granitosaurus

Thank you so much! I learned to use BeautifulSoup from an online course ~ I'll go check out Scrapy's native selectors ~~ Thanks!
