2017-05-07

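I am using Scrapy to scrape a website's detail pages, starting from its listing pages. Each detail page is structured slightly differently, so the parser has to handle the different variants.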

1st detail page:

<div class="td-post-content"> 
    <p style="text-align: justify;"> 
     <strong>[ Karda Natam ]</strong> 
     <br> 
     <strong>ITANAGAR, May 6:</strong> Nacho, Taksing, Siyum and ... 
     <br> “Offices are without ... 
    </p> 
</div> 

2nd detail page:

<div class="td-post-content"> 
    <p style="text-align: justify;"> 
     <strong>Guwahati, May 6 (PTI)</strong> Sarbananda Sonowal today ... 
     <br> “Books are a potent tool to create ... 
    </p> 
</div> 

3rd detail page:

<div class="td-post-content"> 
    <h3 style="text-align: justify;"><strong>Flights Of Fantasy</strong></h3> 
    <p style="text-align: justify;"> 
     <strong>[ M Panging ]</strong> 
     <br> This state of denial ... 
    </p> 
</div> 

I am trying to parse the author and the publication date from the detail pages with this code:

import scrapy


class ArunachaltimesSpider(scrapy.Spider):
    ...
    ...

    def parse(self, response):
        urls = response.css("div.td-ss-main-content > div.td_module_16 > div.item-details > h3.entry-title > a::attr(href)").extract()
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse_detail)

        next = response.xpath("// ...')]/@href").extract_first()
        if next:
            yield scrapy.Request(url=next, callback=self.parse)

    def parse_detail(self, response):
        strong_elements = response.css("div.td-ss-main-content").css("div.td-post-content").css("p > strong::text").extract()
        for strong in strong_elements:
            if ', ' in strong:
                news_date = strong.split(', ')[1].replace(":", "")
            elif '[ ' and ' ]' in strong:
                author = strong
            else:
                news_date = None
                author = None
        yield {
            'author': author,
            'news_date': news_date
        }

But I get this error:

UnboundLocalError: local variable 'author' referenced before assignment

What am I doing wrong here? Could you please help me get the author and the news date from each page? Thanks.
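For reference, the error can be reproduced without Scrapy. find_author below is a hypothetical, stripped-down stand-in for the loop in parse_detail. Because author is assigned only inside the loop, reading it afterwards raises UnboundLocalError whenever no iteration assigned it. That happens both when the input list is empty and when it simply contains no author marker. The exact message wording differs between Python versions, so no expected output is shown.

```python
def find_author(strong_texts):
    """Stand-in for the question's loop: author is bound only inside it."""
    for text in strong_texts:
        if '[ ' in text and ' ]' in text:
            author = text
    # author is a local variable here; if nothing above assigned it,
    # reading it raises UnboundLocalError (a subclass of NameError).
    return author


# Inputs modeled on the question's pages: no <strong> text at all,
# the 2nd page (dateline only) and the 3rd page (author only).
for texts in ([], ["Guwahati, May 6 (PTI)"], ["[ M Panging ]"]):
    try:
        print(repr(find_author(texts)))
    except UnboundLocalError as exc:
        print("UnboundLocalError:", exc)
```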

Answer


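It looks like strong_elements is an empty list in your case, so the for loop never runs. You only assign author inside the for loop, but you use it in the yield, where it was never assigned because the loop did not run. Declare author as a variable at the top, above the for loop. The same error occurs when the loop does run but no branch assigns the variable. For example, the 2nd page has a dateline but no [ author ] element, and the 3rd page has an author but no dateline inside a <p>. Give news_date a default above the loop as well.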
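A minimal sketch of that fix follows. It assumes the helper receives the same p > strong texts that the question's selector extracts. The name extract_author_and_date is hypothetical, and the helper is kept free of Scrapy so it can be checked on its own. Both values start as None before the loop. The two bracket markers are tested separately, because '[ ' and ' ]' in strong evaluates as '[ ' and (' ]' in strong), which only checks for the closing bracket. The else branch is also dropped, since it would discard values found earlier whenever a later strong element matched neither pattern.

```python
def extract_author_and_date(strong_texts):
    """Return (author, news_date) from one detail page's <p> > <strong> texts.

    Both values default to None, so a page missing either one (or having
    no <strong> text at all) yields None instead of raising
    UnboundLocalError.
    """
    author = None
    news_date = None
    for text in strong_texts:
        text = text.strip()
        # Author markers look like "[ Karda Natam ]". Checked first so an
        # author name containing ", " is not mistaken for a dateline.
        if text.startswith('[') and text.endswith(']'):
            author = text.strip('[] ')
        # Datelines look like "ITANAGAR, May 6:" or "Guwahati, May 6 (PTI)";
        # keep the part after the first ", " and drop any colon.
        elif ', ' in text:
            news_date = text.split(', ', 1)[1].replace(':', '')
        # No else branch: unmatched texts leave earlier values untouched.
    return author, news_date
```

Inside the spider, parse_detail could then call author, news_date = extract_author_and_date(response.css("div.td-post-content p > strong::text").extract()) and yield both values. Like the original split, this keeps suffixes such as (PTI) in the date string.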