2017-05-07

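UnboundLocalError: local variable 'author' referenced before assignment

I am scraping a website's detail pages from a list page, and each detail page is slightly different.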

First detail page:

<div class="td-post-content">
    <p style="text-align: justify;">
        <strong>[ Karda Natam ]</strong>
        <br>
        <strong>ITANAGAR, May 6:</strong> Nacho, Taksing, Siyum and ...
        <br> “Offices are without ...
    </p>
</div>

Second detail page:

<div class="td-post-content">
    <p style="text-align: justify;">
        <strong>Guwahati, May 6 (PTI)</strong> Sarbananda Sonowal today ...
        <br> “Books are a potent tool to create ...
    </p>
</div>

I am trying to parse the author and the publication date from the detail pages:

class ArunachaltimesSpider(scrapy.Spider):
    ...
    ...

    def parse(self, response):
        urls = response.css("...").extract()
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse_detail)

    def parse_detail(self, response):
        strong_elements = response.css("div.td-ss-main-content").css("div.td-post-content").css("p > strong::text").extract()
        for strong in strong_elements:
            if ', ' in strong:
                news_date = strong.split(', ')[1].replace(":", "")
            elif '[ ' and ' ]' in strong:
                author = strong
            else:
                news_date = None
                author = None
        yield {
            'author': author,
            'news_date': news_date
        }

But I get this error:

UnboundLocalError: local variable 'author' referenced before assignment

What am I doing wrong here? Could you help me get the author and the news date from each page? Thanks.

+1

No value is assigned to `author` if the loop body never executes, or if only the first `if` branch is taken. – jiakai

+0

@jiakai Yes, giving author and news_date a default value of None solved the problem. – Robin
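
The situation described in the comment above can be reproduced without Scrapy. The following is a minimal sketch: `pick_author` is a hypothetical helper, not part of the original spider, that mirrors the branch structure of `parse_detail`. With the second page's input only the first branch runs, so `author` is never bound by the time it is read.

```python
def pick_author(strong_texts):
    # Hypothetical helper that mirrors the branches in parse_detail.
    for text in strong_texts:
        if ', ' in text:
            news_date = text.split(', ')[1]  # only the date gets bound here
        elif ' ]' in text:
            author = text
    # 'author' is unbound if no element ever reached the elif branch,
    # or if the list was empty and the loop body never ran.
    return author


try:
    pick_author(["Guwahati, May 6 (PTI)"])
except UnboundLocalError as exc:
    error_name = type(exc).__name__
    print(error_name)
```

The first page does not trigger the error because its `[ Karda Natam ]` element reaches the `elif` branch and binds `author`.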

Answer

The problem was solved by giving both author and news_date a default value of None:

def parse_detail(self, response):
    strong_elements = response.css("div.td-ss-main-content").css("div.td-post-content").css("p > strong::text").extract()
    # Default both fields to None so they are always bound, even when a page
    # has no author line or no dateline.
    author = None
    news_date = None
    for strong in strong_elements:
        if ', ' in strong:
            # "ITANAGAR, May 6:" or "Guwahati, May 6 (PTI)" -> "May 6"
            news_date = strong.split(", ")[1].replace(":", "").split(" (")[0]
        elif '[ ' in strong and ' ]' in strong:
            # "[ Karda Natam ]" -> "Karda Natam"
            author = strong.strip("[ ").strip(" ]")
    yield {
        'author': author,
        'news_date': news_date
    }
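
One more detail about the condition `'[ ' and ' ]' in strong` from the question: Python evaluates it as `'[ ' and (' ]' in strong)`, and since the non-empty string `'[ '` is always truthy, only the closing bracket is actually tested. Resetting both fields in an `else` branch can also wipe a value found on an earlier element. The sketch below pulls the string handling into a plain function so it can be checked without running a crawl. The name `extract_author_and_date` is hypothetical, and the logic assumes the author line is wrapped in square brackets and the dateline has the form `PLACE, DATE`, as in the two sample pages.

```python
def extract_author_and_date(strong_texts):
    """Return (author, news_date) from the texts of the <strong> elements."""
    author = None
    news_date = None
    for text in strong_texts:
        text = text.strip()
        if text.startswith('[') and text.endswith(']'):
            # "[ Karda Natam ]" -> "Karda Natam"
            author = text.strip('[] ')
        elif ', ' in text:
            # "ITANAGAR, May 6:" or "Guwahati, May 6 (PTI)" -> "May 6"
            after_place = text.split(', ', 1)[1]
            news_date = after_place.replace(':', '').split(' (')[0].strip()
    return author, news_date


first_page = extract_author_and_date(["[ Karda Natam ]", "ITANAGAR, May 6:"])
second_page = extract_author_and_date(["Guwahati, May 6 (PTI)"])
print(first_page, second_page)
```

Inside the spider, `parse_detail` could pass `strong_elements` to this function and yield the two values. Checking the brackets before the comma keeps an author line that itself contains a comma from being mistaken for a dateline.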