What is the correct way to go about multiple XPath selectors for this?

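I'm very tired: only 3 hours of sleep and awake for 20 hours, so please forgive my mistakes.

I'm trying to use multiple XPath selectors but can't seem to get it right. The loop in this code is clearly flawed: it repeats the description, ends up taking the last item's description, and assigns it to every item. Screenshot and code: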

A visual representation of what I mean: http://puu.sh/fBjA9/da85290fc2.png

Code (Python, Scrapy web crawler): Spider

def parse(self, response):
    item = DmozItem()
    for sel in response.xpath("//td[@class='nblu tabcontent']"):
        item['title'] = sel.xpath("a/big/text()").extract()
        item['link'] = sel.xpath("a/@href").extract()
        for sel in response.xpath("//td[contains(@class,'framed')]"):
            item['description'] = sel.xpath("b/text()").extract()
        yield item

Pipeline

def process_item(self, item, spider):
    self.cursor.execute("SELECT * FROM data WHERE title= %s", item['title'])
    result = self.cursor.fetchall()
    if result:
        log.msg("Item already in database: %s" % item, level=log.DEBUG)
    else:
        self.cursor.execute(
            "INSERT INTO data(title, url, description) VALUES (%s, %s, %s)",
            (item['title'][0], item['link'][0], item['description'][0]))
        self.connection.commit()
        log.msg("Item stored : %s" % item, level=log.DEBUG)
    return item

def handle_error(self, e):
    log.err(e)

Thanks for reading and for any help.

Source

2015-02-07 CharlieC

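The Scrapy code doesn't make much sense without seeing the HTML; is there a URL? – 2015-02-07 18:39:30

@HughBothwell Here it is, thanks. http://www.phpclasses.org/browse/class/130.html – CharlieC 2015-02-07 18:41:53

@HughBothwell Going to sleep, should be recovered in 6 hours. Nearly 24 hours without sleep – CharlieC 2015-02-07 18:42:36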

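The problem is that "//td[@class='nblu tabcontent']" and "//td[contains(@class,'framed')]" correspond one-to-one; you can't iterate one inside the other, or, as you found, you only get the last item from the inner list.

Instead, try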

def parse(self, response):
    title_links = response.xpath("//td[@class='nblu tabcontent']")
    descriptions = response.xpath("//td[contains(@class,'framed')]")
    for tl, d in zip(title_links, descriptions):
        item = DmozItem()
        item['title'] = tl.xpath("a/big/text()").extract()
        item['link'] = tl.xpath("a/@href").extract()
        item['description'] = d.xpath("b/text()").extract()
        yield item
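
To see why pairing works where nesting does not, here is a minimal sketch that uses plain Python lists in place of the Scrapy selector results. The `titles` and `descriptions` values and the `nested` and `paired` helper names are made up for illustration; only the loop shapes mirror the spider code.

```python
# Hypothetical stand-ins for the two selector lists (not real page data).
titles = ["Class A", "Class B", "Class C"]
descriptions = ["about A", "about B", "about C"]


def nested(titles, descriptions):
    """Mimic the nested loop: the inner loop runs to completion for
    every title, so each item keeps only the last description."""
    items = []
    for title in titles:
        item = {"title": title}
        for description in descriptions:
            item["description"] = description  # overwritten on each pass
        items.append(item)
    return items


def paired(titles, descriptions):
    """Mimic the zip-based loop: the i-th title gets the i-th description."""
    return [
        {"title": title, "description": description}
        for title, description in zip(titles, descriptions)
    ]


nested_items = nested(titles, descriptions)
paired_items = paired(titles, descriptions)

print([item["description"] for item in nested_items])
print([item["description"] for item in paired_items])
```

Keep in mind that `zip` stops at the shorter of its inputs, so if the page has a title cell without a matching description cell, the trailing entries are dropped without any warning.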

Source

2015-02-08 02:54:15

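Oh my, thank you, that does work. I guess it was just the wrong loop layout. The help is much appreciated. – CharlieC 2015-02-08 03:02:56

I think you just need to move the item instantiation inside the for loop: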

def parse(self, response):
    for sel in response.xpath("//td[@class='nblu tabcontent']"):
        item = DmozItem()
        item['title'] = sel.xpath("a/big/text()").extract()
        item['link'] = sel.xpath("a/@href").extract()
        for sel in response.xpath("//td[contains(@class,'framed')]"):
            item['description'] = sel.xpath("b/text()").extract()
        yield item

Source

2015-02-07 23:43:33

Hmm, still no effect. – CharlieC 2015-02-08 02:21:05

Tried using //html as the main response.xpath. Code: http://hastebin.com/tinaduwezu.coffee Screenshot (this is what I get): http://puu.sh/fCyVD/6707bc2d82.png But it results in a MySQL error - ProgrammingError: Not all parameters were used in the SQL statement – CharlieC 2015-02-08 02:44:47
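
One plausible cause of that error, offered as a guess since the hastebin code is not reproduced here: `extract()` returns a list, and the pipeline passes `item['title']` directly as the query parameters. With a broad selector such as `//html`, that list can hold many strings, so the driver receives more parameters than the statement has placeholders. The sketch below reproduces the same kind of mismatch with the standard-library `sqlite3` module standing in for the MySQL driver (it uses `?` placeholders and words its error differently); the table contents and title values are made up.

```python
import sqlite3

# In-memory database standing in for the MySQL table (assumption for the demo).
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE data (title TEXT, url TEXT, description TEXT)")

# extract() returns a list; a broad selector can yield many strings.
titles = ["Class A", "Class B", "Class C"]  # hypothetical values

try:
    # One placeholder but three parameters: the driver rejects the call.
    cursor.execute("SELECT * FROM data WHERE title = ?", titles)
    mismatch_error = None
except sqlite3.ProgrammingError as exc:
    mismatch_error = exc

# Passing exactly one value per placeholder works.
cursor.execute("SELECT * FROM data WHERE title = ?", (titles[0],))
rows = cursor.fetchall()

print(type(mismatch_error).__name__)
print(rows)
```

If this is what is happening, passing a one-element tuple such as `(item['title'][0],)` to the SELECT, as the INSERT in the pipeline already does for each of its values, would keep the parameter count in line with the placeholders.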
