2016-03-09 122 views
1

How do I extract image URLs from a website using Scrapy in Python? Please help me. This is my code:

from scrapy.spiders import CrawlSpider, Rule
#from scrapy.linkextractors.lxmlhtml import LxmlLinkExtractor
from scrapy.linkextractors import LinkExtractor
from scrapy.item import Item, Field


class MyItem(Item):
    url = Field()


class someSpider(CrawlSpider):
    name = 'crawltest'
    allowed_domains = ['bambeeq.com']
    start_urls = ['http://www.bambeeq.com/']
    rules = (Rule(LinkExtractor(allow=()), callback='parse_obj', follow=True),)

    def parse_obj(self, response):
        item = MyItem()
        item['url'] = []
        for link in LinkExtractor(allow=(), deny=self.allowed_domains).extract_links(response):
            item['url'].append(link.url)
            #item['image'].append(link.img)
        return item
+0

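Questions seeking debugging help (**"why isn't this code working?"**) must include the desired behavior, *a specific problem or error* and *the shortest code necessary* to reproduce it **in the question itself**. Questions without **a clear problem statement** are not useful to other readers. See: [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve). – MattDMo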

Answer

2

You are extracting links (`a` elements), not images (`img` elements). Try this:

# iterate over the list of images 
for image in response.xpath('//img/@src').extract(): 
    # make each one into a full URL and add to item[] 
    item['url'].append(response.urljoin(image)) 

yield item
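
To see what the two steps in the snippet above do without running a crawl, here is a minimal sketch that uses only the Python standard library instead of Scrapy. It collects every `img` `src` value and resolves it against the page URL, which is roughly what `response.xpath('//img/@src')` followed by `response.urljoin(...)` does. The names `ImgSrcParser` and `image_urls`, the sample HTML and the paths in it are all made up for illustration. This is not how Scrapy is implemented internally.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Links (<a href=...>) are ignored on purpose; only images are kept.
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                self.sources.append(src)


def image_urls(page_url, html):
    """Return absolute URLs for all images found in the given HTML string."""
    parser = ImgSrcParser()
    parser.feed(html)
    # urljoin resolves relative paths against the URL of the page.
    return [urljoin(page_url, src) for src in parser.sources]


# Made-up page content, used only to demonstrate the behaviour.
page_url = 'http://www.bambeeq.com/shop/'
html = '<a href="/cart">Cart</a><img src="/static/logo.png"><img src="shoe.jpg">'
urls = image_urls(page_url, html)
print(urls)
```

The `a` element contributes nothing to the result, while both image paths come back as full URLs: the root-relative path is resolved against the site root and the bare file name against the current directory.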