2015-04-12 82 views

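Checking one array against another array in Python and Scrapy

I want to set a variable in Python to a string element of one array, based on which string element of another array is being used. I'm stumped on how to do it.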

Here are the two arrays:

genre = ["Dance", 
    "Festivals", 
    "Rock/pop" 
    ] 

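I'm trying to print the genre based on the three elements of the other array, i.e. when start_urls[0] is being crawled, the genre should be genre[0]: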

start_urls = [ 
    "http://www.allgigs.co.uk/whats_on/London/clubbing-1.html", 
    "http://www.allgigs.co.uk/whats_on/London/festivals-1.html", 
    "http://www.allgigs.co.uk/whats_on/London/tours-1.html" 
] 

Full code:

genre = ["Dance", 
    "Festivals", 
    "Rock/pop" 
    ] 

class AllGigsSpider(CrawlSpider): 
    name = "allGigs" # Name of the Spider. In a command prompt, when in the correct folder, enter "scrapy crawl allGigs". 
    allowed_domains = ["www.allgigs.co.uk"] # Allowed domains is a String NOT a URL. 
    start_urls = [ 
     "http://www.allgigs.co.uk/whats_on/London/clubbing-1.html", 
     "http://www.allgigs.co.uk/whats_on/London/festivals-1.html", 
     "http://www.allgigs.co.uk/whats_on/London/tours-1.html" 
    ] 

    rules = [ 
     Rule(SgmlLinkExtractor(restrict_xpaths='//div[@class="more"]'), # Search the start URL's for 
     callback="parse_item", 
     follow=True), 
    ] 

    def parse_start_url(self, response): 
     return self.parse_item(response) 

    def parse_item(self, response):#http://stackoverflow.com/questions/15836062/scrapy-crawlspider-doesnt-crawl-the-first-landing-page 
     for info in response.xpath('//div[@class="entry vevent"]'): 
      item = TutorialItem() # Extract items from the items folder. 
      item ['artist'] = info.xpath('.//span[@class="summary"]//text()').extract() # Extract artist information. 
      item ['date'] = info.xpath('.//span[@class="dates"]//text()').extract() # Extract date information. 
      preview = ''.join(str(s)for s in item['artist']) 
      #item ['genre'] = i.xpath('.//li[@class="style"]//text()').extract() 
      client = soundcloud.Client(client_id='401c04a7271e93baee8633483510e263', client_secret='b6a4c7ba613b157fe10e20735f5b58cc', callback='http://localhost:9000/#/callback.html') 
      tracks = client.get('/tracks', q = preview, limit=1) 
      for track in tracks: 
       print track.id 
       for i, val in enumerate(genre): 
         print '{} {}'.format(genre[i], start_urls[i]) 
       print genre 
       #for i, val in enumerate(genre): 
       #  print '{} {}'.format(genre[i], start_urls[i]) 
       item ['trackz'] = track.id 
       yield item 

Any help is appreciated.


If you want to map the two arrays onto each other, could you use 'dicts'? – Zero
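A minimal sketch of that dict idea, assuming the two lists stay in the same order (the name genre_by_url is made up for illustration): dict(zip(...)) pairs the i-th URL with the i-th genre, so a URL can be looked up directly instead of by index.

```python
genre = ["Dance", "Festivals", "Rock/pop"]
start_urls = [
    "http://www.allgigs.co.uk/whats_on/London/clubbing-1.html",
    "http://www.allgigs.co.uk/whats_on/London/festivals-1.html",
    "http://www.allgigs.co.uk/whats_on/London/tours-1.html",
]

# Pair the i-th URL with the i-th genre (relies on both lists being in the same order)
genre_by_url = dict(zip(start_urls, genre))

print(genre_by_url["http://www.allgigs.co.uk/whats_on/London/festivals-1.html"])
# prints: Festivals
```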


Post your expected output. – itzMEonTV


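My expected output is just item['genre'] set to whichever genre corresponds to the link being scraped. So the first URL would only send the string "Dance" to my database. –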

Answer

for i, val in enumerate(genre): 
    print '{} {}'.format(genre[i], start_urls[i]) 

should work.
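The same pairing can be written without indexing by zipping the two lists. This is a sketch in Python 3 syntax (the question's code is Python 2), shown at module level where both lists are plain variables:

```python
genre = ["Dance", "Festivals", "Rock/pop"]
start_urls = [
    "http://www.allgigs.co.uk/whats_on/London/clubbing-1.html",
    "http://www.allgigs.co.uk/whats_on/London/festivals-1.html",
    "http://www.allgigs.co.uk/whats_on/London/tours-1.html",
]

# zip yields (genre, url) tuples in order, so no index variable is needed
pairs = list(zip(genre, start_urls))
for g, url in pairs:
    print("{} {}".format(g, url))
```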


I get an error saying the global variable 'start_urls' doesn't exist. I'll edit the question with the full code. And thanks :) –


Your start_urls is a class attribute, so you have to access it through self, like this: self.start_urls[i] –
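A stripped-down illustration of that scoping rule, using a plain class as a hypothetical stand-in for the spider (no Scrapy needed): the module-level genre list is visible inside the method as a global, while the class attribute start_urls is only reachable through self.

```python
genre = ["Dance", "Festivals", "Rock/pop"]  # module level: visible inside methods


class DemoSpider(object):  # hypothetical stand-in for AllGigsSpider
    # Class attribute, like start_urls on the real spider
    start_urls = [
        "http://www.allgigs.co.uk/whats_on/London/clubbing-1.html",
        "http://www.allgigs.co.uk/whats_on/London/festivals-1.html",
        "http://www.allgigs.co.uk/whats_on/London/tours-1.html",
    ]

    def describe(self, i):
        # genre resolves as a global name; a bare start_urls[i] here would
        # raise NameError, so the class attribute is read through self.
        return "{} {}".format(genre[i], self.start_urls[i])


print(DemoSpider().describe(0))
# prints: Dance http://www.allgigs.co.uk/whats_on/London/clubbing-1.html
```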


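That's awesome, it works better. However, it prints out all three genres and all three URLs. I only want to print the genre that matches the URL being scraped, if that makes sense? –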
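One way to get only the matching genre is to derive it from the URL of the response being parsed instead of looping over every genre. A minimal sketch in Python 3, assuming (not verified against the site) that the pages reached through the "more" links keep the same category prefix, e.g. clubbing-2.html; the helper name genre_for_url and the GENRE_BY_SLUG table are hypothetical:

```python
from urllib.parse import urlparse

# Hypothetical lookup table: category prefix of the page name -> genre
GENRE_BY_SLUG = {
    "clubbing": "Dance",
    "festivals": "Festivals",
    "tours": "Rock/pop",
}


def genre_for_url(url):
    """Return the genre for a listing URL, or None if no category matches."""
    # Last path segment, e.g. "clubbing-1.html"
    page = urlparse(url).path.rsplit("/", 1)[-1]
    # Text before the first hyphen, e.g. "clubbing"
    slug = page.split("-", 1)[0]
    return GENRE_BY_SLUG.get(slug)


print(genre_for_url("http://www.allgigs.co.uk/whats_on/London/tours-1.html"))
# prints: Rock/pop
```

Inside parse_item, and assuming TutorialItem declares a genre field, this would be item['genre'] = genre_for_url(response.url), set once per item in place of the enumerate loop. Keying on the page prefix rather than on the exact start URL is deliberate: CrawlSpider also calls parse_item for the followed pages, whose URLs are not in start_urls.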
