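Checking one array against another array in Python and Scrapy
I want to set a variable in Python to a string element from one array, chosen according to which string element of another array is in use. I'm stuck on how to do this.
Here are the two arrays: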
genre = ["Dance",
         "Festivals",
         "Rock/pop"
         ]
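I'm trying to print the genre according to which of the three elements in the other array is being used, i.e. when the URL is start_urls[0], the genre should be genre[0]: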
start_urls = [
    "http://www.allgigs.co.uk/whats_on/London/clubbing-1.html",
    "http://www.allgigs.co.uk/whats_on/London/festivals-1.html",
    "http://www.allgigs.co.uk/whats_on/London/tours-1.html"
]
Full code:
genre = ["Dance",
         "Festivals",
         "Rock/pop"
         ]

class AllGigsSpider(CrawlSpider):
    name = "allGigs"  # Name of the spider. In the command prompt, from the project folder, enter "scrapy crawl allGigs".
    allowed_domains = ["www.allgigs.co.uk"]  # Allowed domains are plain domain strings, NOT URLs.
    start_urls = [
        "http://www.allgigs.co.uk/whats_on/London/clubbing-1.html",
        "http://www.allgigs.co.uk/whats_on/London/festivals-1.html",
        "http://www.allgigs.co.uk/whats_on/London/tours-1.html"
    ]
    rules = [
        Rule(SgmlLinkExtractor(restrict_xpaths='//div[@class="more"]'),  # Search the start URL's for
             callback="parse_item",
             follow=True),
    ]

    def parse_start_url(self, response):
        # Make sure the start pages themselves are parsed too.
        return self.parse_item(response)

    def parse_item(self, response):  # http://stackoverflow.com/questions/15836062/scrapy-crawlspider-doesnt-crawl-the-first-landing-page
        for info in response.xpath('//div[@class="entry vevent"]'):
            item = TutorialItem()  # Extract items from the items folder.
            item['artist'] = info.xpath('.//span[@class="summary"]//text()').extract()  # Extract artist information.
            item['date'] = info.xpath('.//span[@class="dates"]//text()').extract()  # Extract date information.
            preview = ''.join(str(s) for s in item['artist'])
            #item ['genre'] = i.xpath('.//li[@class="style"]//text()').extract()
            client = soundcloud.Client(client_id='401c04a7271e93baee8633483510e263', client_secret='b6a4c7ba613b157fe10e20735f5b58cc', callback='http://localhost:9000/#/callback.html')
            tracks = client.get('/tracks', q=preview, limit=1)
            for track in tracks:
                print track.id
                # start_urls is a class attribute, so it has to be reached through self.
                for i, val in enumerate(genre):
                    print '{} {}'.format(genre[i], self.start_urls[i])
                print genre
                #for i, val in enumerate(genre):
                #    print '{} {}'.format(genre[i], start_urls[i])
                item['trackz'] = track.id
                yield item
Any help is appreciated.
If you want to map the two arrays, you could use a dict? – Zero
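The dict suggestion in the comment above could look like the following minimal sketch. It assumes the two lists from the question stay in the same order; the name `genre_by_url` is made up for illustration and does not appear in the original code.

```python
# Lists copied from the question; they must stay in the same order.
genre = ["Dance", "Festivals", "Rock/pop"]
start_urls = [
    "http://www.allgigs.co.uk/whats_on/London/clubbing-1.html",
    "http://www.allgigs.co.uk/whats_on/London/festivals-1.html",
    "http://www.allgigs.co.uk/whats_on/London/tours-1.html",
]

# Hypothetical name: pair each start URL with the genre at the same index.
genre_by_url = dict(zip(start_urls, genre))

print(genre_by_url[start_urls[0]])  # prints "Dance"
```

A dict keyed on the exact start URL only answers lookups for those three pages; any other URL raises a `KeyError` unless `.get()` is used.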
Please post your expected output. – itzMEonTV
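My expected output is simply for item['genre'] to be set to whatever corresponds to the link being crawled. So the first URL would only send the string "Dance" to my database. –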
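Given that expected output, one possible approach is a small helper that takes the URL of the page being parsed and returns the matching genre. The sketch below is only an illustration under stated assumptions: `GENRE_BY_SLUG` and `genre_for_url` are hypothetical names, and it assumes that the pages the spider follows keep the same category word at the start of the file name as the start URLs (for example a later "clubbing" page), which the question does not confirm. It is written for Python 3; on Python 2, which the question's code uses, the import would be `from urlparse import urlparse`.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical mapping from the category word in the file name to a genre.
GENRE_BY_SLUG = {
    "clubbing": "Dance",
    "festivals": "Festivals",
    "tours": "Rock/pop",
}


def genre_for_url(url):
    """Return the genre for a listing URL, or None if the category is unknown."""
    # ".../London/clubbing-1.html" -> "clubbing-1.html"
    filename = posixpath.basename(urlparse(url).path)
    # "clubbing-1.html" -> "clubbing"
    slug = filename.split("-")[0]
    return GENRE_BY_SLUG.get(slug)
```

Inside `parse_item` this could be used as `item['genre'] = genre_for_url(response.url)`, so that each item carries the genre of the page it was scraped from rather than the whole list.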