2013-07-10 74 views

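Scrapy CSV output does not have multiple rows for each column

I want to scrape an events website, and the code attached below scrapes the event names and locations. I write the output to a CSV file, but the CSV file ends up with all the event names appended to each other in a single row.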

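For example, suppose I have two events, Bruno Mars and Maroon 5, located in San Jose and Santa Clara respectively. The current output is:

event_name event_location

Bruno Mars, Maroon 5 San Jose, Santa Clara

but I would like to see:

event_name event_location

Bruno Mars San Jose

Maroon 5 Santa Clara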

Can someone tell me why the output is formatted this way? I have attached the code here. I run it with scrapy crawl event_spider -o output.csv -t csv.

from scrapy.spider import BaseSpider 
from scrapy.selector import HtmlXPathSelector 

from event_test.items import EventItem 


class EventSpider(BaseSpider): 
    name = "event_spider" 
    allowed_domains = ["eventful.com"] 
    start_urls = [
        "http://eventful.com/sanjose/events"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        events = hxs.select("/html/body[@id='events']/div[@id='outer-container']/div[@id='mid-container']/div[@id='inner-container']/div[@id='content']/div[@class='cols-2-1']/div[@class='alpha']/div[@id='top-events']/div[@class='section top-events cage-dbl-border cage-bdr-mdgrey']/div[@id='events-scroll']/div[@id='events-scroll-items']/ul[@id='events-scroll-items-list']/li[@class='top-events-item ']")
        items = []
        for event in events:
            item = EventItem()
            item['event_name'] = event.select("//h2/a/span/text()").extract()
            item['event_locality'] = event.select("//span[@class='locality']/text()").extract()
            items.append(item)
        return items

Answer


I have simplified the code and the XPath expressions in your spider:

from scrapy.spider import BaseSpider 
from scrapy.selector import HtmlXPathSelector 
from event_test.items import EventItem 


class EventSpider(BaseSpider): 
    name = "event_spider" 
    allowed_domains = ["eventful.com"] 
    start_urls = ["http://eventful.com/sanjose/events"] 

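    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Match each event's <li> directly instead of spelling out the full path.
        events = hxs.select("//li[contains(@class, 'top-events-item')]")
        for event in events:
            item = EventItem()
            # The leading "." keeps each XPath relative to the current <li>;
            # [0] takes the first match so the field is a string, not a list.
            item['event_name'] = event.select(".//h2/a/span/text()").extract()[0]
            item['event_locality'] = event.select(".//span[@class='locality']/text()").extract()[0]
            yield item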
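
The reason this matters: an XPath that starts with `//` is evaluated from the document root even when it is called on a sub-selector, so in the question's spider every item received the names and localities of all events. Starting the expression with `.//` restricts the search to the current `li`. The sketch below illustrates the difference with the standard library's `xml.etree.ElementTree` standing in for Scrapy's selectors. The markup in `PAGE` and the `texts` helper are made up for illustration, and because ElementTree's XPath subset does not accept a leading `//` on an element, the document-wide search is imitated by querying from the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the real page.
PAGE = """
<ul>
  <li class="top-events-item">
    <h2><a><span>Bruno Mars</span></a></h2>
    <span class="locality">San Jose</span>
  </li>
  <li class="top-events-item">
    <h2><a><span>Maroon 5</span></a></h2>
    <span class="locality">Santa Clara</span>
  </li>
</ul>
"""

root = ET.fromstring(PAGE)
events = root.findall(".//li")


def texts(node, path):
    """Return the text of every element matching path below node."""
    return [el.text for el in node.findall(path)]


# Imitates the question's "//..." expressions: every event searches the
# whole document, so each item holds the values of all events.
document_wide = [
    {
        "event_name": texts(root, ".//h2/a/span"),
        "event_locality": texts(root, ".//span[@class='locality']"),
    }
    for _ in events
]

# Imitates the answer's ".//..." expressions: each search starts at the
# current <li>, and [0] keeps only the first match.
per_event = [
    {
        "event_name": texts(li, ".//h2/a/span")[0],
        "event_locality": texts(li, ".//span[@class='locality']")[0],
    }
    for li in events
]

print(document_wide[0])
print(per_event)
```

With the document-wide search, each item carries two names and two localities; with the relative search, each item carries exactly one of each.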

Here is what the spider writes to the CSV file:

event_name,event_locality 
Under the Influence of Music Tour,Mountain View 
Bruno Mars,San Jose 
John Mayer: Born & Raised Tour 2013,Mountain View 
New Kids on the Block with 98 Degrees and ...,San Jose 
Amy Grant,San Jose 
Styx,Saratoga 
Bob Dylan with Wilco,Mountain View 
Kenny Chesney with Eli Young Band,Mountain View 
Smash Mouth \/ Sugar Ray \/ Gin Blossoms \...,Saratoga 
Creedence Clearwater Revisited \/ 38 Special,Saratoga 
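
As for why the original output collapsed into one line: each field in the question's items was a list of every match, and as far as I know Scrapy's CSV exporter joins multi-valued fields with a comma by default (the `join_multivalued` option of `CsvItemExporter`). The sketch below only imitates that behaviour with the standard `csv` module, it is not Scrapy's exporter, and the `to_csv` helper and sample items are made up for illustration.

```python
import csv
import io

FIELDS = ["event_name", "event_locality"]


def to_csv(items):
    """Render items as CSV text, joining list values with commas."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        # A list value becomes one comma-joined cell, as in the question.
        row = {
            key: ",".join(value) if isinstance(value, list) else value
            for key, value in item.items()
        }
        writer.writerow(row)
    return buf.getvalue()


# One item whose fields are lists: everything lands in a single row.
list_items = [
    {
        "event_name": ["Bruno Mars", "Maroon 5"],
        "event_locality": ["San Jose", "Santa Clara"],
    }
]

# One item per event with plain strings: one row per event.
scalar_items = [
    {"event_name": "Bruno Mars", "event_locality": "San Jose"},
    {"event_name": "Maroon 5", "event_locality": "Santa Clara"},
]

collapsed = to_csv(list_items)
separate = to_csv(scalar_items)
print(collapsed)
print(separate)
```

Yielding one item per event with string values, as the spider above does, is what produces one CSV row per event.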

Hope that helps.


Very nice, thank you! It works perfectly. – AJay