When exporting my data to CSV, my output is disordered, probably because of tabs and spaces

from scrapy import Field, Item, Spider


class Job(Item):
    a_title = Field()
    b_url = Field()
    c_date = Field()
    d_pub = Field()


class stage(Spider):
    name = 'jobs'
    start_urls = ['http://www.stagiaire.com/offres-stages.html/']

    def parse(self, response):
        # each .info-offre block is one job listing
        for i in response.css('.info-offre'):
            title = i.css('.titleads::text').extract()
            url = i.css('.titleads::attr(href)').extract()
            date = i.css('.date-offre.tip::text').extract()
            pub = i.css('.content-1+ .content-1 .date-offre::text').extract()

            yield Job(a_title=title, b_url=url, c_date=date, d_pub=pub)

This is my output: (screenshot of the exported CSV, not reproduced here)

Source

2016-08-23 K. ossama

Wrap all of your code in code tags – dbmitch

Can you post the CSV source as text instead of an image? – Granitosaurus

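Since you are not using Scrapy's ItemLoader, you are putting lists into your item, whereas you most likely want single elements. To fix this, use extract_first() instead of extract() to get only the first match of each selector.

In your case it should be: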

# extract_first('') returns '' when nothing matches; strip() removes surrounding spaces, \n, \t etc.
title = i.css('.titleads::text').extract_first('').strip()
url = i.css('.titleads::attr(href)').extract_first('').strip()
date = i.css('.date-offre.tip::text').extract_first('').strip()
pub = i.css('.content-1+ .content-1 .date-offre::text').extract_first('').strip()
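To see why the padded values break the CSV, here is a minimal sketch that uses only the standard library. The sample strings and the helper names `first_clean` and `to_csv_row` are made up for illustration; the real values come from the page you are scraping.

```python
import csv
import io

# Hypothetical raw value, shaped like what .extract() returns for one field:
# a list of strings padded with newlines and tabs.
raw_date = ["\n\t\t\t23/08/2016\n\t\t"]


def first_clean(values, default=""):
    """Return the first value with every run of whitespace collapsed."""
    if not values:
        return default
    # str.split() with no arguments splits on any whitespace, including \n and \t
    return " ".join(values[0].split())


def to_csv_row(fields):
    """Serialize one row the way the csv module writes it."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(fields)
    return buffer.getvalue()


# "Stage marketing" is an invented title, not taken from the site
messy_row = to_csv_row(["Stage marketing", raw_date[0]])
clean_row = to_csv_row(["Stage marketing", first_clean(raw_date)])

print(repr(messy_row))  # embedded newlines force a quoted, multi-line cell
print(repr(clean_row))
```

Unlike a plain strip(), the split/join approach also collapses whitespace in the middle of the string, which may or may not be what you want for your fields.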

Actually, it looks like you would want to use an ItemLoader here to clean up the newlines and so on:

from scrapy import Item, Field
from scrapy.loader import ItemLoader
# on recent Scrapy releases these processors are provided by itemloaders.processors
from scrapy.loader.processors import Compose, TakeFirst

class MyItem(Item):
    # every field the loader fills must be declared on the item
    title = Field()
    url = Field()
    date = Field()
    pub = Field()

class MyItemLoader(ItemLoader):
    default_item_class = MyItem
    # applied to every field: take the first element, then remove newlines
    # and leading/trailing whitespace
    default_output_processor = Compose(TakeFirst(),
                                       lambda v: v.replace('\n', '').strip())

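This might look like a lot, but an item loader is just a wrapper around an item object that does something whenever you put a value in or take one out. In the example above it processes every value: it takes the first element if the value is a list, then removes any newlines.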
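If the processor chain feels opaque, the sketch below mimics it in plain Python. It is not Scrapy's implementation, only an approximation of the documented behaviour of TakeFirst and Compose; the names `take_first`, `compose` and `clean_first` and the sample strings are invented for illustration.

```python
def take_first(values):
    """Return the first value that is neither None nor an empty string."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


def compose(*functions):
    """Chain functions left to right, stopping early once a step yields None."""
    def run(value):
        for function in functions:
            if value is None:
                break
            value = function(value)
        return value
    return run


# Same shape as the default_output_processor in the loader
clean_first = compose(take_first, lambda v: v.replace("\n", "").strip())

print(repr(clean_first(["\n\t\t\tStage marketing\n", "ignored"])))
print(repr(clean_first([])))  # nothing extracted, so the lambda is never called
```

The early stop matters: without it, the lambda would be called on None for any field whose selector matched nothing.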

Then just create the loader, add some fields and load!

# inside parse(): one loader per listing, so each item gets its own values
for i in response.css('.info-offre'):
    loader = MyItemLoader(selector=i)
    loader.add_css('title', '.titleads::text')
    loader.add_css('url', '.titleads::attr(href)')
    loader.add_css('date', '.date-offre.tip::text')
    loader.add_css('pub', '.content-1+ .content-1 .date-offre::text')
    yield loader.load_item()

Source

2016-08-23 21:10:46 Granitosaurus

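Thanks for your answer, but it doesn't seem to work with my code. It raises exceptions.AttributeError: 'SelectorList' object has no attribute 'extract_first'. What am I missing? Do I need to import a method? For your information, when I extract only the first two items (title, url) there is no problem, but when I add the last two items (date, pub) to my code, the file comes out disorganized. One more detail that may help: when I used pandas to create a DataFrame, I noticed that both of them (date and pub) contain a lot of metacharacters, \n\t\t\t. Thanks in advance –

@K.ossama 'extract_first()' was only added in Scrapy v1.1, so you probably just need to update your Scrapy. To get rid of the backslash characters, you can strip the result (see my edit). If you have any other issues, you should provide the results as text rather than as a screenshot, so we can see exactly what is wrong. – Granitosaurus

Thanks @Granitosaurus, I have updated my Scrapy but it still gives the same problem. You can find my output here, and thanks again: https://1drv.ms/u/s!Ah5DCQ19IxysgQLtVn7JhWHDpuaT –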
