2014-02-13 67 views
0

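Remove commas before or during CSV parsing in Ruby

I am importing a CSV file and parsing it with the delimiter set to "|". I want to remove the commas, or escape them, so that they don't mess up the columns.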

Here is the part of the code where I think the comma removal should go:

namespace :postonce do 
    desc "Check postonce ftp files and post loads and trucks." 
    task :post => :environment do 
    files = %x[ls /home/web2_postonce/].split("\n") 
    files.each do |file| 
     %x[ iconv -t UTF-8 /home/web2_postonce/#{file} > /home/deployer/postonce/#{file} ] 
     %x[ mv /home/web2_postonce/#{file} /home/deployer/postonce_backup/ ] 
    end 
    files = %x[ ls /home/deployer/postonce/ ].split("\n") 
    files.each do |file| 
     begin 
     lines = CSV.read("/home/deployer/postonce/#{file}") 
     rescue Exception => e 
     log.error e 
     next 
     end 
     h = lines.shift 
     header = CSV.parse_line(h[0], { :col_sep => "|" }) 
     lines.each do |line| 
     fields = CSV.parse_line(line[0],{:col_sep => "|"}) 
     post = Hash[header.zip fields] 
    if post["EmailAddress"].blank? 
     log.error "Blank Email #{post["EmailAddress"]}" 
     else 
     log.debug "Email #{post["EmailAddress"]}" 
     end 

Here is the full code that pulls in the file and parses it into columns:

require 'resque' 
require 'logger' 
log = Logger.new("#{Rails.root}/log/PostOnce.log") 
log.datetime_format = "%F %T" 
namespace :postonce do 
    desc "Check postonce ftp files and post loads and trucks." 
    task :post => :environment do 
    files = %x[ls /home/web2_postonce/].split("\n") 
    files.each do |file| 
     %x[ iconv -t UTF-8 /home/web2_postonce/#{file} > /home/deployer/postonce/#{file} ] 
     %x[ mv /home/web2_postonce/#{file} /home/deployer/postonce_backup/ ] 
    end 
    files = %x[ ls /home/deployer/postonce/ ].split("\n") 
    files.each do |file| 
     begin 
     lines = CSV.read("/home/deployer/postonce/#{file}") 
     rescue Exception => e 
     log.error e 
     next 
     end 
     h = lines.shift 
     header = CSV.parse_line(h[0], { :col_sep => "|" }) 
     lines.each do |line| 
     fields = CSV.parse_line(line[0],{:col_sep => "|"}) 
     post = Hash[header.zip fields] 
    if post["EmailAddress"].blank? 
     log.error "Blank Email #{post["EmailAddress"]}" 
     else 
     log.error "Email #{post["EmailAddress"]}" 
     end 
     if post["Notes"].blank? 
      post["Notes"] = "~PostOnce~" 
     else 
      post["Notes"] = post["Notes"]+" ~PostOnce~" 
     end 
     if Company.where(:name => post["Company"]).first.nil? 
      c = Company.new 
      c.name = post["Company"] 
      c.dispatch = post["Customer_Phone"] 
      c.save 
     end 
     if User.where(:email => post["EmailAddress"]).first.blank? 
      u = User.new 
      c = Company.where(:name => post["Company"]).first unless Company.where(:name => post["Company"]).first.nil? 
      u.company_id = c.id 
      u.username = post["EmailAddress"].gsub(/@.*/,"") unless post["EmailAddress"].nil? 
      u.password = Time.now.to_s 
      u.email = post["EmailAddress"] 
      u.dispatch = post["Customer_Phone"] 
      u.save 
     end 
     #If Load 
     if file.start_with?("PO_loads") 
      record = Hash.new 
      begin 
      record[:user_id] = User.where(:email => post["EmailAddress"]).first.id 
      rescue Exception => e 
      log.error e 
      next 
      end 
      record[:origin] = "#{post["Starting_City"]}, #{post["Starting_State"]}" 
      record[:dest] = "#{post["Destination_City"]}, #{post["Destination_State"]}" 
      record[:pickup] = Time.parse(post["Pickup_Date_Time"]) 
      record[:ltl] = false 
      record[:ltl] = true unless post["#Load_type_full"] == "FULL" 
      begin 
      record[:equipment_id] = Equipment.where(:code => post["Type_of_Equipment"]).first.id 
      rescue Exception => e 
      record[:equipment_id] = 34 
      end 
      record[:comments] = post["Notes"] 
      record[:weight] = post["Weight"] 
      record[:length] = post["Length"] 
      record[:rate] = post["Payment_amount"] 
      record[:rate] = '' if post["Payment_amount"] == 'Call' or post["Payment_amount"] == 'CALL' 
      Resque.enqueue(MajorPoster, record) 
     #If Truck 
     elsif file.start_with?("PO_trucks") 
      record = Hash.new 
      begin 
      record[:user_id] = User.where(:email => post["EmailAddress"]).first.id 
      rescue Exception => e 
      log.error e 
      next 
      end 
      record[:origin] = "#{post["Starting_City"]}, #{post["Starting_State"]}" 
      record[:dest] = "#{post["Destination_City"]}, #{post["Destination_State"]}" 
      record[:available] = Time.parse(post["Pickup_Date_Time"]) 
      record[:expiration] = record[:available] + 8.days 
      begin 
      record[:equipment_id] = Equipment.where(:code => post["Type_of_Equipment"]).first.id 
      rescue Exception => e 
      record[:equipment_id] = 34 
      end 
      record[:comments] = post["Notes"] 
      Resque.enqueue(MajorPoster, record) 
     end 
     end 
    # %x[rm /home/deployer/postonce/#{file}] 
    end 
    end 
end 

Here is a sample of the data I am trying to load. The commas appear in the Customer_Contact and Notes fields. This data comes to us via FTP:

Member_ID|Action_type|Entry_Number|Pickup_Date_Time|Starting_City|Starting_State|Destination_City|Destination_State|Type_of_Equipment|Length|Quantity|#Load_type_full|Extra_Stops|Payment_amount|Weight|Distance|Notes|Customer_Phone|Extension|Customer_Contact|EmailAddress|Company| 
SUMMIT|L-delete|16491978|20140213|PEWAMO|MI|DENVER|CO|FT|45|1|FULL|0|Call|46000|||866-807-4968||DISPATCH, Dispatch|[email protected]|SUMMIT TRANSPORTATION SERVICES INC.| 
SUMMIT|L-delete|16490693|20140213|PEWAMO|MI|DENVER|CO|V|48|1|FULL|0|Call|44000|||866-807-4968||DISPATCH|[email protected]|SUMMIT TRANSPORTATION SERVICES INC.| 
SUMMIT|L-delete|16490699|20140214|PEWAMO|MI|DENVER|CO|V|48|1|FULL|0|Call|44000|||866-807-4968||DISPATCH|[email protected]|SUMMIT TRANSPORTATION SERVICES INC.| 
megacorpwv|L-Delete|16491928|20140214|WAITE PARK|MN|DOLTON|IL|R||1|FULL|0|CALL|0|0|(859) 538-1660 x2007|877-670-2837|||[email protected]|MEGACORP LOGISTICS 03| 

My log shows the following. As you can see, I manually put a comma into one field of the first record, and it acts as a delimiter:

2014-02-13 12:29:41 ERROR -- Blank Email 
2014-02-13 12:29:41 ERROR -- undefined method `id' for nil:NilClass 
2014-02-13 12:29:41 DEBUG -- Email [email protected] 
2014-02-13 12:29:42 DEBUG -- Email [email protected] 
2014-02-13 12:29:42 DEBUG -- Email [email protected] 
+1

You tried setting the delimiter, so that's a good start. But can you provide more sample data and the expected output?

+0

If your CSV is |-delimited, why can't you do `CSV.read("file", :col_sep => '|')` at the very start?

+0

I have updated the post.

Answer

0

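I think your problem is that you are only parsing the first element of the arrays "h" and "line". Try removing the "[0]" from those two lines. It isn't that the email is blank; everything except Member_ID is blank.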

header = CSV.parse_line(h, { :col_sep => "|" }) 
lines.each do |line| 
fields = CSV.parse_line(line,{:col_sep => "|"}) 

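Ah, OK. Phillip Hallstrom spotted the problem. It is in the CSV.read statement. By default, CSV.read splits fields on commas (","). CSV.read reads each line of the file as one array element and parses each of those lines into an array of fields. So if your file looks like this: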

a|b|c|d|e 
apple|ball, bearing|cantelope|date|elephant 

then CSV.read returns the following array:

[["a|b|c|d|e"], ["apple|ball", " bearing|cantelope|date|elephant"]] 

You can see that CSV.read does its full parse before you ever get a chance to specify a delimiter.
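The behaviour above can be reproduced without a file. This is a minimal sketch that feeds the same two lines to `CSV.parse`, which uses the same default options as `CSV.read`; the string literal simply stands in for the file contents.

```ruby
require 'csv'

# Same two lines as the example file above.
data = "a|b|c|d|e\napple|ball, bearing|cantelope|date|elephant\n"

# No :col_sep given, so the default "," is used and the pipes are
# treated as ordinary characters inside each field.
rows = CSV.parse(data)

p rows
```

The second row comes back as two fields, split at the comma, which is why the later `line[0]` only ever sees the text before the first comma.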

Either read the lines with normal file I/O, or change the code to specify the delimiter in the CSV.read statement.
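Both options might look roughly like the sketch below. It is not the asker's rake task: the temporary file, the cut-down column list, and the address `dispatch@example.com` are made up for illustration, and the header/field zip mirrors what the question's code already does.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample file: a cut-down version of the question's data
# with an invented e-mail address.
file = Tempfile.new(['PO_loads', '.txt'])
file.write("Member_ID|Customer_Contact|EmailAddress|Company|\n")
file.write("SUMMIT|DISPATCH, Dispatch|dispatch@example.com|SUMMIT TRANSPORTATION SERVICES INC.|\n")
file.close

# Option 1: tell CSV.read about the delimiter up front.
header, *lines = CSV.read(file.path, col_sep: "|")
posts = lines.map { |fields| Hash[header.zip(fields)] }

# Option 2: read the raw lines with plain file I/O, skip the header
# line, and parse each remaining line with the "|" delimiter.
raw_posts = File.readlines(file.path).drop(1).map do |line|
  Hash[header.zip(CSV.parse_line(line, col_sep: "|"))]
end

# The comma now stays inside the Customer_Contact field.
puts posts.first["Customer_Contact"]
puts posts.first["EmailAddress"]
```

Because every line in the sample data ends with a trailing "|", each row carries one extra empty field at the end, which shows up in the hash under a `nil` key. That is harmless here, but worth knowing about if the keys are iterated later.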

+0

I added my full code to the post so you can see it. I was parsing everything that way just to try to debug it.

+0

It works perfectly without the commas, but with commas it errors.