Read a fixed number of pipe-delimited fields per row?

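I have a bunch of pipe-delimited files that were generated without properly escaping carriage returns, so I can't use CR or newline to separate rows. I do know, however, that every record must have exactly 7 fields.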

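With the CSV library in Ruby 1.9, splitting the fields is easy by setting the 'col_sep' parameter, but I can't use the 'row_sep' parameter because some of the fields contain newlines.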

Is there a way to parse a pipe-delimited file using a fixed number of fields as the row separator?

Thanks!


2010-11-03 SimonMD

Can you give an example of a delimited string? – Trez 2010-11-03 02:16:54

Here's one way to do it:

Build a sample string of seven-word lines, each with a newline embedded in the middle. There are three lines' worth:

text = (["now is the\ntime for all good"] * 3).join(' ').gsub(' ', '|') 
puts text 
# >> now|is|the 
# >> time|for|all|good|now|is|the 
# >> time|for|all|good|now|is|the 
# >> time|for|all|good

The processing looks like this:

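lines = []
# Treat the newlines as pipes, then split everything into one flat list of words
chunks = text.gsub("\n", '|').split('|')
# Peel off seven words at a time; slice! removes them from chunks
while chunks.any?
  lines << chunks.slice!(0, 7).join(' ')
end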

puts lines 
# >> now is the time for all good 
# >> now is the time for all good 
# >> now is the time for all good

So this shows that we can rebuild the lines.

Pretending the words are actually fields from a pipe-delimited file, we can make the code do real work by taking out the .join(' '):

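require 'awesome_print' # provides ap

# Start over: the previous loop emptied chunks and already filled lines
lines = []
chunks = text.gsub("\n", '|').split('|')
while chunks.any?
  lines << chunks.slice!(0, 7)
end

ap lines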
# >> [ 
# >>  [0] [ 
# >>   [0] "now", 
# >>   [1] "is", 
# >>   [2] "the", 
# >>   [3] "time", 
# >>   [4] "for", 
# >>   [5] "all", 
# >>   [6] "good" 
# >>  ], 
# >>  [1] [ 
# >>   [0] "now", 
# >>   [1] "is", 
# >>   [2] "the", 
# >>   [3] "time", 
# >>   [4] "for", 
# >>   [5] "all", 
# >>   [6] "good" 
# >>  ], 
# >>  [2] [ 
# >>   [0] "now", 
# >>   [1] "is", 
# >>   [2] "the", 
# >>   [3] "time", 
# >>   [4] "for", 
# >>   [5] "all", 
# >>   [6] "good" 
# >>  ] 
# >> ]
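The sample string above puts a pipe between consecutive records (good|now), so every separator happens to be a pipe. In a real file each record presumably ends with a line break, so after splitting on pipes the last field of one record and the first field of the next come out glued together in a single token. A minimal sketch of handling that, assuming every record ends with "\n" and that the first field of a record (for example an ID) never contains a newline; parse_records is a hypothetical helper and the data is made up:

```ruby
# Hypothetical helper: group pipe-delimited text into records of
# field_count fields when fields may contain raw newlines.
# Assumes every record ends with "\n" and that the FIRST field of a record
# never contains a newline, so the record break is the LAST "\n" in the
# token that straddles two records.
def parse_records(text, field_count = 7)
  tokens = text.chomp.split('|', -1) # -1 keeps empty trailing fields
  records = []
  current = []

  tokens.each_with_index do |token, i|
    if current.size < field_count - 1
      current << token                     # ordinary field inside a record
    elsif i == tokens.size - 1
      records << (current << token)        # last field of the final record
      current = []
    else
      # token looks like "<last field>\n<first field of the next record>"
      cut = token.rindex("\n")
      raise ArgumentError, "no record break in #{token.inspect}" unless cut
      records << (current << token[0...cut])
      current = [token[(cut + 1)..-1]]
    end
  end

  raise ArgumentError, "incomplete final record: #{current.inspect}" unless current.empty?
  records
end

# Made-up sample: field 7 of the first record and field 2 of the second
# record both contain a newline.
text = "1|Acme|Springfield|IL|PC|USA|Public\ncharity\n" \
       "2|Beta\nFoundation|Shelbyville|IL|PF||Private\n"
parse_records(text).each { |record| p record }
# ["1", "Acme", "Springfield", "IL", "PC", "USA", "Public\ncharity"]
# ["2", "Beta\nFoundation", "Shelbyville", "IL", "PF", "", "Private"]
```

Cutting at the last newline rather than the first is what lets the final field keep its own line breaks; if your data instead guarantees that the last field never contains a newline, cutting at token.index("\n") would be the matching choice.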


2010-11-03 05:30:03

Here's an idea, using a regular expression:

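#!/opt/local/bin/ruby

# Read the whole file: IO#gets returns only the first line, which cuts
# short any record whose fields contain a newline
text = File.read("pipe_delim.txt")
# Seven pipe-terminated fields; /m lets . match newlines inside a field.
# This assumes every field, including the last, is followed by a pipe
r1 = /.*?\|.*?\|.*?\|.*?\|.*?\|.*?\|.*?\|/m
results = text.scan(r1)
results.each do |result|
  puts result
end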

This regex seems to trip over newlines within a field, but I'm sure you can tweak it to work properly.
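The regex above needs a pipe after every field, including the seventh. If records in the file end with a line break instead of a trailing pipe, a variant can match six pipe-terminated fields followed by a seventh field that stops at the end of the line. A sketch, assuming the seventh field never contains a newline; RECORD and scan_records are hypothetical names:

```ruby
# Hypothetical variant of the regex idea: six pipe-terminated fields, then a
# seventh field that stops at the end of the line. [^|]* also matches
# newlines, so fields 1-6 may contain embedded line breaks.
# Assumption: the 7th field never contains a newline.
RECORD = /(?:[^|]*\|){6}[^|\r\n]*/

def scan_records(text)
  text.scan(RECORD).map do |record|
    # The line break that ended the previous record is picked up at the
    # start of this match, so strip it before splitting into fields.
    record.sub(/\A\r?\n/, '').split('|', -1)
  end
end

sample = "a|b|c|d|e|f|g\nh|i\nj|k|l|m|n|o\n"
scan_records(sample).each { |fields| p fields }
# ["a", "b", "c", "d", "e", "f", "g"]
# ["h", "i\nj", "k", "l", "m", "n", "o"]
```

Because [^|] and \| never overlap, the repeated group cannot backtrack into another field, so each match lines up on exactly six pipes.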


2010-11-03 03:59:47

Just an idea, but the cucumber testing gem has a Cucumber::Ast::Table class that you could use to process this file.

Cucumber::Ast::Table.new(File.read(file))

Then I think you could use its rows method to read the data out.


2010-11-03 05:45:02

Try using String#split and Enumerable#each_slice:

result = [] 
text.split('|').each_slice(7) { |record| result << record }
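One detail worth knowing with this approach: String#split drops trailing empty strings unless you pass a negative limit, so a final record whose last field(s) are empty comes out short. Passing -1 keeps them. A small sketch on a made-up string of two records joined by a pipe, as in the sample string built earlier in the thread:

```ruby
# Made-up sample: two 7-field records, the second ending in two empty
# fields, joined with a pipe so that every separator is "|".
text = "1|a|b|c|d|e|f|2|g|h|i|j||"

short = text.split('|').each_slice(7).to_a      # trailing "" fields dropped
full  = text.split('|', -1).each_slice(7).to_a  # -1 keeps them

p short.last # => ["2", "g", "h", "i", "j"]
p full.last  # => ["2", "g", "h", "i", "j", "", ""]
```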


2011-03-20 05:18:25 Tom

Say, for example, you want to parse the IRS text file of all charities, which is pipe-delimited.

Assume you have a model named Charity whose fields all match the ones in your pipe-delimited file.

class Charity < ActiveRecord::Base 
    # http://apps.irs.gov/app/eos/forwardToPub78DownloadLayout.do 
    # http://apps.irs.gov/app/eos/forwardToPub78Download.do 
    attr_accessible :city, :country, :deductibility_status, :deductibility_status_description, :ein, :legal_name, :state 
end

Then add a rake file, which you could call import.rake:

namespace :import do

  desc "Import pipe-delimited IRS 501(c)(3) data"
  task :irs_data => :environment do
    txt_file_path = 'db/irs_5013cs.txt'

    # Order Field Notes
    # 1 EIN Required
    # 2 Legal Name Optional
    # 3 City Optional
    # 4 State Optional
    # 5 Deductibility Status Optional
    # 6 Country Optional - If Country is null, then Country is assumed to be United States
    # 7 Deductibility Status Description Optional

    # Reads one record per line, so this assumes no field contains a newline
    File.foreach(txt_file_path) do |line|
      # Drop the line ending, split on pipes and keep the first 7 fields
      row = line.chomp.split('|').first(7)
      Charity.create!(
        :ein                              => row[0],
        :legal_name                       => row[1],
        :city                             => row[2],
        :state                            => row[3],
        :deductibility_status             => row[4],
        :country                          => row[5],
        :deductibility_status_description => row[6]
      )
    end
  end
end
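The positional row[0] through row[6] mapping can also be written by zipping a list of column names with each 7-field array, which keeps the column order in one place next to the field notes. A sketch without Rails, assuming the column order from the notes in the rake task above and applying the note that a blank country means the United States (storing the literal string 'United States' is an assumption); FIELD_NAMES and to_attributes are hypothetical names. The resulting hash could then be passed to Charity.create!:

```ruby
# Column order taken from the field notes in the rake task above.
FIELD_NAMES = %i[
  ein legal_name city state
  deductibility_status country deductibility_status_description
].freeze

# Hypothetical helper: turn one 7-field array into an attributes hash.
# Per the notes, an empty Country is assumed to be the United States.
def to_attributes(fields)
  unless fields.size == FIELD_NAMES.size
    raise ArgumentError, "expected #{FIELD_NAMES.size} fields, got #{fields.size}"
  end

  attrs = FIELD_NAMES.zip(fields).to_h
  attrs[:country] = 'United States' if attrs[:country].to_s.strip.empty?
  attrs
end

# Made-up record with an empty Country field
p to_attributes(['12-3456789', 'Example Org', 'Springfield', 'IL', 'PC', '', 'Public Charity'])
```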

Finally, you can run this import by typing the following rake command in your Rails app:

rake import:irs_data


2013-01-28 01:11:07 jmontross
