2011-04-03 99 views
11

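Ruby - parsing a text file

I'm very new to Ruby and have been trying out some really basic text parsing. Now, however, I'm trying to parse a more complicated file and push the result into a CSV file (which I haven't done before), and I'm stuck.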

The file looks like this:

Title 
some text 
some different text 
Publisher: name 
Published Date: date 
Number1: number 
Number2: number 
Number3: number 
Category: category 
---------------------- 
Title 
some text 
some different text 
Publisher: name 
Published Date: date 
Number1: number 
Number2: number 
Number3: number 
Category: category 
---------------------- 

Each line would represent a new "column" in the CSV.

Could anyone please help?

Thanks very much!

+0

All I've done so far is very simple things like this... I don't even really know where to begin with more complex stuff :( file = File.new("readfile.rb", "r"); while (line = file.gets); puts line; end; file.close – kay85 2011-04-03 08:02:24

Answers

20

Here's a general idea for you to start with:

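File.foreach(thefile) do |line|   # thefile: path to the input file
    if line =~ /--+/
        # separator line: print it with its newline, which ends the current row
        print line
    else
        # any other line: print it without its newline so the record stays on one row
        print line.chomp
    end
end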
3

Here's a complete solution. Note that it is very sensitive to the file structure!

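out_file = File.open('your_csv_file.csv', 'w')
out_file.puts "Title,Publisher,Published Date,Number1,Number2,Number3,Category"
the_line = []      # fields collected for the current record
in_title = false   # true while reading the free-text lines under "Title"
IO.foreach('your_file_name') do |line|
    if line =~ /^-+$/
        # separator: write out the finished record and start a new one
        out_file.puts the_line.join(',')
        the_line = []
    elsif line =~ /^Title$/
        in_title = true
    elsif line =~ /^(?:Publishe(?:r|d Date)|Number\d|Category):\s+(.*?)$/
        # "Key: value" line: keep only the value
        the_line << $1
        in_title = false
    elsif in_title
        # title text: the first line is stored as-is, a second line is joined on and the result quoted
        the_line[0] = (the_line.empty? ? line.chomp : "\"#{the_line[0]} #{line.chomp}\"")
    else
        puts "Error: don't know what to do with line #{line}"
    end
end
out_file.close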
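
The solution above builds each row by hand, so a value that itself contains a comma would break the output. A possible variation, offered only as a sketch, is to let Ruby's standard csv library handle the quoting. It assumes the same layout as the sample: a literal "Title" marker line, free-text lines that are folded into the title column, "Key: value" lines, and a line of dashes between records. The names parse_records, write_csv and HEADERS, and the file names in the usage comment, are made up for illustration.

```ruby
require "csv"

HEADERS = ["Title", "Publisher", "Published Date",
           "Number1", "Number2", "Number3", "Category"]

# Turn the text of the input file into an array of rows (arrays of strings).
def parse_records(text)
  rows = []
  title_parts = []   # free-text lines found under the "Title" marker
  fields = []        # values taken from the "Key: value" lines
  text.each_line do |raw|
    line = raw.strip # tolerate trailing spaces and the newline
    next if line.empty?
    if line.match?(/\A-+\z/)
      # Separator: the current record is complete.
      rows << [title_parts.join(" "), *fields]
      title_parts = []
      fields = []
    elsif (m = line.match(/\A(?:Publisher|Published Date|Number\d|Category):\s*(.*)\z/))
      fields << m[1]
    elsif line != "Title"
      # Assumption: anything else belongs to the title column.
      title_parts << line
    end
  end
  # Keep a final record that is not followed by a separator.
  rows << [title_parts.join(" "), *fields] unless title_parts.empty? && fields.empty?
  rows
end

# Write the header and the rows; CSV quotes fields where needed.
def write_csv(rows, path)
  CSV.open(path, "w") do |csv|
    csv << HEADERS
    rows.each { |row| csv << row }
  end
end

# Usage (hypothetical file names):
#   write_csv(parse_records(File.read("your_file_name")), "your_csv_file.csv")
```

Unlike the hand-rolled version, this one silently treats any unrecognised line as title text instead of reporting an error, so it is more forgiving but also less strict about the file structure.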