2013-03-20

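Importing spatial CSV data into a Postgres/PostGIS database with a rake task

Hi, I'm trying to import CSV data into a spatially enabled Postgres database. The data is available here. I'm not sure where I'm going wrong, and any help would be much appreciated! What I'm trying to do is visualize this data with D3.js, perhaps as a heat-density map showing which towns have the most libraries, or something along those lines.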

File: lib/tasks/import_incidents_csv.rake 

require 'csv'

namespace :import_incidents_csv do
  task :create_incidents => :environment do
    csv_text = File.read('/home/mgmacri/data/PublicLibraryBranchLocations.csv')
    csv = CSV.parse(csv_text, :headers => true)

    csv.each do |row|
      row = row.to_hash.with_indifferent_access
      Moulding.create!(row.to_hash.symbolize_keys)
    end
  end
end


$ rake import_incidents_csv:create_incidents --trace
** Invoke import_incidents_csv:create_incidents (first_time) 
** Invoke environment (first_time) 
** Execute environment 
** Execute import_incidents_csv:create_incidents 
rake aborted! 
invalid byte sequence in UTF-8 
/usr/lib/ruby/1.9.1/csv.rb:1855:in `sub!' 
/usr/lib/ruby/1.9.1/csv.rb:1855:in `block in shift' 
/usr/lib/ruby/1.9.1/csv.rb:1849:in `loop' 
/usr/lib/ruby/1.9.1/csv.rb:1849:in `shift' 
/usr/lib/ruby/1.9.1/csv.rb:1791:in `each' 
/usr/lib/ruby/1.9.1/csv.rb:1805:in `to_a' 
/usr/lib/ruby/1.9.1/csv.rb:1805:in `read' 
/usr/lib/ruby/1.9.1/csv.rb:1379:in `parse' 
/home/mgmacri/rails/mymap/lib/tasks/import_incidents_csv.rake:8:in `block (2 levels) in          
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:228:in `call' 
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:228:in `block in execute' 
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:223:in `each' 
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:223:in `execute' 
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:166:in `block in invoke_with_call_chain'
/usr/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize' 
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:159:in `invoke_with_call_chain' 
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:152:in `invoke' 
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:143:in `invoke_task' 
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:101:in `block (2 levels) in top_level'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:101:in `each' 
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:101:in `block in top_level' 
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:110:in `run_with_threads' 
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:95:in `top_level' 
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:73:in `block in run' 
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:160:in `standard_exception_handling' 
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:70:in `run' 
/var/lib/gems/1.9.1/gems/rake-10.0.3/bin/rake:33:in `<top (required)>' 
/usr/local/bin/rake:19:in `load' 
/usr/local/bin/rake:19:in `<main>' 
Tasks: TOP => import_incidents_csv:create_incidents 
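The error means that some bytes in the file are not valid UTF-8, the encoding the error message says Ruby was using for these strings. A minimal sketch for confirming that before changing any import code, assuming the file can be read locally; the helper `invalid_utf8_lines` and the sample rows are hypothetical, made up for illustration.

```ruby
require "tempfile"

# Hypothetical diagnostic helper: reads the file as raw bytes and returns
# [line_number, line] for every line that is not valid UTF-8.
def invalid_utf8_lines(path)
  File.open(path, "rb") do |io|
    io.each_line.with_index(1)
      .reject { |line, _n| line.force_encoding(Encoding::UTF_8).valid_encoding? }
      .map { |line, n| [n, line] }
  end
end

# Hypothetical sample file: line 2 stores "è" and "é" as the single
# ISO-8859-1 bytes 0xE8 and 0xE9, which are invalid on their own in UTF-8.
bad_lines = Tempfile.create(["libraries", ".csv"]) do |f|
  f.binmode
  f.write("name,town\nBiblioth\xE8que,Montr\xE9al\nCentral Library,Springfield\n")
  f.flush
  invalid_utf8_lines(f.path)
end

bad_lines.each { |n, line| puts "line #{n}: #{line.inspect}" }
```

Pointing the helper at the real CSV path lists the offending lines, which helps tell which legacy encoding the file was saved in.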

Answers


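Using PostgreSQL's native CSV import (COPY) is several orders of magnitude faster than going through Ruby's CSV API, and it can also avoid this kind of encoding problem.

For example: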

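namespace :import_incidents_csv do
  task :create_incidents => :environment do
    # COPY runs on the database server: the path must be readable by the server
    # process, and COPY FROM a file requires superuser rights.
    # HEADER skips the CSV's first line (the column names).
    ActiveRecord::Base.connection.execute "COPY #{Moulding.table_name} (name, state, postcode, lat, long) " \
      "FROM '/home/mgmacri/data/PublicLibraryBranchLocations.csv' DELIMITER ',' CSV HEADER;"
  end
end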

More information: http://www.postgresql.org/docs/9.2/static/sql-copy.html
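One caveat: COPY reads the file using the session's client encoding unless told otherwise, so a file that is not UTF-8 may fail here as well. PostgreSQL 9.1 and later accept an `ENCODING` option in the parenthesised option list. A minimal sketch of building such a statement from Ruby; the helper `copy_csv_sql`, the table name `mouldings` (Rails' default for a `Moulding` model), and the `WIN1252` encoding are assumptions, not something the question confirms.

```ruby
# Hypothetical helper: builds a COPY ... FROM statement for a CSV file using
# the parenthesised option list, so the file's encoding is declared
# explicitly instead of defaulting to the client encoding.
def copy_csv_sql(table, columns, path, encoding = "UTF8")
  # Escape single quotes for use inside a SQL string literal.
  quote = ->(s) { "'" + s.to_s.gsub("'", "''") + "'" }
  options = ["FORMAT csv", "HEADER true", "ENCODING #{quote.(encoding)}"]
  "COPY #{table} (#{columns.join(', ')}) FROM #{quote.(path)} " \
    "WITH (#{options.join(', ')});"
end

sql = copy_csv_sql("mouldings", %w[name state postcode lat long],
                   "/home/mgmacri/data/PublicLibraryBranchLocations.csv",
                   "WIN1252")
puts sql
```

Inside the rake task, the resulting string would be passed to `ActiveRecord::Base.connection.execute` as in the answer above. Table and column names are interpolated unquoted, so only pass trusted identifiers.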


Excel saved the file as ISO-8859-1, not UTF-8. So tell Ruby to open the file for reading as ISO-8859-1:

file = File.open("input_file", "r:ISO-8859-1")
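Opening the file as ISO-8859-1 alone leaves the strings tagged as ISO-8859-1; adding an internal encoding (`r:ISO-8859-1:UTF-8`) makes Ruby transcode the text to UTF-8 as it is read, which is usually what the database connection expects. A minimal sketch, assuming the file really is ISO-8859-1 (if it came from Excel on Windows, `Windows-1252` may be the better guess); the helper name `read_latin1_csv` and the sample rows are hypothetical.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: reads a CSV file stored as ISO-8859-1 and returns its
# rows as hashes of UTF-8 strings.
def read_latin1_csv(path)
  # "r:ISO-8859-1:UTF-8" = read mode, external encoding ISO-8859-1,
  # internal encoding UTF-8, so the text is transcoded while it is read.
  text = File.open(path, "r:ISO-8859-1:UTF-8") { |io| io.read }
  CSV.parse(text, headers: true).map(&:to_hash)
end

# Hypothetical sample file: "è" and "é" stored as single ISO-8859-1 bytes.
rows = Tempfile.create(["libraries", ".csv"]) do |f|
  f.binmode
  f.write("name,town\nBiblioth\xE8que,Montr\xE9al\n")
  f.flush
  read_latin1_csv(f.path)
end

rows.each { |row| p row }
```

In the question's rake task, the same idea would mean replacing the plain `File.read` with a read that declares both encodings before handing the text to `CSV.parse`.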

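Excel uses an "ANSI" code page by default, not an ISO character set, and the two aren't quite the same thing. For example, ISO-8859-1 is similar to, but not the same as, cp1252 (aka Windows-1252). It's better to use the correct code page than to guess at a close-enough encoding, or, better still, use OpenOffice to save Excel sheets as UTF-8 and keep your sanity. Note that Excel will use different code pages on different systems; Central European users, for example, might send you cp1250 text. See http://stackoverflow.com/questions/508558/what-charset-does-microsoft-excel-use-when-saving-files – 2013-03-21 01:04:54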


See also http://en.wikipedia.org/wiki/Windows-1252, which explains, among other things, that Windows-1252 isn't even an ANSI standard, even though Windows calls it the "ANSI" code page. – 2013-03-21 01:10:23


@Craig: Thanks for the explanation. – 2013-03-21 04:05:04
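The code-page differences discussed in these comments are easy to see byte by byte. A minimal sketch that decodes the same raw byte under ISO-8859-1, Windows-1252 and Windows-1250 (the Central European Windows code page); the helper `decode_byte` is hypothetical, and 0x80 and 0xE8 are just example bytes: 0x80 is the euro sign in Windows-1252 but an invisible control character in ISO-8859-1, and 0xE8 is "è" in Windows-1252 but "č" in Windows-1250.

```ruby
# Hypothetical helper: interprets a single raw byte in the given encoding and
# returns the equivalent UTF-8 string.
def decode_byte(byte, encoding)
  [byte].pack("C").force_encoding(encoding).encode(Encoding::UTF_8)
end

ENCODINGS = %w[ISO-8859-1 Windows-1252 Windows-1250]

# Print how each example byte decodes under each encoding.
[0x80, 0xE8].each do |byte|
  ENCODINGS.each do |enc|
    puts format("0x%02X as %-12s => %p", byte, enc, decode_byte(byte, enc))
  end
end
```

A wrong but "close enough" guess does not raise an error; it silently produces the wrong characters, which is why picking the exact code page matters.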