In Ruby, how do I handle non-UTF-8 characters in PDF content?
I'm using Rails 4.2.7. I download PDF content from the web and write it to a file, like this:
res1 = Net::HTTP.SOCKSProxy('127.0.0.1', 50001).start(uri.host, uri.port) do |http|
  puts "launching #{uri}"
  resp = http.get(uri)
  status = resp.code
  content = resp.body
  content_type = resp['content-type']
  content_encoding = resp['content-encoding']
end
…
if content_type == 'application/pdf' || content_type.include?('application/x-javascript')
  File.open(file_location, "w") { |file| file.write content }
I've noticed that for some content I get the following error:
Error during processing: "\xC2" from ASCII-8BIT to UTF-8
/Users/davea/Documents/workspace/myproject/app/services/onlinerr_service.rb:8:in `write'
/Users/davea/Documents/workspace/myproject/app/services/onlinerr_service.rb:8:in `block in pre_process_data'
/Users/davea/Documents/workspace/myproject/app/services/onlinerr_service.rb:8:in `open'
/Users/davea/Documents/workspace/myproject/app/services/onlinerr_service.rb:8:in `pre_process_data'
/Users/davea/Documents/workspace/myproject/app/services/abstract_import_service.rb:76:in `process_race_data'
/Users/davea/Documents/workspace/myproject/app/services/onlinerr_race_finder_service.rb:75:in `process_race_link'
/Users/davea/Documents/workspace/myproject/app/services/abstract_race_finder_service.rb:29:in `block in process_data'
/Users/davea/Documents/workspace/myproject/app/services/abstract_race_finder_service.rb:28:in `each'
/Users/davea/Documents/workspace/myproject/app/services/abstract_race_finder_service.rb:28:in `process_data'
/Users/davea/Documents/workspace/myproject/app/services/run_crawlers_service.rb:18:in `block in run_all_crawlers'
/Users/davea/.rvm/gems/ruby-2.3.0/gems/activerecord-4.2.7.1/lib/active_record/relation/delegation.rb:46:in `each'
I tried to work around it by replacing the invalid characters, like this:
File.open(file_location, "w") { |file| file.write content }
content.encode('UTF-8', :invalid => :replace, :undef => :replace)
But then I get this error when trying to read the PDF file:
error: PDF malformed, expected 'endstream' but found 0 instead
Does anyone know a better way to handle downloaded PDF files that doesn't corrupt them?
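
For reference, here is a minimal sketch of the direction I suspect is relevant, not a confirmed fix. The error message says the body is an ASCII-8BIT (binary) string, so the idea is to write the bytes unchanged in binary mode instead of transcoding them. Replacing bytes with `encode` alters the PDF streams, which would be consistent with the 'endstream' error. The helper name `save_binary` and the sample bytes are hypothetical and only stand in for `resp.body`.

```ruby
require 'tempfile'

# Hypothetical helper (not part of the code above): writes a downloaded
# body to disk byte-for-byte, with no transcoding.
def save_binary(path, body)
  # "wb" opens the file in binary mode, so the external encoding is
  # ASCII-8BIT and Ruby does not attempt a conversion to UTF-8.
  File.open(path, 'wb') { |file| file.write(body) }
end

# Stand-in for resp.body: a binary string containing bytes that are
# not valid UTF-8, as a real PDF stream would.
sample_body = "%PDF-1.4\n\xC2\xFF\x00stream".b

tmp = Tempfile.new(['sample', '.pdf'])
tmp.close
save_binary(tmp.path, sample_body)

# Read the file back in binary mode and compare with what was written.
round_trip = File.binread(tmp.path)
puts round_trip == sample_body
```

In the code from the question, this would amount to changing the mode in `File.open(file_location, "w")` to `"wb"` and dropping the `encode` call.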