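rubyXL (Errno::ENOENT)
I'm having a problem with a crawler I built with rubyXL. It traverses my file system correctly, but I get an (Errno::ENOENT) error. I've checked all the rubyXL code and everything appears to be in place. My code is below. Any suggestions?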
/Users/.../testdata.xlsx
/Users/.../moretestdata.xlsx
/Users/.../Lab 1 Data.xlsx
/Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/rubyXL-1.2.10/lib/rubyXL/parser.rb:404:in `initialize': No such file or directory - /Users/Dylan/.../sheet6.xml (Errno::ENOENT)
from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/rubyXL-1.2.10/lib/rubyXL/parser.rb:404:in `open'
from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/rubyXL-1.2.10/lib/rubyXL/parser.rb:404:in `block in decompress'
from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/rubyXL-1.2.10/lib/rubyXL/parser.rb:402:in `upto'
from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/rubyXL-1.2.10/lib/rubyXL/parser.rb:402:in `decompress'
from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/rubyXL-1.2.10/lib/rubyXL/parser.rb:47:in `parse'
from xlcrawler.rb:9:in `block in xlcrawler'
from /Users/Dylan/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/find.rb:41:in `block in find'
from /Users/Dylan/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/find.rb:40:in `catch'
from /Users/Dylan/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/find.rb:40:in `find'
from xlcrawler.rb:6:in `xlcrawler'
from xlcrawler.rb:22:in `<main>'
require 'find'
require 'rubyXL'

def xlcrawler(path)
  count = 0
  Find.find(path) do |file| # iterate over every file under the specified directory
    if file =~ /\.xlsx\z/ # check whether the file has an .xlsx extension (dot escaped, anchored at end of string)
      puts file # confirm the crawler is traversing the file system
      workbook = RubyXL::Parser.parse(file).worksheets # array of all worksheets in the Excel workbook
      workbook.each do |worksheet| # iterate over each worksheet
        data = worksheet.extract_data.to_s # a worksheet's data, converted to a string so it can be matched against a regex
        if data =~ /regex/
          puts file
          count += 1
        end
      end
    end
  end
  puts "#{count} files were found"
end
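A minimal sketch, assuming the goal is simply to keep crawling when one workbook fails to parse: it rescues `Errno::ENOENT` per file, records the skipped file and message, and moves on. This does not fix whatever makes rubyXL look for the missing `sheet6.xml`; it only stops one bad file from aborting the whole crawl. The optional `parser` argument and `DEFAULT_PARSER` are hypothetical additions so the crawler can be exercised without real workbooks; by default they call `RubyXL::Parser.parse` as the original code does. The sketch also counts each matching file once rather than once per matching worksheet, which is an assumption based on the final "files were found" message. A positional default is used instead of a keyword argument because the trace shows Ruby 1.9.3.

```ruby
require 'find'

# Hypothetical default parser: loads rubyXL lazily and parses the workbook,
# mirroring RubyXL::Parser.parse(file) in the original crawler.
DEFAULT_PARSER = lambda do |file|
  require 'rubyXL'
  RubyXL::Parser.parse(file)
end

# Crawl `path` for .xlsx files and count those whose worksheet data matches
# `pattern`. Files that raise Errno::ENOENT while parsing are skipped and
# returned alongside the count instead of aborting the crawl.
def xlcrawler(path, pattern, parser = DEFAULT_PARSER)
  count = 0
  skipped = []
  Find.find(path) do |file|
    next unless File.file?(file) && file =~ /\.xlsx\z/ # regular files ending in .xlsx only
    begin
      worksheets = parser.call(file).worksheets
    rescue Errno::ENOENT => e
      skipped << [file, e.message] # remember the failure and continue with the next file
      next
    end
    # Count a file once if any of its worksheets matches the pattern.
    if worksheets.any? { |ws| ws.extract_data.to_s =~ pattern }
      puts file
      count += 1
    end
  end
  puts "#{count} files were found"
  [count, skipped]
end
```

Usage, assuming the same search pattern as before: `count, skipped = xlcrawler('/Users/', /regex/)`, then inspect `skipped` to see which workbooks (for example `Lab 1 Data.xlsx`) could not be parsed.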
xlcrawler('/Users/')
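There seems to be a problem with Lab 1 Data.xlsx. I'd hazard a guess that sheet6 is hidden or protected, or has been renamed somehow, so it isn't being processed by `.worksheets`. – Pynner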
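@Pynner Good point. That file is actually in my Mail directory; it's just a bunch of numbers and graphs. However, I checked the worksheets and nothing is hidden or protected. – Anconia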
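One hypothetical way to test the "renamed" guess: the trace shows rubyXL's `decompress` looping with `upto` and failing when it opens `sheet6.xml`, so it is worth comparing the sheet parts actually present inside the .xlsx archive (an .xlsx file is a zip package) with consecutively numbered names. In the Office Open XML layout, chart sheets are stored under `xl/chartsheets/` rather than `xl/worksheets/`, so a workbook containing graphs could plausibly have fewer `xl/worksheets/sheetN.xml` parts than it has sheets; whether that is what happens here is not confirmed in the thread. A minimal sketch, assuming you already have the archive's part names as an array (from any zip listing tool); `missing_sheet_parts` is a hypothetical helper, not part of rubyXL.

```ruby
# Hypothetical helper: given the part names inside an .xlsx archive and the
# number of sheets the workbook is expected to have, return the numbered
# xl/worksheets/sheetN.xml parts that are absent from the archive.
def missing_sheet_parts(entries, sheet_count)
  expected = (1..sheet_count).map { |n| "xl/worksheets/sheet#{n}.xml" }
  expected - entries
end

# Example: five worksheets plus one chart sheet, but six sheets expected.
entries = (1..5).map { |n| "xl/worksheets/sheet#{n}.xml" } + ["xl/chartsheets/sheet1.xml"]
p missing_sheet_parts(entries, 6) # prints ["xl/worksheets/sheet6.xml"]
```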