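I'm running DelayedJob and Mechanize on my Rails 3.2.13 app, although these two may not be part of the problem. My app worked fine until a week ago. Now, when I run the app in production with my delayed method `scrape`, it reports: Person#scrape failed with Encoding::UndefinedConversionError: U+03B1 from UTF-8 to ISO-8859-1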

Jul 03 13:57:19 myapp app/worker.1:  (30.7ms) SELECT COUNT(*) AS count_all, priority AS priority FROM "delayed_jobs" WHERE (run_at < '2013-07-03 20:57:19.344207' and failed_at is NULL) GROUP BY priority

Jul 03 13:58:01 myapp app/worker.1: [Worker(host:1d2342a-b234f-4342-bcd7-0afsji3e60dab pid:2)] Person#scrape failed with Encoding::UndefinedConversionError: U+03B1 from UTF-8 to ISO-8859-1 - 0 failed attempts

Jul 03 13:58:01 myapp app/worker.1: 2013-07-03T20:58:01+0000: [Worker(host:1d30481a-cf4f-4344-bad7-0e5e0ae60cab pid:2)] Person#scrape failed with Encoding::UndefinedConversionError: U+03B1 from UTF-8 to ISO-8859-1 - 0 failed attempts

Jul 03 13:58:01 myapp app/worker.1:  (3.1ms) BEGIN
Jul 03 13:58:01 myapp app/worker.1:  (18.3ms) UPDATE "delayed_jobs" SET "last_error" = 'U+03B1 from UTF-8 to ISO-8859-1 
Jul 03 13:58:01 myapp app/worker.1: /app/vendor/bundle/ruby/1.9.1/gems/mechanize-2.7.1/lib/mechanize/util.rb:57:in `encode'' 
Jul 03 13:58:01 myapp app/worker.1: /app/vendor/bundle/ruby/1.9.1/gems/mechanize-2.7.1/lib/mechanize/util.rb:57:in `encode_to'' 
Jul 03 13:58:01 myapp app/worker.1: /app/vendor/bundle/ruby/1.9.1/gems/mechanize-2.7.1/lib/mechanize/util.rb:43:in `from_native_charset'' 
Jul 03 13:58:01 myapp app/worker.1: /app/vendor/bundle/ruby/1.9.1/gems/mechanize-2.7.1/lib/mechanize/form.rb:243:in `from_native_charset'' 

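The code U+03B1 corresponds to the Greek letter alpha (α). However, I never write an alpha anywhere in my code, because I have no use for it. I thought this might be related to my Twitter Bootstrap installation, but I uninstalled it and the problem did not go away.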
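For reference, the same exception class can be reproduced in plain Ruby without Rails, DelayedJob, or Mechanize. This is only a minimal sketch: `try_latin1` is a hypothetical helper name, and the strings are made-up inputs, not data from the app. It shows that ISO-8859-1 has no code point for α, so any attempt to encode that character into it fails, while a character such as é converts fine.

```ruby
# Hypothetical helper: try to convert text to ISO-8859-1 and report failures
# instead of raising.
def try_latin1(text)
  text.encode('ISO-8859-1')
rescue Encoding::UndefinedConversionError => e
  # e.message names the offending code point and both encodings
  "failed: #{e.message}"
end

alpha   = "\u03B1" # Greek small letter alpha, not representable in ISO-8859-1
e_acute = "\u00E9" # e with acute accent, representable in ISO-8859-1

puts try_latin1(alpha)
puts try_latin1(e_acute).bytes.inspect
```

If the character is not in the app's own code, it most likely arrives at runtime, for example from stored data or from a fetched page.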

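Here is the object it is processing. I noticed that in every entry I have tried, there is always an exclamation mark before the mylist attribute. I don't know why.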

19:51:06 web.1 | SQL (0.8ms) INSERT INTO "delayed_jobs" ("attempts", "created_at", "failed_at", "handler", "last_error", "locked_at", "locked_by", "priority", "queue", "run_at", "updated_at") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [["attempts", 0], ["created_at", Wed, 03 Jul 2013 23:51:06 UTC +00:00], ["failed_at", nil], ["handler", "--- !ruby/object:Delayed::PerformableMethod\nobject: !ruby/ActiveRecord:Doilist\n attributes:\n id: 188\n mylist: ! \"VWQ.U9JF.45.4595.pdf\\r\\nVWQ.U9JF.45.4595.xml\\r\\n \\r\\nVWQ.U9JF.46.1558.pdf\\r\\nVWQ.U9JF.46.1558.xml\\r\\n\n \\r\\nVWQ.U9JF.421234.pdf\\r\\nVWQ.U9JF.461764.xml\\r\\n \\r\\nVWQ.U9JF.434147.pdf\"\n created_at: 2013-07-03 23:51:06.694626000 Z\n updated_at: 2013-07-03 23:51:06.694626000 Z\n myuserid: [email protected]\n mypass: mypassword\n mymonth: '7'\n mydate: '1'\n myyear: '1'\nmethod_name: :scrape\nargs: []\n"], ["last_error", nil], ["locked_at", nil], ["locked_by", nil], ["priority", 0], ["queue", nil], ["run_at", Wed, 03 Jul 2013 23:51:06 UTC +00:00], ["updated_at", Wed, 03 Jul 2013 23:51:06 UTC +00:00]]

I'm happy to post any files on request; I'm guessing this is a complicated problem. Thank you very much for your input.


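Please provide some of the code for the `Person#scrape` method. Which page does it scrape? The `U+03B1` character may be coming from the page. Does the same error occur if you try it outside of Delayed Job? – Domon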


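The `U+03B1` character may be coming from the web page: that's a good idea, and I hadn't even considered it. I'll look into this further and get back to you soon. – CodeBiker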

Answer


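I fixed this thanks to Domon's suggestion. The web page that Mechanize interacts with is encoded as ISO-8859-1 (see more on how to detect this here), while my system was trying to read the page as UTF-8. To fix it, I added `agent.page.encoding = 'utf-8'` to the script of my scrape method, as shown in Niels Kristian's answer here. (See also denis.peplin's answer for further explanation of where to write it.) This forces the page into the encoding (UTF-8) that my system needs in order to read it.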
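A minimal sketch of where that line might sit, assuming the agent has already fetched a page. `force_page_encoding` is a hypothetical helper name, not part of Mechanize, and the URL in the usage comment is a placeholder; the only call taken from the fix above is `agent.page.encoding = 'utf-8'`.

```ruby
# Hypothetical helper (not part of Mechanize): override the encoding reported
# by the agent's current page before any forms on it are filled in.
def force_page_encoding(agent, encoding = 'utf-8')
  page = agent.page            # the page the agent fetched most recently
  return nil if page.nil?      # nothing has been fetched yet

  # Only HTML pages expose an encoding setter; other file types are skipped.
  page.encoding = encoding if page.respond_to?(:encoding=)
  page
end

# Usage inside a scrape method (needs the mechanize gem and network access):
#   require 'mechanize'
#   agent = Mechanize.new
#   agent.get('http://example.com/login')  # placeholder URL
#   force_page_encoding(agent)             # set before touching agent.page.forms
```

Setting the encoding right after the `get` call and before filling in any form fields matters, because the failing frames in the backtrace are in Mechanize's form handling (`form.rb`, `from_native_charset`).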


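For some reason, that error stopped, but I started getting a new one: `ArgumentError: invalid byte sequence in UTF-8`. On further inspection of the logs, I saw that it was thrown right after a `gsub` call on one of my object's attributes (which has a text data type). However, after a lot of research, I solved it by appending `.encode('UTF-16le', :invalid => :replace, :replace => '').encode('UTF-8')` to the attribute before calling `gsub` on it. – CodeBiker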


See Van Der Hoorn's reply [here](http://stackoverflow.com/questions/8710444/is-there-a-way-in-ruby-1-9-to-remove-invalid-byte-sequences-from-strings) for more information. – CodeBiker
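The round trip described in the comments above can be sketched in plain Ruby. `strip_invalid_bytes` is a hypothetical helper name and the sample string is made up; the `encode` chain itself is the one quoted above. Going through UTF-16LE is what makes it work, since encoding a UTF-8 string directly to UTF-8 may leave the invalid bytes untouched.

```ruby
# Hypothetical helper: drop bytes that are not valid UTF-8 by transcoding
# to UTF-16LE (discarding invalid sequences) and back to UTF-8.
def strip_invalid_bytes(text)
  text.encode('UTF-16le', :invalid => :replace, :replace => '').encode('UTF-8')
end

# Made-up input: \xFF can never appear in valid UTF-8.
dirty = "caf\xFF report"
puts dirty.valid_encoding?   # the raw string is not valid UTF-8

clean = strip_invalid_bytes(dirty)
puts clean.valid_encoding?

# gsub is safe to call once the string has been cleaned.
puts clean.gsub(/\s+/, '_')
```

On Ruby 2.1 and later, `String#scrub` does the same job in one call; the thread itself targets Ruby 1.9, where it is not available.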
