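Given a file with mixed encodings (e.g. UTF-8 and Latin-1), how can Emacs be configured to "project" all of its characters onto a single encoding (e.g. UTF-8) when saving the file? How can files with mixed encodings be repaired?
I wrote the following function to automate some of the cleaning, but I would guess that somewhere there is a table mapping the symbol "é" in one encoding to "é" in UTF-8 that I could use to improve the function (or that somebody has already written such a function).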
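(defun jyby/cleanToUTF ()
  "Delete known problematic characters from the whole buffer."
  (interactive)
  (save-excursion
    ;; Characters to delete; extend this list as needed.
    (dolist (bad '("अ" "आ" "ॆ"))
      ;; Restart from the top for each character so that text
      ;; before point is cleaned as well.
      (goto-char (point-min))
      (while (search-forward bad nil t)
        (replace-match "" t t)))))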
(global-unset-key [f11])
(global-set-key [f11] 'jyby/cleanToUTF)
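One possible direction for the mapping asked about above: Emacs keeps bytes it could not decode as characters of the `eight-bit` charset (the codes 4194181 and 4194182 in the message further down fall in that range), so a command can walk the buffer and re-decode each such byte with a guessed legacy coding system. This is a minimal sketch under that assumption, not a tested fix for these particular files; the name `jyby/decode-raw-bytes` and the default of `windows-1252` are hypothetical choices.

```elisp
(defun jyby/decode-raw-bytes (&optional coding)
  "Replace each raw byte in the buffer by its decoding under CODING.
CODING defaults to `windows-1252', which is only a guess at the
encoding the stray bytes originally came from.
Return the number of raw bytes that were processed."
  (interactive)
  (let ((coding (or coding 'windows-1252))
        (count 0))
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        (let ((ch (char-after)))
          (if (eq (char-charset ch) 'eight-bit)
              ;; Turn the raw-byte character back into a byte (128..255),
              ;; then decode that single byte with the guessed coding system.
              (let ((text (decode-coding-string
                           (unibyte-string (multibyte-char-to-unibyte ch))
                           coding)))
                (delete-char 1)
                (insert text)
                (setq count (1+ count)))
            (forward-char 1)))))
    (when (called-interactively-p 'interactive)
      (message "Re-decoded %d raw byte(s) as %s" count coding))
    count))
```

Called as `M-x jyby/decode-raw-bytes`, or from Lisp as `(jyby/decode-raw-bytes 'latin-1)` when the stray bytes are believed to be Latin-1. Text that was already decoded incorrectly and saved (the sequences that grow at each save) is not repaired by this; it only handles bytes Emacs still holds as raw bytes.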
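I have many files "corrupted" with mixed encodings (due to copy-pasting from a browser with a badly configured font setup), which generate the error below. I can clean them by hand by searching for each problematic symbol and replacing it with either "" or the appropriate character, or, more quickly, by specifying "utf-8-unix" as the encoding (which prompts the same message the next time I edit and save the file). It has become a problem because, in any such corrupted file, every accented character is replaced by a sequence that doubles in size at each save, which ends up doubling the size of the file. I am using GNU Emacs 24.2.1.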
These default coding systems were tried to encode text
in the buffer `test_accents.org':
(utf-8-unix (30 . 4194182) (33 . 4194182) (34 . 4194182) (37
. 4194182) (40 . 4194181) (41 . 4194182) (42 . 4194182) (45
. 4194182) (48 . 4194182) (49 . 4194182) (52 . 4194182))
However, each of them encountered characters it couldn't encode:
utf-8-unix cannot encode these: ...
Click on a character (or switch to this window by `C-x o'
and select the characters by RET) to jump to the place it appears,
where `C-u C-x =' will give information about it.
Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
to remove or modify the problematic characters,
or specify any other coding system (and risk losing
the problematic characters).
raw-text emacs-mule no-conversion
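The (position . character) pairs shown in the message above can also be collected from Lisp with the built-in `unencodable-char-position`, which returns a list of positions when given a count. The helper below is a small sketch; its name `jyby/unencodable-chars` is hypothetical, and using the buffer size as the count is simply an upper bound chosen for illustration.

```elisp
(defun jyby/unencodable-chars (&optional coding)
  "Return a list of (POSITION . CHAR) that CODING cannot encode.
CODING defaults to `utf-8'.  The accessible part of the current
buffer is scanned."
  (let* ((coding (or coding 'utf-8))
         ;; A non-nil COUNT makes `unencodable-char-position' return a
         ;; list of positions; the buffer size is a safe upper bound.
         (positions (unencodable-char-position
                     (point-min) (point-max) coding
                     (max 1 (buffer-size)))))
    (mapcar (lambda (pos) (cons pos (char-after pos))) positions)))
```

Evaluating `(jyby/unencodable-chars)` in the affected buffer should give the same kind of pairs as the message, without the limit on how many are displayed, and the result can then drive an automated replacement.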
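But is there a way to convert it automatically? At the moment I manually select each offending character and run a search-and-replace to remove it from the whole document. I plan to write a Lisp function to automate this, but I don't know how to obtain the list of offending characters automatically (I would also like to do something smarter, such as é -> e, or better still, convert the accented characters to UTF-8 ...) – Jeremy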
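For the "é -> e" idea, one approach is to decompose the text with the `ucs-normalize` library that ships with Emacs and then drop the combining marks. This is only a sketch: the function name `jyby/strip-accents` is hypothetical, and it assumes the characters are already valid Unicode (raw bytes would have to be decoded first). Letters without a canonical decomposition, such as "ø", are left unchanged.

```elisp
(require 'ucs-normalize)

(defun jyby/strip-accents (string)
  "Return STRING with combining diacritical marks removed.
An accented letter such as e-acute becomes the plain base letter."
  (replace-regexp-in-string
   ;; U+0300..U+036F is the Combining Diacritical Marks block.
   "[\u0300-\u036f]" ""
   ;; NFD splits each accented letter into base letter + mark(s).
   (ucs-normalize-NFD-string string)))
```

The same function can be applied to a region by combining it with `buffer-substring`, `delete-region` and `insert`.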