2010-01-22 450 views
6

I want to view a UTF-8 text file/stream in less, but LESSCHARSET=utf-8 does not seem to work. Even when I invoke less like this:

cat file | LESSCHARSET=utf-8 less 

the non-ASCII UTF-8 characters are not displayed properly. Instead, their hex values are shown highlighted in angle brackets, e.g. <F4>

Reading the same text in vim with UTF-8 encoding works without any problem, so I think there is something wrong with the way I am invoking less.

The output of locale is as follows:

LANG="en_US.UTF-8" 
LC_COLLATE="en_US.UTF-8" 
LC_CTYPE="en_US.UTF-8" 
LC_MESSAGES="en_US.UTF-8" 
LC_MONETARY="en_US.UTF-8" 
LC_NUMERIC="en_US.UTF-8" 
LC_TIME="en_US.UTF-8" 
LC_ALL= 

My version of less is the one installed via Xcode on OS X Leopard:

$ less --version | sed 's/^/ /' 
less 394 
Copyright (C) 1984-2005 Mark Nudelman 

less comes with NO WARRANTY, to the extent permitted by law. 
For information about the terms of redistribution, 
see the file named README in the less distribution. 
Homepage: http://www.greenwoodsoftware.com/less 

locale -a | grep US | sed 's/^/ /' outputs the following:

en_AU.US-ASCII 
en_CA.US-ASCII 
en_GB.US-ASCII 
en_NZ.US-ASCII 
en_US 
en_US.ISO8859-1 
en_US.ISO8859-15 
en_US.US-ASCII 
en_US.UTF-8 

Answers

8
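  1. What does the locale command output? Is it a UTF-8 locale?

  2. Are you sure your terminal is set up to display UTF-8? Does echo -e '\xe2\x82\xac' produce a € (euro) sign?

  3. Is the locale you have set actually installed on the system? Does it appear in the list printed by locale -a?

  4. Which version of less are you using? (Run less --version to find out.) Really, really old versions do not even support UTF-8. That is unlikely to be the case here, though: I have a Debian "sarge" system with less version 382, and it does not even need LESSCHARSET if the locale is set correctly.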
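
The four checks above can be run in one go. The following is only a minimal sketch, assuming a POSIX shell; the function name check_utf8_env is made up for illustration, and printf with octal escapes stands in for echo -e, which does not behave the same in every shell.

```shell
# Hypothetical helper: runs the four checks from the list above.
check_utf8_env() {
    # 1. Current locale settings and the codeset they imply.
    locale
    echo "codeset: $(locale charmap)"

    # 2. Should show a euro sign if the terminal decodes UTF-8
    #    (bytes E2 82 AC, written as octal escapes).
    printf 'euro sign: \342\202\254\n'

    # 3. UTF-8 locales installed on the system (spelling varies by OS,
    #    e.g. en_US.UTF-8 or en_US.utf8).
    locale -a | grep -i -e 'utf-8' -e 'utf8' || echo "no UTF-8 locales listed"

    # 4. Version of less, if less is on the PATH.
    if command -v less >/dev/null 2>&1; then
        less --version | head -n 1
    else
        echo "less not found"
    fi
}

check_utf8_env
```

If the codeset line does not say UTF-8, or the euro line shows garbage, the problem lies in the environment or the terminal rather than in less.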

+0

LANG="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_CTYPE="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_ALL= – dan 2010-01-23 05:23:24

+1

Yes, echo -e '\xe2\x82\xac' does produce the euro sign. – dan 2010-01-23 14:43:02

+0

Thanks for trying to sort this out for me. I have answered your questions. – dan 2010-01-23 16:03:51

5

My guess is that your file is not UTF-8 but ISO 8859. (Is the <F4> character supposed to be 'ô'?)

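Start an xterm with LANG=en_US.ISO-8859-1 xterm. (On the asker's system, locale -a spells this locale en_US.ISO8859-1.) Then verify the locale: the output of locale should show en_US.ISO-8859-1 or similar. Then view the file with less. Is it displayed correctly?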

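Note that just using LESSCHARSET=iso8859 without starting a new terminal is not enough. less will then assume that the terminal can interpret ISO 8859, but the terminal is probably decoding UTF-8, since the euro sign is displayed correctly. A lone \xf4 byte is not a valid UTF-8 sequence, so the terminal will probably show a placeholder glyph in its place.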
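
Whether a file really is UTF-8 can be checked before involving less or the terminal at all. One common trick, sketched here under the assumption that iconv is installed, is to ask iconv to decode the file as UTF-8 and look only at its exit status. The helper name is_utf8 and the sample file names are made up for illustration.

```shell
# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
# iconv exits non-zero when it meets an invalid input sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Demo with two throwaway files.
tmpdir=$(mktemp -d)
printf 'caf\303\251\n' > "$tmpdir/utf8.txt"    # "é" as UTF-8 bytes C3 A9
printf 'c\364t\351\n'  > "$tmpdir/latin1.txt"  # "ô", "é" as Latin-1 bytes F4, E9

for f in "$tmpdir/utf8.txt" "$tmpdir/latin1.txt"; do
    if is_utf8 "$f"; then
        echo "$(basename "$f"): valid UTF-8"
    else
        echo "$(basename "$f"): not valid UTF-8"
    fi
done
rm -rf "$tmpdir"
```

A file that fails this check and shows <F4> where an 'ô' is expected is very likely ISO 8859-1 or a close relative, which matches the guess above.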

+0

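Thanks, this was my problem. The terminal had a different output encoding. I wish there were a way to make less read a file in one encoding and output its contents in another (or in a default one, e.g. $LANG!). – 2013-02-01 16:42:47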

1

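Try the command file file.txt. If the output is, for example, "ISO-8859 English text", change the file's encoding from ISO-8859 to UTF-8 with the command iconv -f ISO-8859-1 -t UTF-8 -o testfile.txt file.txt. If less testfile.txt then displays correctly, finish with mv testfile.txt file.txt.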
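
The same conversion can be written with plain shell redirection, which avoids relying on the -o option (not every iconv implementation may accept it). This is a rough sketch only: the helper names to_utf8 and convert_in_place are made up for illustration, and the source encoding is something you have to know or guess, for example from the output of file.

```shell
# to_utf8 FROM FILE: write FILE to stdout, re-encoded from FROM to UTF-8.
to_utf8() {
    iconv -f "$1" -t UTF-8 "$2"
}

# convert_in_place FROM FILE: overwrite FILE with its UTF-8 version,
# but only if the conversion succeeded.
convert_in_place() {
    tmp=$(mktemp) || return 1
    if to_utf8 "$1" "$2" > "$tmp"; then
        cat "$tmp" > "$2"   # cat instead of mv keeps FILE's permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Demo on a throwaway file containing Latin-1 bytes F4 ("ô") and E9 ("é").
sample=$(mktemp)
printf 'c\364t\351\n' > "$sample"
to_utf8 ISO-8859-1 "$sample"
convert_in_place ISO-8859-1 "$sample" && echo "converted in place"
rm -f "$sample"
```

To view a file without modifying it, the first helper can feed less directly, as in to_utf8 ISO-8859-1 file.txt | less. Running convert_in_place twice on the same file would double-encode it, so check the encoding first.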

1

On Mac OS, charset names are case-sensitive; here the uppercase spelling is the one that works:

bash-4.4$ less --version 
less 458 (POSIX regular expressions) 
Copyright (C) 1984-2012 Mark Nudelman 

bash-4.4$ LESSCHARSET=cp1251 less 
invalid charset name 

bash-4.4$ LESSCHARSET=CP1251 less 
Missing filename ("less --help" for help) 

Here I found the list of charsets:

{ "ascii",   NULL,  "8bcccbcc18b95.b" }, 
{ "utf-8",   &utf_mode, "8bcccbcc18b95.b126.bb" }, 
{ "iso8859",  NULL,  "8bcccbcc18b95.33b." }, 
{ "latin3",   NULL,  "8bcccbcc18b95.33b5.b8.b15.b4.b12.b18.b12.b." }, 
{ "arabic",   NULL,  "8bcccbcc18b95.33b.3b.7b2.13b.3b.b26.5b19.b" }, 
{ "greek",   NULL,  "8bcccbcc18b95.33b4.2b4.b3.b35.b44.b" }, 
{ "greek2005",  NULL,  "8bcccbcc18b95.33b14.b35.b44.b" }, 
{ "hebrew",   NULL,  "8bcccbcc18b95.33b.b29.32b28.2b2.b" }, 
{ "koi8-r",   NULL,  "8bcccbcc18b95.b." }, 
{ "KOI8-T",   NULL,  "8bcccbcc18b95.b8.b6.b8.b.b.5b7.3b4.b4.b3.b.b.3b." }, 
{ "georgianps",  NULL,  "8bcccbcc18b95.3b11.4b12.2b." }, 
{ "tcvn",   NULL,  "b..b...bcccbccbbb7.8b95.b48.5b." }, 
{ "TIS-620",  NULL,  "8bcccbcc18b95.b.4b.11b7.8b." }, 
{ "next",   NULL,  "8bcccbcc18b95.bb125.bb" }, 
{ "dos",   NULL,  "8bcccbcc12bc5b95.b." }, 
{ "windows-1251", NULL,  "8bcccbcc12bc5b95.b24.b." }, 
{ "windows-1252", NULL,  "8bcccbcc12bc5b95.b.b11.b.2b12.b." }, 
{ "windows-1255", NULL,  "8bcccbcc12bc5b95.b.b8.b.5b9.b.4b." }, 
{ "ebcdic",   NULL,  "5bc6bcc7bcc41b.9b7.9b5.b..8b6.10b6.b9.7b9.8b8.17b3.3b9.7b9.8b8.6b10.b.b.b." }, 
{ "IBM-1047",  NULL,  "4cbcbc3b9cbccbccbb4c6bcc5b3cbbc4bc4bccbc191.b" }, 
{ NULL, NULL, NULL } 

And their aliases:

{ "UTF-8",   "utf-8" }, 
{ "ANSI_X3.4-1968", "ascii" }, 
{ "US-ASCII",  "ascii" }, 
{ "latin1",   "iso8859" }, 
{ "ISO-8859-1",  "iso8859" }, 
{ "latin9",   "iso8859" }, 
{ "ISO-8859-15", "iso8859" }, 
{ "latin2",   "iso8859" }, 
{ "ISO-8859-2",  "iso8859" }, 
{ "ISO-8859-3",  "latin3" }, 
{ "latin4",   "iso8859" }, 
{ "ISO-8859-4",  "iso8859" }, 
{ "cyrillic",  "iso8859" }, 
{ "ISO-8859-5",  "iso8859" }, 
{ "ISO-8859-6",  "arabic" }, 
{ "ISO-8859-7",  "greek" }, 
{ "IBM9005",  "greek2005" }, 
{ "ISO-8859-8",  "hebrew" }, 
{ "latin5",   "iso8859" }, 
{ "ISO-8859-9",  "iso8859" }, 
{ "latin6",   "iso8859" }, 
{ "ISO-8859-10", "iso8859" }, 
{ "latin7",   "iso8859" }, 
{ "ISO-8859-13", "iso8859" }, 
{ "latin8",   "iso8859" }, 
{ "ISO-8859-14", "iso8859" }, 
{ "latin10",  "iso8859" }, 
{ "ISO-8859-16", "iso8859" }, 
{ "IBM437",   "dos" }, 
{ "EBCDIC-US",  "ebcdic" }, 
{ "IBM1047",  "IBM-1047" }, 
{ "KOI8-R",   "koi8-r" }, 
{ "KOI8-U",   "koi8-r" }, 
{ "GEORGIAN-PS", "georgianps" }, 
{ "TCVN5712-1",  "tcvn" }, 
{ "NEXTSTEP",  "next" }, 
{ "windows",  "windows-1252" }, /* backward compatibility */ 
{ "CP1251",   "windows-1251" }, 
{ "CP1252",   "windows-1252" }, 
{ "CP1255",   "windows-1255" }, 
{ NULL, NULL }
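
Read together, the two tables suggest a two-step lookup: an alias is first mapped to a canonical name, and the canonical name must then appear in the charset table. The sketch below mimics that for a handful of entries only. It assumes exact, case-sensitive matching, which is consistent with the cp1251 / CP1251 transcript above but is not taken from the less source itself. The function name resolve_charset is made up for illustration.

```shell
# resolve_charset NAME: print the canonical charset name, or fail.
# Covers only a small subset of the tables above.
resolve_charset() {
    name=$1

    # Step 1: alias -> canonical name (case-sensitive).
    case $name in
        UTF-8)                          name=utf-8 ;;
        latin1|ISO-8859-1|ISO-8859-15)  name=iso8859 ;;
        KOI8-R|KOI8-U)                  name=koi8-r ;;
        CP1251)                         name=windows-1251 ;;
        CP1252|windows)                 name=windows-1252 ;;
    esac

    # Step 2: the canonical name must be in the charset table.
    case $name in
        ascii|utf-8|iso8859|koi8-r|dos|windows-1251|windows-1252)
            echo "$name" ;;
        *)
            echo "invalid charset name" >&2
            return 1 ;;
    esac
}

resolve_charset CP1251            # prints windows-1251
resolve_charset cp1251 || true    # reports "invalid charset name" on stderr
```

Under this reading, CP1251 works because it is listed as an alias, while cp1251 matches neither table.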