How to replace Unicode characters with ASCII characters in Python (Perl script given)?

I'm trying to learn Python and can't figure out how to translate the following Perl script into Python:

#!/usr/bin/perl -w      

use open qw(:std :utf8); 

while(<>) { 
    s/\x{00E4}/ae/; 
    s/\x{00F6}/oe/; 
    s/\x{00FC}/ue/; 
    print; 
}

The script simply replaces the Unicode umlauts with ASCII substitutes (so the complete output is ASCII). I'd appreciate any hints. Thanks!

2010-04-23 Frank

Search for "transliteration" to find related questions. – hop 2010-04-23 18:30:50

http://stackoverflow.com/questions/816285/where-is-pythons-best-ascii-for-this-unicode-database/816319#816319 – hop 2010-04-23 19:30:09

The given Perl script actually replaces only the first match on each line, but that is surely an accident. – tripleee 2013-12-15 16:52:08
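Perl's s/// without the /g flag stops after the first match, which corresponds to passing count=1 to Python's re.sub; the default count=0 replaces every match, like /g. A minimal Python 3 sketch of the difference (the sample string is made up for illustration):

```python
import re

line = 'Mädchen für Käse'

# Like s/\x{00E4}/ae/ (no /g): only the first 'ä' on the line is replaced.
first_only = re.sub('\u00e4', 'ae', line, count=1)

# Like s/\x{00E4}/ae/g: every 'ä' is replaced (count=0 is the default).
every_match = re.sub('\u00e4', 'ae', line)

print(first_only)   # Maedchen für Käse
print(every_match)  # Maedchen für Kaese
```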

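Use the fileinput module to loop over standard input or a list of files,
decode the lines you read from UTF-8 into unicode objects,
then map any Unicode characters you want with the translate method.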

translit.py would look like this:

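#!/usr/bin/env python2.6
# -*- coding: utf-8 -*-

import fileinput

# Maps Unicode code points to their replacement strings.
table = {
    0xe4: u'ae',          # ä
    ord(u'ö'): u'oe',
    ord(u'ü'): u'ue',
    ord(u'ß'): None,      # None deletes the character
}

for line in fileinput.input():
    s = line.decode('utf8')      # byte string -> unicode object
    print s.translate(table),    # trailing comma: line already ends in '\n'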

You can use it like this:

$ cat utf8.txt 
sömé täßt 
sömé täßt 
sömé täßt 

$ ./translit.py utf8.txt 
soemé taet 
soemé taet 
soemé taet
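For Python 3, a rough port of translit.py might look like the sketch below. It assumes UTF-8 input files and, unlike the Python 2 version, maps ß to 'ss' as suggested in the comments on this answer. fileinput.FileInput is used as a context manager, which Python 3 supports; the helper names translit and main are illustrative only.

```python
import fileinput
import sys

# Same idea as translit.py; mapping ß to 'ss' is an assumption from the comments.
TABLE = {
    ord('ä'): 'ae',
    ord('ö'): 'oe',
    ord('ü'): 'ue',
    ord('ß'): 'ss',
}


def translit(lines):
    """Yield each line with the characters in TABLE replaced."""
    for line in lines:
        yield line.translate(TABLE)


def main(files=None):
    """Transliterate the named files (or stdin) to stdout."""
    # In Python 3, fileinput yields str; hook_encoded decodes named files as UTF-8.
    hook = fileinput.hook_encoded('utf-8')
    with fileinput.FileInput(files=files, openhook=hook) as lines:
        sys.stdout.writelines(translit(lines))
```

To run it as a script, add the usual if __name__ == '__main__': main() guard at the bottom. Standard input is decoded with sys.stdin's own encoding, since the openhook only applies to named files.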

Update:

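If you are using Python 3, strings are Unicode by default, so you don't need to decode or encode them even if they contain non-ASCII or even non-Latin characters. The solution then looks like this: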

>>> line = 'Verhältnismäßigkeit, Möglichkeit'
>>> table = {
...     ord('ä'): 'ae',
...     ord('ö'): 'oe',
...     ord('ü'): 'ue',
...     ord('ß'): 'ss',
... }
>>> line.translate(table)
'Verhaeltnismaessigkeit, Moeglichkeit'
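In Python 3 the same table can also be built with str.maketrans, which accepts single-character keys and converts them to the ordinals that str.translate expects. The capital umlauts below are an addition not present in the answer, included only as an illustration:

```python
# str.maketrans turns character keys into the ordinal keys translate() needs.
table = str.maketrans({
    'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
    'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue',  # assumption: capitals handled too
    'ß': 'ss',
})

line = 'Verhältnismäßigkeit, Möglichkeit'
result = line.translate(table)
print(result)  # Verhaeltnismaessigkeit, Moeglichkeit
```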

2010-04-23 19:23:45 hop

To get ASCII output, I think the last line should be print s.translate(table).encode('ascii', 'ignore'). – Frank 2010-04-23 20:00:52

Strictly speaking, the original .pl doesn't do that either, but yes, that would be a solution. – hop 2010-04-23 23:31:43
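Frank's suggestion in Python 3 terms: encode() returns bytes there, so decode the result again if you want a str. A small sketch, with the table copied from the answer (so ß is still deleted); note that 'ignore' silently drops any character the table does not cover, such as é:

```python
table = {ord('ä'): 'ae', ord('ö'): 'oe', ord('ü'): 'ue', ord('ß'): None}

s = 'sömé täßt'
translated = s.translate(table)  # 'soemé taet': é is still non-ASCII

# encode('ascii', 'ignore') discards é; decode() turns the bytes back into str.
ascii_only = translated.encode('ascii', 'ignore').decode('ascii')
print(ascii_only)  # soem taet
```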

The goal seems to be to de-umlaut German text so that it stays readable. In this code, ord(u'ß'): None **deletes** the ß ("eszett") character; it should be ord(u'ß'): u'ss'. Upvotes? Accepted answer??? – 2010-04-23 23:50:00
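The effect of None versus 'ss' in a translate table, shown on a single word (Python 3):

```python
word = 'große'

deleting = word.translate({ord('ß'): None})      # None removes the character
spelling_out = word.translate({ord('ß'): 'ss'})  # the usual ASCII spelling

print(deleting, spelling_out)  # groe grosse
```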

For converting to ASCII you might want to try this recipe, which boils down to:

>>> title = u"Klüft skräms inför på fédéral électoral große" 
>>> import unicodedata 
>>> unicodedata.normalize('NFKD', title).encode('ascii','ignore') 
'Kluft skrams infor pa federal electoral groe'
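The same recipe in Python 3, where encode() returns bytes and a final decode('ascii') gives back a str:

```python
import unicodedata

title = 'Klüft skräms inför på fédéral électoral große'

# NFKD splits accented letters into a base letter plus combining marks;
# encoding to ASCII with 'ignore' then drops the marks, and also anything
# with no ASCII decomposition at all, such as ß.
ascii_title = (unicodedata.normalize('NFKD', title)
               .encode('ascii', 'ignore')
               .decode('ascii'))
print(ascii_title)  # Kluft skrams infor pa federal electoral groe
```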

2010-04-23 20:50:33

It doesn't do at all what the original .pl does (mainly, correctly transliterating German special characters). – hop 2010-04-23 23:33:07

Yes, but it's exactly what I need right now! – 2011-07-04 05:01:55

große -> groe?!? – 2013-02-04 07:11:55
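One way to address both objections is to apply a German-specific table first and use the NFKD recipe only as a fallback for the remaining accented letters. This is a sketch combining the two answers, not something either author proposed; the to_ascii name and the capital-letter mappings are assumptions (Python 3):

```python
import unicodedata

# German replacements run first, so umlauts and ß are spelled out
# instead of being reduced to a bare base letter (or dropped) by NFKD.
GERMAN = str.maketrans({
    'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
    'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue',
    'ß': 'ss',
})


def to_ascii(text):
    """Transliterate German umlauts and ß, then strip remaining accents."""
    text = text.translate(GERMAN)
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii('Klüft skräms inför på fédéral électoral große'))
# Klueft skraems infoer pa federal electoral grosse
```

Mapping Ä to 'Ae' suits mixed-case text; all-caps text would read better with 'AE', which a fixed table cannot decide on its own.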

I use translitcodec:

>>> import translitcodec 
>>> print '\xe4'.decode('latin-1') 
ä 
>>> print '\xe4'.decode('latin-1').encode('translit/long').encode('ascii') 
ae 
>>> print '\xe4'.decode('latin-1').encode('translit/short').encode('ascii') 
a

You can change the source encoding as needed. To keep each call short, you may want a simple helper function:

def fancy2ascii(s): 
    return s.decode('latin-1').encode('translit/long').encode('ascii')
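If you would rather not depend on translitcodec, the standard library's codecs.register_error lets you plug a similar transliteration step into encode(). The sketch below is a stdlib alternative, not translitcodec's implementation; the handler name 'de_translit' and its German table are hypothetical, and it targets Python 3:

```python
import codecs
import unicodedata

# Hypothetical German-specific replacements; everything else falls back to NFKD.
GERMAN = {'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
          'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue', 'ß': 'ss'}


def _de_translit(err):
    """Error handler: return an ASCII replacement for the unencodable run."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    out = []
    for ch in err.object[err.start:err.end]:
        if ch in GERMAN:
            out.append(GERMAN[ch])
        else:
            # Strip accents; characters with no ASCII base become ''.
            base = unicodedata.normalize('NFKD', ch)
            out.append(base.encode('ascii', 'ignore').decode('ascii'))
    # Resume encoding right after the run that was just replaced.
    return ''.join(out), err.end


# 'de_translit' is an illustrative handler name, registered once at import.
codecs.register_error('de_translit', _de_translit)


def fancy2ascii(raw):
    """Decode Latin-1 bytes and re-encode them as ASCII bytes."""
    return raw.decode('latin-1').encode('ascii', 'de_translit')
```

For example, fancy2ascii(b'\xe4') gives b'ae', matching the translit/long output shown above.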

2013-12-15 16:42:14

Instead of writing manual regular expressions, you could try unidecode to convert Unicode to ASCII. It is a Python port of the Text::Unidecode Perl module:

#!/usr/bin/env python
import fileinput
import locale
from contextlib import closing
from unidecode import unidecode # $ pip install unidecode

def toascii(files=None, encoding=None, bufsize=-1):
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    with closing(fileinput.FileInput(files=files, bufsize=bufsize)) as file:
        for line in file:
            print unidecode(line.decode(encoding)),

if __name__ == "__main__":
    import sys
    toascii(encoding=sys.argv.pop(1) if len(sys.argv) > 1 else None)

It uses the FileInput class instead of the module-level fileinput.input() to avoid global state.

Example:

$ echo 'äöüß' | python toascii.py utf-8 
aouss

2014-05-06 19:06:34 jfs
