Detecting strings with non-English characters in Python

I have some strings that mix English and non-English letters. For example:

w='_1991_اف_جي2'

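How can I identify strings like this in Python, using a regular expression or any other fast method?

I would rather not compare each letter of the string against a list of letters one by one; I want to do this quickly, in a single operation.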

Source

2014-11-23 TJ1

Maybe use the ASCII range, since I believe ASCII contains only English characters, in the range 0-127 – jgr208 2014-11-23 01:34:32

Can you show me how to do that in Python? – TJ1 2014-11-23 01:36:35

@TJ1 Which Python version are you using? – thefourtheye 2014-11-23 01:39:54
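The ASCII-range idea from the comments could be sketched in two ways: by comparing code points directly, or with a regular expression as the question asks. This is only a sketch, assuming Python 3; the names `NON_ASCII`, `is_ascii_by_ord` and `is_ascii_by_regex` are made up for illustration. Note that ASCII code points run from 0 to 127.

```python
import re

# Matches any single character outside the 7-bit ASCII range (0-127).
NON_ASCII = re.compile(r'[^\x00-\x7f]')

def is_ascii_by_ord(s):
    """Return True if every character has a code point below 128."""
    return all(ord(c) < 128 for c in s)

def is_ascii_by_regex(s):
    """Return True if the regex finds no character outside ASCII."""
    return NON_ASCII.search(s) is None

w = '_1991_اف_جي2'  # the example string from the question
print(is_ascii_by_ord(w))            # False
print(is_ascii_by_regex(w))          # False
print(is_ascii_by_regex('_1991_2'))  # True
```

Both functions stop at the first non-ASCII character, so neither needs a hand-written list of letters.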

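You can check whether the string can be encoded using only ASCII characters (Latin letters plus some other characters). If it cannot be encoded, it contains characters from another alphabet.

Note the comment # -*- coding: ....  It should be at the top of the Python file (otherwise you will get some errors about encoding).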

# -*- coding: utf-8 -*-
def isEnglish(s):
    # Raises UnicodeDecodeError if s contains any non-ASCII character
    # (works on both Python 2 and Python 3).
    try:
        s.encode(encoding='utf-8').decode('ascii')
    except UnicodeDecodeError:
        return False
    else:
        return True

print(isEnglish('slabiky, ale liší se podle významu'))
print(isEnglish('English'))
print(isEnglish('ގެ ފުރަތަމަ ދެ އަކުރު ކަ'))
print(isEnglish('how about this one : 通 asfަ'))
print(isEnglish('?fd4))45s&'))

This prints False, True, False, False, True.

Source

2014-11-23 01:45:31

Thanks for your answer. In Python 3 what you wrote did not work correctly, but I used your suggestion and replaced s.decode('ascii') with s.encode('ascii') and UnicodeDecodeError with UnicodeEncodeError, and then it worked. – TJ1 2014-11-23 06:34:22
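The Python 3 variant described in this comment might look like the following. It is a minimal sketch rather than the commenter's exact code, and the name `is_english_py3` is invented here to keep it apart from `isEnglish` above.

```python
def is_english_py3(s):
    """Return True if s can be encoded as ASCII (Python 3 str)."""
    try:
        s.encode('ascii')
    except UnicodeEncodeError:
        # At least one character has no ASCII encoding.
        return False
    return True

print(is_english_py3('English'))      # True
print(is_english_py3('?fd4))45s&'))   # True
print(is_english_py3('通 asf'))        # False
```

On Python 3.7 and later, the built-in `str.isascii()` should give the same answer without the try/except.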

I was indeed using Python 2 to test my code. Thanks for improving the solution for Python 3 – 2014-11-23 06:56:15

I edited this answer so that it works with both Python 2 and 3. – 2017-07-31 12:32:49

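If you are working with strings (not unicode objects), you can clean them with translate() and then check the result with isalnum(), which is better than checking by raising exceptions: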


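import string

def isEnglish(s):
    # Python 2 only: the two-argument form of str.translate() deletes characters.
    # Whitespace is removed along with punctuation; otherwise isalnum()
    # would return False for any string containing a space.
    return s.translate(None, string.punctuation + string.whitespace).isalnum()


print(isEnglish('slabiky, ale liší se podle významu'))
print(isEnglish('English'))
print(isEnglish('ގެ ފުރަތަމަ ދެ އަކުރު ކަ'))
print(isEnglish('how about this one : 通 asfަ'))
print(isEnglish('?fd4))45s&'))
print(isEnglish('Текст на русском'))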

> False 
> True 
> False 
> False 
> True 
> False
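The snippet above relies on Python 2 byte strings. On Python 3, a rough equivalent could use `str.maketrans` to build the deletion table and `str.isascii()` (available from Python 3.7) to rule out non-ASCII letters, because `isalnum()` on its own also accepts letters such as Cyrillic. This is a sketch under those assumptions; `is_english_translate` is a hypothetical name, and it strips whitespace as well as punctuation so that multi-word ASCII text passes.

```python
import string

# Translation table that deletes ASCII punctuation and whitespace.
_STRIP = str.maketrans('', '', string.punctuation + string.whitespace)

def is_english_translate(s):
    """True if, after stripping punctuation and whitespace, only ASCII letters and digits remain."""
    cleaned = s.translate(_STRIP)
    return cleaned.isascii() and cleaned.isalnum()

print(is_english_translate('English'))           # True
print(is_english_translate('?fd4))45s&'))        # True
print(is_english_translate('Текст на русском'))  # False
```

An empty string, or one made up only of punctuation, returns False here because `isalnum()` is False for an empty string.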

You can also filter the non-ASCII characters out of a string with this function:

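import string

# Printable ASCII characters; named so as not to shadow the built-in ascii().
printable = set(string.printable)

def remove_non_ascii(s):
    # Join the kept characters so the result is a string on Python 2 and 3.
    return ''.join(c for c in s if c in printable)


print(remove_non_ascii('slabiky, ale liší se podle významu'))
> slabiky, ale li se podle vznamu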
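Since the question mentions regular expressions, the same filtering could also be sketched with `re.sub`. This assumes Python 3, and `strip_non_ascii` is an illustrative name. Unlike the `string.printable` version above, it keeps every ASCII character, including control characters.

```python
import re

# One or more consecutive characters outside the ASCII range (0-127).
NON_ASCII_RUN = re.compile(r'[^\x00-\x7f]+')

def strip_non_ascii(s):
    """Return s with all non-ASCII characters removed."""
    return NON_ASCII_RUN.sub('', s)

print(strip_non_ascii('abc 通 def'))  # the CJK character is dropped; both spaces remain
```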

Source

2016-09-30 13:49:48
