Remove emojis from a Python string

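I need to remove emojis from some strings with a Python script. I found that this question has already been asked ("Remove emojis from a Python string"), and one of the answers, marked as accepted, says the code below will do the trick: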

#!/usr/bin/env python 
import re 

text = u'This dog \U0001f602' 
print(text) # with emoji 

emoji_pattern = re.compile("[" 
    u"\U0001F600-\U0001F64F" # emoticons 
    u"\U0001F300-\U0001F5FF" # symbols & pictographs 
    u"\U0001F680-\U0001F6FF" # transport & map symbols 
    u"\U0001F1E0-\U0001F1FF" # flags (iOS) 
         "]+", flags=re.UNICODE) 
print(emoji_pattern.sub(r'', text)) # no emoji

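I inserted this code into my script and changed it only so that it operates on the strings in my code rather than on the sample text. When I run the code, however, I get an error that I don't understand: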

Traceback (most recent call last): 
    File "SCRIPT.py", line 31, in get_tweets 
"]+", flags=re.UNICODE) 
    File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 194, in compile 
    return _compile(pattern, flags) 
    File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 251, in _compile 
    raise error, v # invalid expression 
sre_constants.error: bad character range

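I understand what the error message says, but since I grabbed this code straight from Stack Exchange, I can't figure out why it apparently worked for the people in that discussion and not for me. In case it helps, I'm using Python 2.7. Thanks!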

Source

2016-09-17 Kirk S.

What does 'sys.maxunicode' say?

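Your Python build uses surrogate pairs to represent Unicode characters that don't fit in 16 bits; this is what's called a "narrow build". It means that any value at or above u"\U00010000" is stored as two characters. Because the regular expression parser works character by character, even in Unicode mode, using characters in that range can lead to incorrect behavior.

In this particular case, Python sees only the first "half" of an emoji's character code as the end of the range, and that "half" is smaller than the start of the range, which makes the range invalid. For example, u"\U0001F600-\U0001F64F" is stored as u"\ud83d\ude00-\ud83d\ude4f", so the parser reads the range as \ude00-\ud83d.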

Python 2.7.10 (default, Jun 1 2015, 09:44:56) 
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import sys 
>>> sys.maxunicode 
65535 
>>> tuple(u"\U00010000") 
(u'\ud800', u'\udc00')
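
To make the arithmetic concrete, here is a minimal sketch of how a code point above U+FFFF is split into a UTF-16 surrogate pair, applied to the first range in the question's pattern. The helper name `to_surrogate_pair` is made up for this illustration; it is not part of Python or of the `re` module.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogates.

    Hypothetical helper for illustration only.
    """
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


# The "emoticons" range from the question: \U0001F600-\U0001F64F
start_high, start_low = to_surrogate_pair(0x1F600)
end_high, end_low = to_surrogate_pair(0x1F64F)

# On a narrow build the character class contains four separate characters
# around the hyphen, so the parser sees the range start_low-end_high.
range_start, range_end = start_low, end_high
print(hex(range_start), hex(range_end))
print(range_start > range_end)  # a range whose start exceeds its end is invalid
```

The range the parser ends up with starts at the low surrogate of the first emoji and ends at the high surrogate of the second, and low surrogates (0xDC00 and up) always sort after high surrogates (0xD800 to 0xDBFF), which is why `re.compile` rejects it.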

Basically, you need a "wide build" of Python for this to work:

Python 3.5.2 (default, Jul 28 2016, 21:28:00) 
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import sys 
>>> sys.maxunicode 
1114111 
>>> tuple(u"\U00010000") 
('',)

The character doesn't display correctly in my browser, but the output does show that there is only one character rather than two.
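
If switching to a wide build is not an option, one possible workaround (not part of the answer above, and offered only as a sketch) is to fall back to a pattern that spells out the surrogate pairs explicitly when the original pattern fails to compile. The function name `remove_emoji` is hypothetical. On Python 3.3 and later, where strings always behave like a wide build, the fallback branch is never taken.

```python
import re

try:
    # Wide build, or Python 3.3+: each code point above U+FFFF is one character.
    emoji_pattern = re.compile(
        u"["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"]+", flags=re.UNICODE)
except re.error:
    # Narrow build: match a fixed high surrogate followed by a range of
    # low surrogates, covering the same four blocks as above.
    emoji_pattern = re.compile(
        u"(?:"
        u"\ud83d[\ude00-\ude4f]|"                        # emoticons
        u"\ud83c[\udf00-\udfff]|\ud83d[\udc00-\uddff]|"  # symbols & pictographs
        u"\ud83d[\ude80-\udeff]|"                        # transport & map symbols
        u"\ud83c[\udde0-\uddff]"                         # flags (iOS)
        u")+", flags=re.UNICODE)


def remove_emoji(text):
    """Return text with characters from the four blocks above removed."""
    return emoji_pattern.sub(u"", text)


print(repr(remove_emoji(u"This dog \U0001f602")))
```

The fallback only covers the four blocks listed in the question's pattern; emoji outside those ranges would pass through unchanged in either branch.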

Source

2016-09-17 08:33:52 agf
