2017-05-31

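I want to remove emoticons from a data file that contains only tweet text, one tweet per line. Compiling the pattern for ":)" fails with a bad character range error. The goal is to remove emoticons (not emoji!) from the tweet strings.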

error: bad character range :-) at position 4 

What is going wrong?

# remove emoticons
import re

emoji_pattern = re.compile("["
                           u":)"
                           u":-)"
                           u":D"
                           u":("
                           u":-("
                           "]+", flags=re.UNICODE)

with open('C:/Users/M/PycharmProjects/Bachelor_Thesis/test/data_sentiment.csv', "r", encoding="utf-8") as oldfile1, \
        open('C:/Users/M/PycharmProjects/Bachelor_Thesis/test/data_sentiment_stripped_emoticons.csv', 'w', encoding="utf-8") as newfile1:
    for line in oldfile1:
        line = emoji_pattern.sub(r'', line)
        newfile1.write(line)

Your regex has some serious problems.
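A minimal sketch, assuming Python 3, of what goes wrong in the question's regex: the adjacent string literals are concatenated, so all the emoticons land inside one "[...]" character class. Inside a class each character is a separate member and "x-y" denotes a range, so ":-)" is read as the range from ':' down to ')', which is invalid. The variable names here are illustrative only.

```python
import re

# The question's pattern after literal concatenation: "[:):-):D:(:-(]+".
bad_pattern = "[" ":)" ":-)" ":D" ":(" ":-(" "]+"

# In [...], ":-)" means "every character from ':' (0x3A) to ')' (0x29)".
# The end of the range is lower than the start, so compiling fails.
try:
    re.compile(bad_pattern)
    error_message = None
except re.error as exc:
    error_message = str(exc)

print(error_message)
```

Even a class that did compile, such as "[:)]+", would match single characters rather than whole emoticons, so it would also delete the colon in a time like "12:30".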


Please put the actual error message in your question. Many things could be the problem; it is not clear which one you are asking about.


@AmeyYadav: How do I fix it?

Answers


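The bad character is actually the non-ASCII character on the previous line. If you want to use such characters, you need to declare a compatible encoding. Search for "Python character encoding" to see your options.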


I solved it like this:

# remove emoticons
with open('C:/Users/M/PycharmProjects/Bachelor_Thesis/test/data_sentiment.csv', "r", encoding="utf-8") as oldfile1, \
        open('C:/Users/M/PycharmProjects/Bachelor_Thesis/test/data_sentiment_stripped_emoticons.csv', 'w', encoding="utf-8") as newfile1:
    for line in oldfile1:
        # Plain string replacement, no regex, so ")" and "(" need no escaping.
        # ":-)" is included so it is not left behind in the output.
        for emoticon in (":-)", ":)", ":D", ":-(", ":("):
            line = line.replace(emoticon, "")
        newfile1.write(line)
# The with block closes both files, so no explicit close() is needed.
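If you prefer to keep the regex approach from the question, a minimal sketch is to escape each emoticon and join them with "|" so the pattern matches whole emoticons instead of single characters. The list EMOTICONS and the helpers strip_emoticons and strip_emoticons_file are hypothetical names chosen for this example; extend the list to cover the emoticons that actually occur in your data.

```python
import re

# Hypothetical emoticon list; add whatever forms appear in the tweets.
EMOTICONS = [":)", ":-)", ":D", ":(", ":-("]

# re.escape makes ")" and "(" literal; "|" gives an alternation of whole
# emoticons. Sorting longest-first lets a longer form win over a shorter
# one that shares its prefix.
EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
)


def strip_emoticons(text):
    """Return text with every listed emoticon removed."""
    return EMOTICON_RE.sub("", text)


def strip_emoticons_file(src_path, dst_path):
    """Copy src_path to dst_path line by line, removing emoticons."""
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(strip_emoticons(line))
```

Calling strip_emoticons_file with the two CSV paths from the question processes the whole file. Unlike a character class, the alternation leaves stray colons and parentheses alone, so "12:30" survives unchanged.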