Tokenization: grouping tokens split at punctuation

Given some text, I know I can tokenize it with NLTK's TweetTokenizer. For example, "Great drinks, beef hash, coffee, burritos." produces:

['Great', 
'drinks', 
',', 
'beef', 
'hash', 
',', 
'coffee', 
',', 
'burritos', 
'.']

I want to treat each part before a comma or period as a separate unit, producing a list such as [Great drinks, beef hash, coffee, burritos]. How would I do this?
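One way to get from the token list above to the phrase list is to group consecutive non-punctuation tokens. This is only a minimal sketch: the helper name `group_phrases` and the `SEPARATORS` set are made up for illustration, and it assumes only commas and periods separate phrases.

```python
from itertools import groupby

# Token list copied from the question (TweetTokenizer output).
tokens = ['Great', 'drinks', ',', 'beef', 'hash', ',',
          'coffee', ',', 'burritos', '.']

# Assumption: only commas and periods act as phrase separators.
SEPARATORS = {',', '.'}

def group_phrases(tokens, separators=SEPARATORS):
    """Join each run of non-separator tokens into one phrase."""
    phrases = []
    # groupby yields alternating runs of separator / non-separator tokens.
    for is_separator, run in groupby(tokens, key=lambda t: t in separators):
        if not is_separator:
            phrases.append(' '.join(run))
    return phrases

print(group_phrases(tokens))
# ['Great drinks', 'beef hash', 'coffee', 'burritos']
```

Working on the tokens rather than the raw string keeps whatever the tokenizer already decided (emoticons, hashtags and so on) and only changes how the tokens are grouped.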

import re
s = "Great drinks , beef hash, coffee, burritos."
# Match runs of word characters and apostrophes; punctuation is dropped.
print(re.findall(r"[\w']+", s))

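For text containing hyphens ('-'):

# Keep hyphenated words together; the non-capturing group lets findall return whole matches.
print(re.findall(r"[\w']+(?:-[\w']+)*", s))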

2017-04-24 13:32:04 SmartManoj

Perfect, thanks – user3058703

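msg = "Great drinks , beef hash, coffee, burritos."
# maketrans needs two arguments of equal length: map "," and "." to one space each.
print(msg.translate(str.maketrans(",.", "  ")).split())

This does the job.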

2017-04-24 14:02:54 rolika

If the set of punctuation marks grows, this gets too long – SmartManoj
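On the length concern: the translation table does not have to be typed out by hand. A rough sketch, assuming that every character in `string.punctuation` (ASCII only) should be treated as a separator; the function name `split_words` is just for illustration.

```python
import string

def split_words(text):
    """Replace every ASCII punctuation character with a space, then split."""
    # Both maketrans arguments must have the same length.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.translate(table).split()

msg = "Great drinks , beef hash, coffee, burritos."
print(split_words(msg))
# ['Great', 'drinks', 'beef', 'hash', 'coffee', 'burritos']
```

Note that `string.punctuation` includes the apostrophe and the hyphen, so "don't" comes out as two pieces. If those should survive, drop them from the first argument (and shorten the second to match).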
