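I want to split a string on commas and/or periods using nltk. I have already tried sent_tokenize(), but it only splits on periods. How can I split a string on commas or periods with nltk?
I also tried this code: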
from nltk.tokenize.punkt import PunktSentenceTokenizer, PunktLanguageVars
ex_sent = "This is an example showing sentence filtration.This is how it is done, in case of Python I want to learn more. So, that i can have some experience over it, by it I mean python."
# Treat ',' as a sentence-ending character in addition to the defaults
class CommaPoint(PunktLanguageVars):
    sent_end_chars = ('.', '?', '!', ',')
tokenizer = PunktSentenceTokenizer(lang_vars=CommaPoint())
n_w = tokenizer.tokenize(ex_sent)
print(n_w)
The output of the above code is:
['This is an example showing sentence filtration.This is how it is done,' 'in case of Python I want to learn more.' 'So,' 'that i can have some experience over it,' 'by it I mean python.\n']
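When a '.' is not followed by a space (as in "filtration.This"), the tokenizer treats the text around it as a single word and does not split there.
The output I want is: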
['This is an example showing sentence filtration.' 'This is how it is done,' 'in case of Python I want to learn more.' 'So,' 'that i can have some experience over it,' 'by it I mean python.']
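Can you be more specific about your problem? Give some examples of input and expected output, and try to say what you have tried. See http://stackoverflow.com/help/how-to-ask – alvas
Hi, this is my first time on Stack Overflow. I have tried to explain my problem and hope you can answer it. Thanks –
Hi alvas, I hope you can help me this time.. –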
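For what it's worth, here is a minimal sketch of one way to get the desired output from the question. It swaps the Punkt tokenizer for a plain regular expression from Python's standard `re` module, so it is not an nltk solution. Punkt appears to expect whitespace after a sentence-ending character, which would explain why "filtration.This" is not split. The function name `split_on_comma_or_period` is made up for illustration, and the trailing "\n" on the input is an assumption based on the output shown in the question.

```python
import re

ex_sent = ("This is an example showing sentence filtration.This is how it is done, "
           "in case of Python I want to learn more. So, that i can have some "
           "experience over it, by it I mean python.\n")

def split_on_comma_or_period(text):
    # Hypothetical helper, not part of nltk.
    # Each match is a run of non-terminator characters followed by one or more
    # of '.', ',', '?', '!'; the second alternative catches trailing text
    # that has no terminator at all.
    pieces = re.findall(r"[^.,?!]+[.,?!]+|[^.,?!]+", text)
    # Drop surrounding whitespace and discard pieces that are empty after stripping.
    return [p.strip() for p in pieces if p.strip()]

n_w = split_on_comma_or_period(ex_sent)
print(n_w)
```

Because this splits on every period and comma regardless of spacing, it will also break up things like "3.14" or "e.g.", so it only fits text where that is acceptable.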