2017-07-25

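NLTK unable to tag punctuation in a Python 2 program

I am trying to use the tahake paraphrasing library, and its syntax requires the words of each sentence to be tagged. I have partly done that with the following code: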

#!/usr/bin/python 
# -*- coding: utf-8 -*- 
import re 
import nltk 
from nltk.tag import pos_tag 

text = '''The wife of a former U.S. president Bill Clinton Hillary Clinton visited China last Monday. Hillary Clinton wanted to visit China last month But postponed her plans till Monday last week. Hillary Clinton paid a visit to the People Republic of China on Monday. Last week the Secretary of State Ms Clinton visited Chinese officials.''' 

sentences = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s', text) 
text = [] 
for sentence in sentences:  
    posTagges = pos_tag(nltk.word_tokenize(sentence)) 
    text = text + [" ".join([k + '/' + v for k,v in posTagges])] 
print text 

And I got the following output:

['The/DT wife/NN of/IN a/DT former/JJ U.S./NNP president/NN Bill/NNP Clinton/NNP Hillary/NNP Clinton/NNP visited/VBD China/NNP last/JJ Monday/NNP ./.', 'Hillary/NNP Clinton/NNP wanted/VBD to/TO visit/VB China/NNP last/JJ month/NN But/CC postponed/VBD her/PRP$ plans/NNS till/VBP Monday/NNP last/JJ week/NN ./.', 'Hillary/JJ Clinton/NNP paid/VBD a/DT visit/NN to/TO the/DT People/NNP Republic/NN of/IN China/NNP on/IN Monday/NNP ./.', 'Last/JJ week/NN the/DT Secretary/NNP of/IN State/NNP Ms/NNP Clinton/NNP visited/VBD Chinese/JJ officials/NNS ./.']

The problem I am facing is with tagging punctuation such as the full stop. I get ./. in the output, but I need ./PUNCT

Please help me with ideas.

Answers


Use string.punctuation

In [150]: string.punctuation 
Out[150]: '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

[" ".join([k + '/PUNCT' if k in string.punctuation else k + '/' + v for k,v in posTagges])]
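
One caveat: `k in string.punctuation` is a substring test against a single string, so it matches one-character tokens such as `.` but not multi-character punctuation tokens such as `...`, and it would also match an empty string. The following is a minimal sketch of a stricter check, not part of the original answer. The helper names `is_punct` and `to_tagged_string` are hypothetical, the `PUNCT` tag is taken from the question, and the sample list is hand-written in the `(token, tag)` shape that `pos_tag` returns so that the sketch runs without NLTK.

```python
import string

# Set of single punctuation characters, for fast per-character lookups.
PUNCT_CHARS = set(string.punctuation)


def is_punct(token):
    """Return True if token is non-empty and consists only of punctuation."""
    return bool(token) and all(ch in PUNCT_CHARS for ch in token)


def to_tagged_string(tagged_tokens, punct_tag="PUNCT"):
    """Join (token, tag) pairs as 'token/TAG', retagging punctuation tokens."""
    return " ".join(
        token + "/" + (punct_tag if is_punct(token) else tag)
        for token, tag in tagged_tokens
    )


# Hand-written sample in the (token, tag) shape returned by pos_tag.
sample = [("last", "JJ"), ("Monday", "NNP"), (".", ".")]
result = to_tagged_string(sample)  # 'last/JJ Monday/NNP ./PUNCT'
```

In the question's loop, this would replace the inner join, for example `text = text + [to_tagged_string(posTagges)]`. Tokens that merely contain punctuation, such as `U.S.`, keep their original tag.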

It works... thank you.