2017-01-09

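Stanford NER is not extracting percentages correctly

I am trying to extract percentages using Stanford NER, but it does not extract them correctly.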

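from nltk.tag import StanfordNERTagger

# Assumes the Stanford NER jar and model are discoverable via the CLASSPATH and STANFORD_MODELS environment variables.
inp_str = 'total revenue received was one hundred and twenty five percent 125% for last financial year'
split_inp_str = inp_str.split()  # plain whitespace split
st = StanfordNERTagger('english.muc.7class.distsim.crf.ser.gz')
print(st.tag(split_inp_str))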

This gives the following output:

[('total', 'O'), ('revenue', 'O'), ('received', 'O'), ('was', 'O'), ('one', 'O'), ('hundred', 'O'), ('and', 'O'), ('twenty', 'O'), ('five', 'PERCENT'), ('percent', 'PERCENT'), ('125%', 'O'), ('for', 'O'), ('last', 'O'), ('financial', 'O'), ('year', 'O')] 

Why is 125% not extracted?

When I use Stanford CoreNLP 3.7.0, "125%" is tagged as "PERCENT". I am running the Java code. If you are using NLTK, I am not entirely sure what is actually being run. – StanfordNLPHelp

Answer

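You need to tokenize the sentence rather than use split(). A plain split() keeps "125%" as a single token, whereas word_tokenize separates it into "125" and "%", which is likely closer to the tokenization the Stanford model expects. Try the code below.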

from nltk import word_tokenize
from nltk.tag import StanfordNERTagger

inp_str = 'total revenue received was one hundred and twenty five percent 125% for last financial year'
# word_tokenize splits '125%' into '125' and '%' (requires NLTK's punkt data)
split_inp_str = word_tokenize(inp_str)
st = StanfordNERTagger('english.muc.7class.distsim.crf.ser.gz')
print(st.tag(split_inp_str))
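
To see the tokenization difference without a Stanford NER installation, here is a minimal sketch using only the standard library. The `simple_tokenize` helper and its regular expression are hypothetical stand-ins for illustration, not NLTK's or Stanford's actual tokenizer; they only mimic the behavior of splitting punctuation such as "%" away from the number.

```python
import re

# Hypothetical helper (not part of NLTK or Stanford NER): a rough stand-in
# for a Treebank-style tokenizer that emits each punctuation mark as its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text):
    """Return runs of word characters and single punctuation marks as tokens."""
    return TOKEN_RE.findall(text)


inp_str = ('total revenue received was one hundred and twenty five percent '
           '125% for last financial year')

whitespace_tokens = inp_str.split()        # what the question's code passes to the tagger
regex_tokens = simple_tokenize(inp_str)    # percent sign split off from the number

print(whitespace_tokens[10])    # 125%
print(regex_tokens[10:12])      # ['125', '%']
```

With the whitespace split, the tagger receives the single token "125%"; with the punctuation-aware split it receives "125" followed by "%". Whether the tagger then labels both as PERCENT still depends on the model, so it is worth checking the output of the real tagger.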