spaCy nlp - tagging entities in a string

Let's say I have a string and want to tag some entities, such as persons and locations.

string = 'My name is John Doe, and I live in USA' 
string_tagged = 'My name is [John Doe], and I live in {USA}'

I want to tag persons with [] and locations with {}.

My code:

import spacy

nlp = spacy.load('en_core_web_sm')  # the 'en' shortcut link was removed in spaCy v3
doc = nlp(string)
sentence = doc.text
for ent in doc.ents:
    # ent.start_char is an offset into the ORIGINAL text, not the growing sentence
    if ent.label_ == 'PERSON':
        sentence = sentence[:ent.start_char] + sentence[ent.start_char:].replace(ent.text, '[' + ent.text + ']', 1)
    elif ent.label_ == 'GPE':
        sentence = sentence[:ent.start_char] + sentence[ent.start_char:].replace(ent.text, '{' + ent.text + '}', 1)

print(sentence)

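This works fine with the example string. However, for more complex sentences I get double brackets around some entities, for example with this sentence: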

string_bug = 'Canada, Canada, Canada, Canada, Canada, Canada'

Output: {Canada}, {Canada}, {Canada}, {Canada}, {{Canada}}, Canada

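The reason I split the sentence into two parts is so that only the new occurrence (the one at the higher character position) gets replaced. I think the bug is that I'm looping over doc.ents, so I get the entity positions in the old string, while on every iteration the string grows by the new [] and {}. But it feels like there must be a simpler way to handle this in spaCy.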
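One way to sidestep the drifting offsets is to never modify the string while iterating: copy the original text piece by piece in a single left-to-right pass and insert the brackets as you go. The sketch below is plain Python, not a spaCy API. It assumes the entities arrive as `(start_char, end_char, label)` tuples; `tag_entities` and `BRACKETS` are hypothetical names, and with spaCy the list could be built as `[(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]`. It also assumes the spans do not overlap, which spaCy guarantees for `doc.ents`.

```python
# Hypothetical mapping from entity label to the delimiters to wrap it in.
BRACKETS = {"PERSON": ("[", "]"), "GPE": ("{", "}")}


def tag_entities(text, entities):
    """Wrap each entity span in its label's brackets in one left-to-right pass.

    `entities` is assumed to be an iterable of non-overlapping
    (start_char, end_char, label) tuples that index into `text`.
    """
    pieces = []
    cursor = 0  # position in the ORIGINAL text, so offsets never drift
    for start, end, label in sorted(entities):
        if label not in BRACKETS:
            continue  # leave other entity types untouched
        open_, close = BRACKETS[label]
        pieces.append(text[cursor:start])            # text before the entity
        pieces.append(open_ + text[start:end] + close)
        cursor = end
    pieces.append(text[cursor:])                     # text after the last entity
    return "".join(pieces)


s = "My name is John Doe, and I live in USA"
spans = [(11, 19, "PERSON"), (35, 38, "GPE")]
print(tag_entities(s, spans))
```

Because the output is assembled from slices of the unmodified input, repeated entities such as the six `Canada` spans are each wrapped exactly once.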

Edit: solved with reversed(doc.ents).
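Reversing works because `doc.ents` is in document order: wrapping the last entity first never moves the characters in front of it, so the offsets of every entity still to be processed stay valid. A minimal sketch of that idea, assuming the same hypothetical `(start_char, end_char, label)` tuples in place of spaCy `Span` objects, and slicing by `end_char` instead of calling `str.replace` (with spaCy the loop would read `for ent in reversed(doc.ents):`):

```python
# Hypothetical mapping from entity label to the delimiters to wrap it in.
BRACKETS = {"PERSON": ("[", "]"), "GPE": ("{", "}")}


def tag_entities_reversed(text, entities):
    """Wrap entity spans right-to-left so earlier offsets stay valid.

    `entities` is assumed to be non-overlapping (start_char, end_char, label)
    tuples indexing into the original `text`.
    """
    # Inserting brackets after position `start` never shifts characters
    # before it, so handling the rightmost entity first keeps the
    # remaining start/end offsets accurate.
    for start, end, label in sorted(entities, reverse=True):
        if label in BRACKETS:
            open_, close = BRACKETS[label]
            text = text[:start] + open_ + text[start:end] + close + text[end:]
    return text


bug = "Canada, Canada, Canada, Canada, Canada, Canada"
print(tag_entities_reversed(bug, [(i, i + 6, "GPE") for i in range(0, len(bug), 8)]))
```

Slicing by `end_char` also avoids searching for the entity text, so it cannot accidentally match an earlier or later occurrence of the same word.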


2017-02-19 Isbister

Here is a slight modification that got your code working for me.

import re
import spacy

string = 'My name is John Doe, and I live in USA'

nlp = spacy.load('en_core_web_sm')  # the 'en' shortcut link was removed in spaCy v3
doc = nlp(string)
sentence = doc.text
for ent in doc.ents:
    # re.escape keeps entity text containing '.', '(' etc. from being read as a regex
    if ent.label_ == 'PERSON':
        sentence = re.sub(re.escape(ent.text), '[' + ent.text + ']', sentence)
    elif ent.label_ == 'GPE':
        sentence = re.sub(re.escape(ent.text), '{' + ent.text + '}', sentence)
print(sentence)

Output:

My name is [John Doe], and I live in {USA}
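This works on the example string, but `re.sub` replaces every occurrence of the entity text, so an entity that appears more than once gets wrapped again on each pass. The spaCy-free check below reproduces the same loop with entity texts and labels supplied by hand; `tag_with_re_sub` is a hypothetical name and the `(text, label)` pairs stand in for `doc.ents`.

```python
import re


def tag_with_re_sub(sentence, entities):
    """Same logic as the answer: re.sub rewrites ALL matches of each entity text."""
    for text, label in entities:
        if label == "PERSON":
            sentence = re.sub(re.escape(text), "[" + text + "]", sentence)
        elif label == "GPE":
            sentence = re.sub(re.escape(text), "{" + text + "}", sentence)
    return sentence


# Works when each entity occurs once:
print(tag_with_re_sub("My name is John Doe, and I live in USA",
                      [("John Doe", "PERSON"), ("USA", "GPE")]))
# → My name is [John Doe], and I live in {USA}

# Repeated entities re-wrap the already-tagged text, so braces nest:
print(tag_with_re_sub("Canada, Canada", [("Canada", "GPE"), ("Canada", "GPE")]))
# → {{Canada}}, {{Canada}}
```

For strings like `string_bug`, an offset-based approach (iterating `reversed(doc.ents)` and slicing by `start_char`/`end_char`) avoids this.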


2017-10-12 05:53:46 mattyd2
