spaCy provides a noun_chunks function for retrieving a set of noun phrases. The english_noun_chunks function (attached below) relies on word.pos == NOUN.
Spacy NLP - chunking with regular expressions
from spacy.symbols import NOUN  # the part-of-speech ID compared against word.pos

def english_noun_chunks(doc):
    labels = ['nsubj', 'dobj', 'nsubjpass', 'pcomp', 'pobj',
              'attr', 'root']
    np_deps = [doc.vocab.strings[label] for label in labels]
    conj = doc.vocab.strings['conj']
    np_label = doc.vocab.strings['NP']
    for i in range(len(doc)):
        word = doc[i]
        if word.pos == NOUN and word.dep in np_deps:
            yield word.left_edge.i, word.i+1, np_label
        elif word.pos == NOUN and word.dep == conj:
            head = word.head
            while head.dep == conj and head.head.i < head.i:
                head = head.head
            # If the head is an NP, and we're coordinated to it, we're an NP
            if head.dep in np_deps:
                yield word.left_edge.i, word.i+1, np_label
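I want to extract chunks from a sentence that match a particular regular expression. For example, zero or more adjectives followed by one or more nouns: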
{(<JJ>)*(<NN | NNS | NNP>)+}
Is this possible without rewriting the english_noun_chunks function?
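To make the goal concrete, here is a minimal sketch of tag-pattern chunking that leaves english_noun_chunks untouched. It assumes the input is a list of (text, tag) pairs with Penn Treebank tags, which with spaCy could presumably be built as [(t.text, t.tag_) for t in doc]. The names CHUNK_RE and regex_chunks are hypothetical and are not part of spaCy.

```python
import re

# Zero or more adjectives followed by one or more nouns,
# written over a string of "<TAG>" units (one unit per token).
CHUNK_RE = re.compile(r"(?:<JJ>)*(?:<NN>|<NNS>|<NNP>)+")


def regex_chunks(tagged):
    """Yield (start, end) token index pairs whose tags match CHUNK_RE.

    `tagged` is a list of (text, tag) pairs; `end` is exclusive.
    """
    tag_string = ""
    offset_to_index = {}  # character offset in tag_string -> token index
    for i, (_, tag) in enumerate(tagged):
        offset_to_index[len(tag_string)] = i
        tag_string += "<%s>" % tag
    offset_to_index[len(tag_string)] = len(tagged)

    # Every match starts at a "<" and ends after a ">", so both offsets
    # fall on token boundaries recorded in offset_to_index.
    for match in CHUNK_RE.finditer(tag_string):
        yield offset_to_index[match.start()], offset_to_index[match.end()]


# Hand-tagged example, so the sketch runs without a spaCy model.
tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"), ("lazy", "JJ"), ("dogs", "NNS")]
print(list(regex_chunks(tagged)))  # [(1, 4), (6, 8)]
```

With a real Doc, each (start, end) pair could be turned back into a span with doc[start:end]. Note that this works on tags alone and ignores the dependency information that english_noun_chunks uses.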
What about the fact that this function is compiled to C by Cython? – Serendipity
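You're right, the file has a '.pyx' extension, and if you rewrite it you will lose some performance. But do you actually need to rewrite it, or could you simply filter the final result? –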
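The filtering idea might look like the sketch below. It assumes each chunk is already available as a list of (text, tag) pairs, which with spaCy could presumably be built as [[(t.text, t.tag_) for t in chunk] for chunk in doc.noun_chunks]. The names TAG_PATTERN and keep_matching are hypothetical.

```python
import re

# The same pattern, written over space-terminated tags: "JJ NN ", "NNS ", ...
TAG_PATTERN = re.compile(r"(?:JJ )*(?:(?:NN|NNS|NNP) )+")


def keep_matching(chunks):
    """Return only the chunks whose entire tag sequence matches TAG_PATTERN."""
    kept = []
    for chunk in chunks:
        tags = "".join(tag + " " for _, tag in chunk)
        if TAG_PATTERN.fullmatch(tags):
            kept.append(chunk)
    return kept


# Hand-tagged chunks, so the sketch runs without a spaCy model.
chunks = [
    [("quick", "JJ"), ("fox", "NN")],
    [("the", "DT"), ("dog", "NN")],
    [("apples", "NNS")],
]
print(keep_matching(chunks))
```

This strict filter drops any chunk that contains a tag outside the pattern, such as a leading determiner. Trimming those tokens instead of discarding the whole chunk would be a different design choice.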