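Extending the training of the OpenNLP organization model

I'm new to NLP and the OpenNLP library, and at the moment I'm playing with some of its features, in particular the library's ability to extract organization names. If I use a simple string such as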
"Bill worked at Microsoft Corp., JP Morgan Chase, Monsanto and General Motors and was amazed at what went on in Congress. "
my code spits out:
Detected name "Bill". Type person with probability of 0.9604452678787172
Detected name "Microsoft Corp .". Type organization with probability of 0.9976452599132802
Detected name "JP Morgan Chase". Type organization with probability of 0.9064399433766583
Detected name "Monsanto". Type organization with probability of 0.7429123227376515
Detected name "General Motors". Type organization with probability of 0.965472905375375
Detected name "Congress". Type organization with probability of 0.9940809804351413
and everything seems fine. However, if I switch to a more UK-centric view of the world, such as
"Mark worked at The University of London, HSBC, The Royal Bank of Scotland, Dyson and GlaxoSmithKline."
I get
Detected name "Mark". Type person with probability of 0.7496973664676362
Detected name "London". Type location with probability of 0.6625435519843291
Detected name "Scotland". Type location with probability of 0.9564118675997605
Detected name "University of London". Type organization with probability of 0.8516268558212053
Detected name "Royal Bank". Type organization with probability of 0.8953174632171774
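which is clearly not as successful. Is this because the organization finder doesn't know about UK institutions, or am I just unlucky? If it is the former, is there a way for me to take the existing model and extend its knowledge so that it handles UK institutions better? I had a quick look for the training data behind the existing organization model but couldn't find anything.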
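Therein lies the problem. It's easy to get a list of all the UK companies registered with Companies House, but can that be used to train the model? I'm in new territory here – 2014-10-27 13:10:58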
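Not really, at least not with OpenNLP. OpenNLP expects training data in which the entities appear in the context of a sentence. See the documentation: https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.training – aab 2014-10-29 09:03:42
That's pretty much what I thought. Thanks very much for your help – 2014-10-29 12:13:53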
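
To illustrate the point in the comments: the name finder training format described in the linked manual is one tokenized sentence per line, with each entity wrapped in `<START:type> ... <END>` markers. A bare list of company names does not fit that format, but it could be used to pre-annotate sentences you have collected yourself. The following is only a rough sketch of that idea, not part of OpenNLP: the `tokenize` and `annotate` helpers are hypothetical names, the regex tokenizer is a crude stand-in for OpenNLP's own tokenizer, and exact string matching will mislabel ambiguous names (a person called Dyson, for example), so the output would still need manual review.

```python
import re


def tokenize(text):
    # Crude tokenizer for illustration only: runs of word characters, or
    # single punctuation marks. OpenNLP's tokenizer may split differently.
    return re.findall(r"\w+|[^\w\s]", text)


def annotate(sentence, org_names, entity_type="organization"):
    """Return the sentence as space-separated tokens, with every exact match
    of a known name wrapped in <START:type> ... <END> markers."""
    tokens = tokenize(sentence)
    # Try longer names first so "The Royal Bank of Scotland" is preferred
    # over a shorter overlapping entry such as "Royal Bank".
    names = sorted((tokenize(n) for n in org_names), key=len, reverse=True)
    out = []
    i = 0
    while i < len(tokens):
        for name in names:
            if name and tokens[i:i + len(name)] == name:
                out.append("<START:%s> %s <END>" % (entity_type, " ".join(name)))
                i += len(name)
                break
        else:
            # No known name starts at this position: keep the token as-is.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    orgs = ["The University of London", "HSBC", "The Royal Bank of Scotland",
            "Dyson", "GlaxoSmithKline"]
    sentence = ("Mark worked at The University of London, HSBC, "
                "The Royal Bank of Scotland, Dyson and GlaxoSmithKline.")
    print(annotate(sentence, orgs))
```

Lines produced this way, one sentence per line, could then be passed to OpenNLP's name finder trainer as described in the manual. The manual suggests that on the order of 15,000 sentences are needed for a model that performs well, so a handful of annotated examples is unlikely to be enough on its own.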