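Iterate over two files containing named-entity maps and compute precision and recall

I have two files that I need to iterate over to compute precision and recall for my named-entity tagger. One file is the gold set and the other is my system's output. I just want to understand how to iterate over the sentences in both files and count the number of exact and partial matches. I only want to count matches for ORGANIZATION, PERSON, and LOCATION. Pseudocode, or just an idea to get me started, would work fine.

File 1: gold set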
Sentence 1:
{ORGANIZATION=[Fulton County Grand Jury]}
Sentence 2:
{ORGANIZATION=[City Executive Committee]}
{LOCATION=[City of Atlanta]}
Sentence 3:
{LOCATION=[Fulton]}
{PERSON=[Superior Court Judge Durwood Pye]}
{PERSON=[Mayor-nominate Ivan Allen Jr.]}
Sentence 4:
Sentence 5:
Sentence 6:
{LOCATION=[Fulton]}
Sentence 7:
{LOCATION=[Fulton County]}
Sentence 8:
Sentence 9:
{ORGANIZATION=[City Purchasing Department]}
Sentence 10:
Sentence 11:
Sentence 12:
{ORGANIZATION=[State Welfare Department]}
Sentence 13:
{LOCATION=[Fulton County]}
{ORGANIZATION=[State Welfare Department]}
{LOCATION=[Fulton County]}
File 2: my output
Sentence 1:
{ORGANIZATION=[Fulton County Grand Jury], DATE=[Friday], LOCATION=[Atlanta]}
Sentence 2:
{ORGANIZATION=[City Executive Committee], LOCATION=[Atlanta]}
Sentence 3:
{ORGANIZATION=[Fulton Superior Court Judge Durwood Pye], DATE=[September October], PERSON=[Ivan Allen Jr.]}
Sentence 4:
Sentence 5:
{LOCATION=[Georgia]}
Sentence 6:
Sentence 7:
{LOCATION=[Atlanta, Fulton County]}
Sentence 8:
Sentence 9:
{ORGANIZATION=[City Purchasing Department]}
Sentence 10:
{LOCATION=[Georgia]}
Sentence 11:
Sentence 12:
{ORGANIZATION=[State Welfare Department]}
Sentence 13:
{ORGANIZATION=[State Welfare Department], LOCATION=[Fulton County, Fulton County]}
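One possible way to read either file, sketched in Python since pseudocode was requested. It assumes every sentence starts with a "Sentence N:" header, that entries look like TYPE=[name, name], and that a comma inside the brackets separates two mentions (so a name that itself contains a comma would be split incorrectly). The names parse_entities, KEEP, SENTENCE_RE and ENTRY_RE are made up for this illustration.

```python
import re

# Assumed line shapes, taken from the two sample files above.
SENTENCE_RE = re.compile(r"^Sentence\s+(\d+):$")
ENTRY_RE = re.compile(r"(\w+)=\[([^\]]*)\]")
KEEP = ("ORGANIZATION", "PERSON", "LOCATION")  # DATE etc. are ignored


def parse_entities(lines):
    """Return {sentence number: {entity type: [names]}} for the kept types."""
    sentences = {}
    current = None
    for raw in lines:
        line = raw.strip()
        header = SENTENCE_RE.match(line)
        if header:
            current = int(header.group(1))
            sentences.setdefault(current, {})
            continue
        if current is None:
            continue  # skip anything before the first "Sentence N:" header
        # A line may hold several TYPE=[...] entries in any order,
        # so scan the whole line instead of assuming it starts with ORGANIZATION.
        for etype, body in ENTRY_RE.findall(line):
            if etype not in KEEP:
                continue
            # Assumption: ", " inside the brackets separates distinct mentions.
            names = [n.strip() for n in body.split(",") if n.strip()]
            sentences[current].setdefault(etype, []).extend(names)
    return sentences


demo_lines = [
    "Sentence 1:",
    "{ORGANIZATION=[Fulton County Grand Jury], DATE=[Friday], LOCATION=[Atlanta]}",
    "Sentence 4:",
    "Sentence 7:",
    "{LOCATION=[Atlanta, Fulton County]}",
    "Sentence 13:",
    "{LOCATION=[Fulton County]}",
    "{ORGANIZATION=[State Welfare Department]}",
    "{LOCATION=[Fulton County]}",
]
parsed = parse_entities(demo_lines)
```

Because the function takes any iterable of lines, an open file object can be passed straight in. Parsing both files this way gives two dictionaries keyed by sentence number, which can then be compared sentence by sentence.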
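Don't I need to iterate over the map values to extract the ORGANIZATION values? See the second file. My lines may not always start with the ORGANIZATION key.. – serendipity
The current exact match only matches an organization when the other fields on the line have identical values too, which basically means the whole line matches exactly, e.g. "{ORGANIZATION=[State Welfare Department]}". If you would rather match on the names while ignoring DATE and so on, you need to build custom logic. –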
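A rough sketch of what such custom logic could look like, again in Python. It works on already-parsed data (sentence number mapped to entity type mapped to a list of names) and only looks at ORGANIZATION, PERSON and LOCATION. The definition of a partial match here is an assumption, not something the question specifies: two mentions of the same type in the same sentence that share at least one token. The function names and the strict/lenient split are likewise only illustrative.

```python
from collections import Counter

KEEP = ("ORGANIZATION", "PERSON", "LOCATION")  # DATE etc. are ignored


def shares_token(a, b):
    """Assumed partial-match rule: the mentions have a word in common.
    Crude: common words such as "of" also count as overlap."""
    return bool(set(a.lower().split()) & set(b.lower().split()))


def score_sentence(gold, system):
    """Count exact and partial matches for one sentence."""
    counts = Counter()
    for etype in KEEP:
        gold_left = list(gold.get(etype, []))
        system_names = system.get(etype, [])
        counts["gold"] += len(gold_left)
        counts["system"] += len(system_names)
        unmatched = []
        for name in system_names:  # first pass: exact matches
            if name in gold_left:
                gold_left.remove(name)  # each gold mention is used once
                counts["exact"] += 1
            else:
                unmatched.append(name)
        for name in unmatched:  # second pass: partial matches
            hit = next((g for g in gold_left if shares_token(g, name)), None)
            if hit is not None:
                gold_left.remove(hit)
                counts["partial"] += 1
    return counts


def evaluate(gold_doc, system_doc):
    """Sum the counts over all sentences and derive precision and recall."""
    total = Counter()
    for sid in sorted(set(gold_doc) | set(system_doc)):
        total.update(score_sentence(gold_doc.get(sid, {}), system_doc.get(sid, {})))

    def ratio(hits, denom):
        return hits / denom if denom else 0.0

    lenient = total["exact"] + total["partial"]
    return {
        "exact": total["exact"],
        "partial": total["partial"],
        "strict_precision": ratio(total["exact"], total["system"]),
        "strict_recall": ratio(total["exact"], total["gold"]),
        "lenient_precision": ratio(lenient, total["system"]),
        "lenient_recall": ratio(lenient, total["gold"]),
    }


# Sentences 1 to 3 from the two files in the question.
gold_doc = {
    1: {"ORGANIZATION": ["Fulton County Grand Jury"]},
    2: {"ORGANIZATION": ["City Executive Committee"], "LOCATION": ["City of Atlanta"]},
    3: {"LOCATION": ["Fulton"],
        "PERSON": ["Superior Court Judge Durwood Pye", "Mayor-nominate Ivan Allen Jr."]},
}
system_doc = {
    1: {"ORGANIZATION": ["Fulton County Grand Jury"], "DATE": ["Friday"], "LOCATION": ["Atlanta"]},
    2: {"ORGANIZATION": ["City Executive Committee"], "LOCATION": ["Atlanta"]},
    3: {"ORGANIZATION": ["Fulton Superior Court Judge Durwood Pye"],
        "DATE": ["September October"], "PERSON": ["Ivan Allen Jr."]},
}
result = evaluate(gold_doc, system_doc)
```

Strict scores count only exact matches, and lenient scores count exact plus partial. Since matching is done per entity type, the judge tagged as ORGANIZATION in the output is not matched against the PERSON entry in the gold set. Whether that should count as a partial match is a decision for whoever defines the evaluation.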