Here is my code for finding synonyms by Levenshtein distance:
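// Groups words whose Levenshtein distance is below 2 (at most one edit).
// wordCounter: word -> number of documents containing that word.
// wordSynonymsByLevenstein: main word -> (synonym -> document count).
public void SearchWordSynonymsByLevenstein()
{
    foreach (var eachWord in wordCounter)
    {
        // Only words longer than 3 characters start a group.
        if (eachWord.Key.Length <= 3)
            continue;

        // Every word is compared with every word (itself included),
        // so this inner loop runs wordCounter.Count times per outer iteration.
        foreach (var eachSecondWord in wordCounter)
        {
            var score = LevenshteinDistance.Compute(eachWord.Key, eachSecondWord.Key);
            if (score >= 2)
                continue;

            // A word may belong to only one group. This Any(...) scans every
            // group built so far, which adds yet another factor to the cost.
            if (wordSynonymsByLevenstein.Any(x => x.Value.ContainsKey(eachSecondWord.Key)))
                continue;

            if (!wordSynonymsByLevenstein.TryGetValue(eachWord.Key, out var synonyms))
            {
                synonyms = new Dictionary<string, int>();
                wordSynonymsByLevenstein.Add(eachWord.Key, synonyms);
            }
            synonyms.Add(eachSecondWord.Key, eachSecondWord.Value);
        }
    }
}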
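The code above does not show 'LevenshteinDistance.Compute'. Assuming it computes the standard Levenshtein distance (insertion, deletion and substitution each cost 1), the test 'score < 2' only asks whether two words are at most one edit apart, and that yes/no question can be answered in one linear pass instead of filling the full distance table. A minimal sketch; 'EditDistance' and 'IsWithinOneEdit' are hypothetical names:

```csharp
using System;

// Hypothetical helper, assuming standard Levenshtein distance where an
// insertion, deletion or substitution each costs 1.
public static class EditDistance
{
    // True when a and b are at most one edit apart (distance 0 or 1).
    public static bool IsWithinOneEdit(string a, string b)
    {
        // The distance is never smaller than the difference in length.
        if (Math.Abs(a.Length - b.Length) > 1) return false;

        // Make 'a' the shorter string (or either one, if lengths are equal).
        if (a.Length > b.Length) (a, b) = (b, a);

        int i = 0, j = 0;
        bool editUsed = false;
        while (i < a.Length && j < b.Length)
        {
            if (a[i] == b[j]) { i++; j++; continue; }
            if (editUsed) return false;      // a second mismatch means distance >= 2
            editUsed = true;
            if (a.Length == b.Length) i++;   // equal lengths: a substitution, advance both
            j++;                             // b one longer: skip the extra character in b
        }
        // Reaching here means at most one edit was needed: when b is one
        // character longer and no mismatch was seen, its unmatched last
        // character is that single edit.
        return true;
    }
}
```

Using this in place of 'Compute(...) < 2' keeps the same pairs but still compares every word with every other word; it lowers the cost of each comparison, not the number of comparisons.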
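My wordCounter is a Dictionary<string, int> in which the key is each of my words and the value is the number of documents that contain that word, like a bag of words. For each eachWord I have to search the other words (eachSecondWord) for its synonyms. This method takes far too much time, and the running time grows much faster than the number of words. Is there another way to reduce the time?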
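One way to avoid comparing every pair at all (a technique not mentioned in the thread) is a deletion index, the symmetric-delete idea used by SymSpell, restricted here to distance 1: index each word under itself and under every string obtained by deleting one of its characters. Two words that are at most one edit apart always share at least one such key (a substitution at position i disappears when position i is deleted from both words; an insertion disappears when the inserted character is deleted), so only words sharing a key need the exact check. A sketch under those assumptions, with hypothetical names; unlike the question's Any(...) rule, it returns every neighbour of each word rather than exclusive groups:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: for every word, finds the other words within
// Levenshtein distance 1 without comparing all pairs.
public static class DeletionIndex
{
    // The word itself plus every string obtained by deleting one character.
    // A set, because deleting either letter of a double letter gives the same key.
    static HashSet<string> Keys(string word)
    {
        var keys = new HashSet<string> { word };
        for (int i = 0; i < word.Length; i++)
            keys.Add(word.Remove(i, 1));
        return keys;
    }

    public static Dictionary<string, HashSet<string>> FindNeighbours(IEnumerable<string> words)
    {
        // key -> words that produce this key
        var index = new Dictionary<string, List<string>>();
        foreach (var word in words)
            foreach (var key in Keys(word))
            {
                if (!index.TryGetValue(key, out var list))
                    index[key] = list = new List<string>();
                list.Add(word);
            }

        var neighbours = new Dictionary<string, HashSet<string>>();
        foreach (var bucket in index.Values)
            foreach (var a in bucket)
                foreach (var b in bucket)
                {
                    // Sharing a key is necessary but not sufficient: "ab" and "ba"
                    // share the keys "a" and "b" yet are 2 edits apart.
                    if (a == b || Levenshtein(a, b) > 1) continue;
                    if (!neighbours.TryGetValue(a, out var set))
                        neighbours[a] = set = new HashSet<string>();
                    set.Add(b);
                }
        return neighbours;
    }

    // Standard two-row dynamic-programming Levenshtein distance.
    static int Levenshtein(string s, string t)
    {
        var prev = new int[t.Length + 1];
        var cur = new int[t.Length + 1];
        for (int j = 0; j <= t.Length; j++) prev[j] = j;
        for (int i = 1; i <= s.Length; i++)
        {
            cur[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                cur[j] = Math.Min(Math.Min(prev[j] + 1, cur[j - 1] + 1), prev[j - 1] + cost);
            }
            (prev, cur) = (cur, prev);
        }
        return prev[t.Length];
    }
}
```

With the question's data this would be called as 'DeletionIndex.FindNeighbours(wordCounter.Keys)', and the Length > 3 rule can be applied when reading the result. A word of length L produces at most L + 1 keys, so building the index grows roughly with the total length of the vocabulary; the remaining cost depends on how large the shared buckets get.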
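Does 'wordSynonymsByLevenstein' really need to be a 'Dictionary<string, Dictionary<string, int>>'? Why not just a 'Dictionary<string, List<string>>'? You can use it to find the "synonyms" and then go to 'wordCounter' for the counts. –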
juharr
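A minimal sketch of the storage suggested in the comment above, with a hypothetical class name: each group keeps only the words, and the document counts stay in 'wordCounter'. The 'grouped' set is an extra assumption, not part of the comment; it keeps the question's rule that a word joins at most one group, but replaces the Any(...) scan over all groups with a constant-time lookup.

```csharp
using System.Collections.Generic;

// Hypothetical class: main word -> list of synonym words (no counts stored).
public class SynonymGroups
{
    public Dictionary<string, List<string>> Groups { get; } = new Dictionary<string, List<string>>();

    // Every word that already belongs to some group (added assumption).
    private readonly HashSet<string> grouped = new HashSet<string>();

    // Same rule as the question's loop body: a word joins at most one group.
    public void Add(string mainWord, string synonym)
    {
        if (!grouped.Add(synonym)) return;   // already in a group: skip it
        if (!Groups.TryGetValue(mainWord, out var list))
            Groups[mainWord] = list = new List<string>();
        list.Add(synonym);
    }
}
```

In the question's loop, the Any(...) check and the two Add branches would collapse into a single 'Add(eachWord.Key, eachSecondWord.Key)' call.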
Thanks. Afterwards I do this: 'if (wordSynonymsByLevenstein.TryGetValue(eachMainWord, out isThisWord)) { foreach (var eachWw in isThisWord) { mainWordWithSynonyms.Add(eachWw.Key); fullCounted = fullCounted + eachWw.Value; } var distinctedWord = mainWordWithSynonyms.DistinctBy(x => x).ToList(); … (y => y == x)) && compFoundWords.Any(x => distinctedWord.Any(y => y == x))) { relationScore = relationScore + ((double)1 / (double)fullCounted); countingEqualWord++; } }', so 'wordSynonymsByLevenshtein' has to be like this: 'Dictionary<string, Dictionary<string, int>>' – Sidron
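What I'm saying is that if 'wordSynonymsByLevenstein' is a 'Dictionary<string, List<string>>', then the 'isThisWord' you get out of it will be a list of words, so change 'eachWw.Key' to 'eachWw' and 'eachWw.Value' to 'wordCounter[eachWw]'. –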
juharr
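Applied to the loop from Sidron's comment, the change could look like the sketch below. It is hypothetical: the method wrapper and tuple return value are assumptions, the garbled 'relationScore' condition is left out, and Distinct() stands in for DistinctBy(x => x), which gives the same result when the key selector returns the element itself.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SynonymCounts
{
    // Collects the distinct synonyms of eachMainWord and the sum of their
    // document counts, reading the counts from wordCounter.
    public static (List<string> Words, int FullCounted) Collect(
        Dictionary<string, List<string>> wordSynonymsByLevenstein,
        Dictionary<string, int> wordCounter,
        string eachMainWord)
    {
        var mainWordWithSynonyms = new List<string>();
        int fullCounted = 0;
        if (wordSynonymsByLevenstein.TryGetValue(eachMainWord, out var isThisWord))
        {
            foreach (var eachWw in isThisWord)
            {
                mainWordWithSynonyms.Add(eachWw);     // was eachWw.Key
                fullCounted += wordCounter[eachWw];   // was eachWw.Value
            }
        }
        var distinctedWord = mainWordWithSynonyms.Distinct().ToList();
        return (distinctedWord, fullCounted);
    }
}
```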