我正在爲Microsoft Office套件構建私人拼寫檢查程序。我正在對拼寫錯誤和他們的潛在修復進行字符串比較,以確定我希望包含哪些更正。VBA中的加權Damerau-Levenshtein
我看着高和低的加權 Damerau - 萊文斯坦公式字符串比較,因爲我想互換,插入,刪除和替換均具有不同的權重,而不是簡單的「1」的權重,所以我可以優先考慮對其他人的一些改正。例如,錯字「agmes」在理論上可以修正爲「遊戲」或「年齡」,因爲兩者都只需要一次編輯就可以移動到拼寫正確的單詞,但是我想讓「交換」編輯更低體重,以便「遊戲」將顯示爲首選更正。
我正在使用Excel進行分析,因此我使用的任何代碼都需要在Visual Basic for Applications(VBA)中使用。我能找到的最好的是this example,這看起來不錯,但它是用Java編寫的。我盡我所能轉換,但我遠離專家,可以使用一點幫助!
任何人都可以看看附加的代碼,並幫助我找出什麼是錯的?
謝謝!
編輯:我得到它在我自己的工作。這是VBA中的加權Damerau-Levenshtein公式。它使用Excel的內置數學函數進行一些評估。將錯字與兩種可能的更正進行比較時,首選成本爲成本最低。這是因爲兩次交換的成本必須大於刪除和插入的成本,如果以最低成本(我認爲是理想的)分配交換,這是不可能的。查看凱文的博客,如果你需要更多的信息。
Public Function WeightedDL(source As String, target As String) As Double
Dim deleteCost As Double
Dim insertCost As Double
Dim replaceCost As Double
Dim swapCost As Double
deleteCost = 1
insertCost = 1.1
replaceCost = 1.1
swapCost = 1.2
Dim i As Integer
Dim j As Integer
Dim k As Integer
If Len(source) = 0 Then
WeightedDL = Len(target) * insertCost
Exit Function
End If
If Len(target) = 0 Then
WeightedDL = Len(source) * deleteCost
Exit Function
End If
Dim table() As Double
ReDim table(Len(source), Len(target))
Dim sourceIndexByCharacter() As Variant
ReDim sourceIndexByCharacter(0 To 1, 0 To Len(source) - 1) As Variant
If Left(source, 1) <> Left(target, 1) Then
table(0, 0) = Application.Min(replaceCost, (deleteCost + insertCost))
End If
sourceIndexByCharacter(0, 0) = Left(source, 1)
sourceIndexByCharacter(1, 0) = 0
Dim deleteDistance As Double
Dim insertDistance As Double
Dim matchDistance As Double
For i = 1 To Len(source) - 1
deleteDistance = table(i - 1, 0) + deleteCost
insertDistance = ((i + 1) * deleteCost) + insertCost
If Mid(source, i + 1, 1) = Left(target, 1) Then
matchDistance = (i * deleteCost) + 0
Else
matchDistance = (i * deleteCost) + replaceCost
End If
table(i, 0) = Application.Min(Application.Min(deleteDistance, insertDistance), matchDistance)
Next
For j = 1 To Len(target) - 1
deleteDistance = table(0, j - 1) + insertCost
insertDistance = ((j + 1) * insertCost) + deleteCost
If Left(source, 1) = Mid(target, j + 1, 1) Then
matchDistance = (j * insertCost) + 0
Else
matchDistance = (j * insertCost) + replaceCost
End If
table(0, j) = Application.Min(Application.Min(deleteDistance, insertDistance), matchDistance)
Next
For i = 1 To Len(source) - 1
Dim maxSourceLetterMatchIndex As Integer
If Mid(source, i + 1, 1) = Left(target, 1) Then
maxSourceLetterMatchIndex = 0
Else
maxSourceLetterMatchIndex = -1
End If
For j = 1 To Len(target) - 1
Dim candidateSwapIndex As Integer
candidateSwapIndex = -1
For k = 0 To UBound(sourceIndexByCharacter, 2)
If sourceIndexByCharacter(0, k) = Mid(target, j + 1, 1) Then candidateSwapIndex = sourceIndexByCharacter(1, k)
Next
Dim jSwap As Integer
jSwap = maxSourceLetterMatchIndex
deleteDistance = table(i - 1, j) + deleteCost
insertDistance = table(i, j - 1) + insertCost
matchDistance = table(i - 1, j - 1)
If Mid(source, i + 1, 1) <> Mid(target, j + 1, 1) Then
matchDistance = matchDistance + replaceCost
Else
maxSourceLetterMatchIndex = j
End If
Dim swapDistance As Double
If candidateSwapIndex <> -1 And jSwap <> -1 Then
Dim iSwap As Integer
iSwap = candidateSwapIndex
Dim preSwapCost
If iSwap = 0 And jSwap = 0 Then
preSwapCost = 0
Else
preSwapCost = table(Application.Max(0, iSwap - 1), Application.Max(0, jSwap - 1))
End If
swapDistance = preSwapCost + ((i - iSwap - 1) * deleteCost) + ((j - jSwap - 1) * insertCost) + swapCost
Else
swapDistance = 500
End If
table(i, j) = Application.Min(Application.Min(Application.Min(deleteDistance, insertDistance), matchDistance), swapDistance)
Next
sourceIndexByCharacter(0, i) = Mid(source, i + 1, 1)
sourceIndexByCharacter(1, i) = i
Next
WeightedDL = table(Len(source) - 1, Len(target) - 1)
End Function
您可以發佈這樣的一個答案,以自己的qusetion,使其離開該未答覆的隊列? – JustinJDavies