第一種解決方案是使用一個字典來獲得獨特的段列表。 它隨後將被作爲分裂段之前跳過第一地址數作爲簡單:
Function RemoveDuplicates1(text As String) As String
Static dict As Object
If dict Is Nothing Then
Set dict = CreateObject("Scripting.Dictionary")
dict.CompareMode = 1 ' set the case sensitivity to All
Else
dict.RemoveAll
End If
' Get the position just after the address number
Dim c&, istart&, segment
For istart = 1 To Len(text)
c = Asc(Mid$(text, istart, 1))
If (c < 48 Or c > 57) And c <> 32 Then Exit For ' if not [0-9 ]
Next
' Split the segments and add each one of them to the dictionary. No need to keep
' a reference to each segment since the keys are returned by order of insertion.
For Each segment In Split(Mid$(text, istart), ",")
If Len(segment) Then dict(segment) = Empty
Next
' Return the address number and the segments by joining the keys
RemoveDuplicates1 = Mid$(text, 1, istart - 1) & Join(dict.keys(), ",")
End Function
第二種解決方案將是,以提取所有的段,然後搜索如果它們中的每一個存在於先前的位置是:
Function RemoveDuplicates2(text As String) As String
Dim c&, segments$, segment$, length&, ifirst&, istart&, iend&
' Get the position just after the address number
For ifirst = 1 To Len(text)
c = Asc(Mid$(text, ifirst, 1))
If (c < 48 Or c > 57) And c <> 32 Then Exit For ' if not [0-9 ]
Next
' Get the segments without the address number and add a leading/trailing comma
segments = "," & Mid$(text, ifirst) & ","
istart = 1
' iterate each segment
Do While istart < Len(segments)
' Get the next segment position
iend = InStr(istart + 1, segments, ",") - 1 And &HFFFFFF
If iend - istart Then
' Get the segment
segment = Mid$(segments, istart, iend - istart + 2)
' Rewrite the segment if not present at a previous position
If InStr(1, segments, segment, vbTextCompare) = istart Then
Mid$(segments, length + 1) = segment
length = length + Len(segment) - 1
End If
End If
istart = iend + 1
Loop
' Return the address number and the segments
RemoveDuplicates2 = Mid$(text, 1, ifirst - 1) & Mid$(segments, 2, length - 1)
End Function
和第三解決方案將是使用正則表達式來除去所有的重複鏈段:
Function RemoveDuplicates3(ByVal text As String) As String
Static re As Object
If re Is Nothing Then
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.IgnoreCase = True
' Match any duplicated segment separated by a comma.
' The first segment is compared without the first digits.
re.Pattern = "((^\d* *|,)([^,]+)(?=,).*),\3?(?=,|$)"
End If
' Remove each matching segment
Do While re.test(text)
text = re.Replace(text, "$1")
Loop
RemoveDuplicates3 = text
End Function
這些都是對於10000次迭代(越低越好),執行時間:
input text : "123 abc,,1 abc,abc 2,ABC,abc,a,c"
output text : "123 abc,1 abc,abc 2,a,c"
RemoveDuplicates1 (dictionary) : 718 ms
RemoveDuplicates2 (text search) : 219 ms
RemoveDuplicates3 (regex) : 1469 ms
一個RegExp與反向引用將是另一個可能的選擇 – brettdj