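I'm creating an inverted index in VBA: a dictionary of words, each with an associated list of the line numbers the word appears on (starting from the line numbers and the list of words that appear in a given cell on that line).
I've managed to get some code working for this, but I found adding to the arrays (the values in the dictionary) a little cumbersome, and I wonder whether there is a more efficient or more elegant way to handle it.
I'm open to using arrays, collections, or any other easily searchable data type to store the list of line numbers in the dictionary's values. I've pasted a cut-down version of my code below to demonstrate the core problem. The question is really only about the BuildInvertedIndex procedure, but the rest is included to make the scenario easier to reproduce: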
Sub Test()
    ' minimum included here to demonstrate use of buildInvertedIndex procedure
    Dim vRange As Range
    Dim vDict As Dictionary
    Set vRange = ActiveSheet.Range("F2:F20585")
    Set vDict = New Dictionary
    BuildInvertedIndex vDict, vRange
    ' test values returned in dictionary (word: [line 1, ..., line n])
    Dim k As Variant, vCounter As Long
    vCounter = 0
    For Each k In vDict.Keys
        Debug.Print k & ": " & ArrayToString(vDict.Item(k))
        vCounter = vCounter + 1
        If vCounter >= 10 Then
            Exit For
        End If
    Next
End Sub
Sub BuildInvertedIndex(pDict As Dictionary, pRange As Range)
    Dim cell As Range
    Dim words As Variant, word As Variant, val As Variant
    Dim tmpArr() As Long
    Dim newLen As Long, i As Long
    ' loop through cells (one col wide so same as looping through lines)
    For Each cell In pRange.Cells
        ' loop through words in line
        words = Split(cell.Value)
        For Each word In words
            If Not pDict.Exists(word) Then
                ' start line array with first row number
                pDict.Add word, Array(cell.Row())
            Else
                i = 0
                If Not InArray(cell.Row(), pDict.Item(word)) Then
                    newLen = UBound(pDict.Item(word)) + 1
                    ReDim tmpArr(newLen)
                    For Each val In tmpArr
                        If i < newLen Then
                            tmpArr(i) = pDict.Item(word)(i)
                        Else
                            tmpArr(i) = cell.Row()
                        End If
                        i = i + 1
                    Next val
                    pDict.Item(word) = tmpArr
                End If
            End If
        Next word
    Next cell
End Sub
Function ArrayToString(vArray As Variant, _
                       Optional vDelim As String = ",") As String
    ' only included to support test (be able to see what is in the arrays)
    Dim vDelimString As String
    Dim i As Long
    For i = LBound(vArray) To UBound(vArray)
        ' append the delimiter after every element except the last
        vDelimString = vDelimString & CStr(vArray(i)) & _
                       IIf(i < UBound(vArray), vDelim, "")
    Next
    ArrayToString = vDelimString
End Function
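To run this you need values (sentences) in column F of the active sheet. If you don't already have it, you will also need to add a reference to Microsoft Scripting Runtime in your VBA environment so that the Dictionary data type is available (Tools -> References -> Microsoft Scripting Runtime).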
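The listing calls InArray, which is not included in the cut-down code. To reproduce the scenario, a minimal stand-in could look like the sketch below. This is an assumption about what the helper does (a plain linear search), not the original implementation.

    ' Hypothetical stand-in for the InArray helper that the listing calls
    ' but does not show. Returns True if pValue occurs in pArray.
    Function InArray(ByVal pValue As Long, pArray As Variant) As Boolean
        Dim i As Long
        For i = LBound(pArray) To UBound(pArray)
            If pArray(i) = pValue Then
                InArray = True
                Exit Function
            End If
        Next i
        ' falls through with the default return value, False
    End Function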
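As you can see from the code, things get a bit messy where I have to insert a new line number into an existing array (one stored as a value in the dictionary). Because I don't know of a way to extend that array without clearing its existing values, I use the variable tmpArr to create an array of the right size, copy the values one by one from the existing array in the dictionary, and then add the current row number at the end. The temporary array then replaces the existing value for that key (the current word).
Any advice would be greatly appreciated.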
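Since any easily searchable data type is acceptable, one possible alternative is to store a nested Dictionary of row numbers per word instead of an array. The sketch below is only an illustration: the procedure name BuildInvertedIndexNested and the variable lineSet are made up here, and it assumes the same Microsoft Scripting Runtime reference as the listing above.

    ' Sketch only: each word maps to a Dictionary whose keys are row numbers.
    Sub BuildInvertedIndexNested(pDict As Dictionary, pRange As Range)
        Dim cell As Range
        Dim words As Variant, word As Variant
        Dim lineSet As Dictionary          ' hypothetical name

        For Each cell In pRange.Cells
            words = Split(cell.Value)
            For Each word In words
                ' skip empty strings produced by consecutive spaces
                If Len(word) > 0 Then
                    If Not pDict.Exists(word) Then
                        pDict.Add word, New Dictionary
                    End If
                    Set lineSet = pDict.Item(word)
                    ' assigning to a key adds it if missing, so a repeated
                    ' word on the same row does not create a duplicate
                    lineSet(cell.Row) = True
                End If
            Next word
        Next cell
    End Sub

With this layout there is no array to resize. Checking whether a word occurs on a given row becomes pDict(word).Exists(rowNumber), and pDict(word).Keys returns the row numbers as an array when the full list is needed.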
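You can't work directly with an array stored in a dictionary. The usual approach is to pull it out of the dictionary, modify it, and then store it back in the same slot. E.g.: http://stackoverflow.com/questions/16447088/adding-to-an-array-in-vba-with-strings-as-the-index/16451081#16451081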
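The pull-out, modify, store-back pattern could be sketched roughly as follows, using ReDim Preserve to grow the array by one slot while keeping its existing values. AppendRow and lineNums are hypothetical names introduced for illustration. The sketch also assumes the cells are visited in row order, so a repeated row can only be the last entry, and that every value in the dictionary is a Long array created by this helper (not a Variant array from Array(), which could not be assigned to lineNums).

    ' Hypothetical helper: append pRow to the Long array stored under pWord.
    Sub AppendRow(pDict As Dictionary, ByVal pWord As String, ByVal pRow As Long)
        Dim lineNums() As Long

        If Not pDict.Exists(pWord) Then
            ' first occurrence: start a one-element array
            ReDim lineNums(0 To 0)
            lineNums(0) = pRow
            pDict.Add pWord, lineNums
            Exit Sub
        End If

        ' pull a copy of the stored array out of the dictionary
        lineNums = pDict.Item(pWord)

        ' assumption: rows arrive in order, so a duplicate is always the last entry
        If lineNums(UBound(lineNums)) = pRow Then Exit Sub

        ' grow by one slot without clearing existing values, then store it back
        ReDim Preserve lineNums(0 To UBound(lineNums) + 1)
        lineNums(UBound(lineNums)) = pRow
        pDict.Item(pWord) = lineNums
    End Sub

Inside the inner loop of BuildInvertedIndex this would be called as AppendRow pDict, word, cell.Row, replacing the whole If/Else block.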