Elastic tokenize into all word combinations

Given the input "quick brown fox jumped", I want to create every possible word combination. For example, the string would be tokenized into:
[
"quick", "quick brown", "quick fox", "quick jumped",
"brown", "brown quick", "brown fox", "brown jumped",
...,
"jumped quick", "jumped brown", "jumped fox", "jumped"
]
I can use the shingle token filter for this, but it only creates new tokens by concatenating adjacent terms, and I end up with:
[
"quick", "quick brown", "quick brown fox", "quick brown fox jumped",
"brown", "brown fox", "brown fox jumped",
"fox", "fox jumped",
"jumped"
]
This is a step in the right direction, but it is not what I am looking for.
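For reference, the adjacent-only behaviour described above can be reproduced outside Elasticsearch with a few lines of Python. This is only a sketch of the idea, not the filter's actual implementation; the function name `shingles` and the parameter `max_size` are hypothetical and chosen for illustration.

```python
def shingles(text, max_size=4):
    """Emulate shingle-style output: every run of adjacent words,
    from one word up to max_size words (assumed limit, not an ES setting name)."""
    words = text.split()
    tokens = []
    for start in range(len(words)):
        for size in range(1, max_size + 1):
            # Skip runs that would extend past the end of the input.
            if start + size <= len(words):
                tokens.append(" ".join(words[start:start + size]))
    return tokens


if __name__ == "__main__":
    for token in shingles("quick brown fox jumped"):
        print(token)
```

Because each token is a contiguous slice of the word list, pairs such as "quick fox" or "brown quick" can never appear, which is exactly the gap described in the question.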
Can you explain your use case? – Val
@Val Long story short - not only single words (["quick", "brown", "fox", "jumped"]), but also combinations of those words/terms –
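To make the requested output concrete, here is a minimal Python sketch that builds the single words plus every ordered pair of distinct words. It runs on the client side before indexing and is not an Elasticsearch feature. The name `word_combinations` and the `max_words` parameter are hypothetical, and the limit of two words per token is an assumption read off the example list in the question.

```python
from itertools import permutations


def word_combinations(text, max_words=2):
    """Return single words and ordered combinations of up to max_words words.

    Assumption: words are separated by whitespace and each word in the
    input is distinct; repeated words would yield duplicate tokens.
    """
    words = text.split()
    tokens = []
    for size in range(1, max_words + 1):
        # permutations() yields ordered selections without reusing a position,
        # so "quick brown" and "brown quick" both appear, "quick quick" does not.
        tokens.extend(" ".join(p) for p in permutations(words, size))
    return tokens


if __name__ == "__main__":
    for token in word_combinations("quick brown fox jumped"):
        print(token)
```

The resulting list could be indexed as a multi-valued keyword-style field. Note that the number of tokens grows quickly: n words produce n + n(n - 1) tokens at two words per token, so this approach probably only suits short inputs.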