5
我們如何編寫一個輸出字符串「homoglyph equivalents」的高效函數?查找所有「字符相等」字符串的高效算法?
實施例1(僞代碼):
homoglyphs_list = [
["o", "0"], // "o" and "0" are homoglyphs
["i", "l", "1"] // "i" and "l" and "1" are homoglyphs
]
input_string = "someinput"
output = [
"someinput", "s0meinput", "somelnput",
"s0melnput", "some1nput", "s0me1nput"
]
實施例2:
homoglyphs_list = [
["rn", "m", "nn"],
]
input_string = "rnn"
output = ["rnn", "rm", "mn", "rrn", "nnn", "nm", "nrn"]
實施例3:
homoglyphs_list = [
["d", "ci", "a"], // "o" and "0" are homoglyphs
["i", "l", "1"] // "i" and "l" and "1" are homoglyphs
]
/*
notice that with the two rules above,
we can infer "d" = "ci" = "a" = "cl" = "c1"
*/
input_string = "pacerier"
output = [
"pacerier", "pacerler", "pacer1er", "pcicerier",
"pcicerler", "pcicer1er", "pclcerier", "pc1cerier",
"pclcerler", "pc1cerler", "pclcer1er", "pc1cer1er",
"pdcerier", "pdcerler", "pdcer1er"
]
注意:輸出數組中成員的順序並不重要,我們可以假設給定的同類映射映射被假定爲正確的(輸入不會給我們一個「無限循環」)。
我目前的算法有效,但它使用原始bruteforcing和性能是可怕的。例如。的"mmmmm"
有同形字["rn", "m", "nn"]
輸入需要38秒運行:
// This is php code (non-pseudo just so we could test the running time),
// but the question remains language-agnostic
public function Func($in, Array $mappings){
$out_table = array();
$out_table[$in] = null;
while(true){
$number_of_entries_so_far = count($out_table);
foreach(array_keys($out_table) as $key){
foreach($mappings as $mapping){
foreach($mapping as $value){
for($a=0, $aa=strlen($key); $a<$aa; ++$a){
$pos = strpos($key, $value, $a);
if($pos === false){
continue;
}
foreach($mapping as $equal_value){
if($value === $equal_value){
continue;
}
$out_table[substr($key, 0, $pos) . $equal_value . substr($key, $pos + strlen($value))] = null;
}
}
}
}
}
if($number_of_entries_so_far === count($out_table)){
// it means we have tried bruteforcing but added no additional entries,
// so we can quit now
break;
}
}
return array_keys($out_table);
}
我們怎樣才能實現高效的(快)同形字擴展算法?
它會做什麼,如果我像'$ homoglyph_mappings [0] = array(「n」,「nn」,「nnn」);'' – Rajan
@Rajan,如上所述,我們可以假定輸入是正確的(即,根據不正確的輸入,我們得到未定義的行爲)。你的例子是一個不正確的輸入,因爲它會給我們一個無限循環。如果'n = nn',然後'nn = nnnn',然後'nnnn = nnnnnnnn',然後'nnnnnnnn = nnnnnnnnnnnnnnn'等等,*無限* ... – Pacerier
好吧,這是一個_exceptional case_。 – Rajan