Sphinx search: charset_table difficulties
I've been losing my mind over this for two days now...
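I want to use the Slovenian alphabet in Sphinx search: all the English letters plus Č, Ž, Š (and Ć, just in case).
I've been searching all over the web for the right charset_table, but I've found squat...
...so I decided to work it out myself, step by step.
This is my index: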
index classifieds
{
source = classifieds_src
path = c:\Sphinx\data\classifieds
docinfo = extern
min_infix_len = 2
infix_fields = title,keywords,summary,text
expand_keywords = 1
enable_star = 1
charset_type = utf-8
charset_table = 0..9, a..z, _, A..Z->a..z,-, U+002C, \
U+010C->U+010D, U+0106->U+0107, U+0160->U+0161, U+017D->U+017E, \
U+010D->c,U+0107->c, U+0161->s, U+017E->z, \
U+010D, U+0107, U+0161, U+017E
}
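Here I map uppercase Č, Ć, Š, Ž to their lowercase counterparts and add mappings from č to c, ć to c, š to s and ž to z. Finally, I add these four characters to the table.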
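To make the intended folding concrete, here is a rough PHP emulation of what the mappings above aim for. The helper `foldSlovenian` is hypothetical and is not how Sphinx implements charset_table; it only shows the expected outcome: miška and miska should fold to the same token (miska), and Čiška and čiška should both fold to ciska.

```php
<?php
// Hypothetical helper, NOT Sphinx code: emulates the folding the
// charset_table above aims for (lowercase, Č/Ć -> c, Š -> s, Ž -> z).
function foldSlovenian(string $word): string
{
    // Replace the multi-byte UTF-8 letters first; strtr() prefers the
    // longest matching key, so each 2-byte sequence is swapped as a whole.
    $map = [
        'Č' => 'c', 'č' => 'c',
        'Ć' => 'c', 'ć' => 'c',
        'Š' => 's', 'š' => 's',
        'Ž' => 'z', 'ž' => 'z',
    ];
    // strtolower() then covers the A..Z->a..z part for ASCII letters.
    return strtolower(strtr($word, $map));
}

foreach (['miška', 'miska', 'Čiška', 'čiška'] as $q) {
    echo $q, ' -> ', foldSlovenian($q), "\n";
}
```

The sketch maps each accented letter straight to its plain target in one step; it does not model the last line of the table, which also declares č, ć, š, ž as standalone characters.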
These are my classified-ad titles:
T1: HP USB optična miška ZA prenosnik RH304
T2: Čiška PCplus MO-U033 + F2 (optična, brezžična, PS/2)
T3: miška Logitech optična Nano M235 siva
DB encoding: utf8_general_ci
Table encoding: utf8_general_ci
Title field encoding: utf8_general_ci
Test cases:
$testcase = array(
"miška",
"mi*ka",
"Čiška",
"čiška",
"miska",
"usb prenosnik",
"prenosnik miska",
"miška usb"
);
//api settings:
$this->sphinx->SetArrayResult(true);
$this->sphinx->setLimits(0, 100);
$this->sphinx->setMatchMode(SPH_MATCH_EXTENDED2);
$this->sphinx->SetSortMode(SPH_SORT_RELEVANCE, '@weight DESC');
$this->sphinx->SetRankingMode(SPH_RANK_PROXIMITY_BM25);
$this->sphinx->SetFieldWeights(array("title"=>100, "keywords"=>80, "summary"=>60,
"text"=>20, "slug"=>10));
And finally, the test results:
keyword (total/total_found) words
miška (0/0)
Array
(
[*miška*] => Array
(
[docs] => 0
[hits] => 0
)
[miška] => Array
(
[docs] => 0
[hits] => 0
)
)
mi*ka (0/0)
Array
(
[*mi*] => Array
(
[docs] => 3
[hits] => 4
)
[mi] => Array
(
[docs] => 1
[hits] => 1
)
[*2aka*] => Array
(
[docs] => 0
[hits] => 0
)
[2aka] => Array
(
[docs] => 0
[hits] => 0
)
)
Čiška (0/0)
Array
(
[*čiška*] => Array
(
[docs] => 0
[hits] => 0
)
[čiška] => Array
(
[docs] => 0
[hits] => 0
)
)
čiška (0/0)
Array
(
[*čiška*] => Array
(
[docs] => 0
[hits] => 0
)
[čiška] => Array
(
[docs] => 0
[hits] => 0
)
)
miska (0/0)
Array
(
[*miska*] => Array
(
[docs] => 0
[hits] => 0
)
[miska] => Array
(
[docs] => 0
[hits] => 0
)
)
usb prenosnik (1/1)
Array
(
[*usb*] => Array
(
[docs] => 1
[hits] => 1
)
[usb] => Array
(
[docs] => 1
[hits] => 1
)
[*prenosnik*] => Array
(
[docs] => 1
[hits] => 1
)
[prenosnik] => Array
(
[docs] => 1
[hits] => 1
)
)
prenosnik miska (0/0)
Array
(
[*prenosnik*] => Array
(
[docs] => 1
[hits] => 1
)
[prenosnik] => Array
(
[docs] => 1
[hits] => 1
)
[*miska*] => Array
(
[docs] => 0
[hits] => 0
)
[miska] => Array
(
[docs] => 0
[hits] => 0
)
)
miška usb (0/0)
Array
(
[*miška*] => Array
(
[docs] => 0
[hits] => 0
)
[miška] => Array
(
[docs] => 0
[hits] => 0
)
[*usb*] => Array
(
[docs] => 1
[hits] => 1
)
[usb] => Array
(
[docs] => 1
[hits] => 1
)
)
As you can clearly see, I only get hits for queries that contain no Slovenian special characters.
Please, please help me, I'm losing my mind over this.
Yes, I did... no difference.
OMG! I did it! I found the answer here: http://ryaneby.com/2009/11/21/unicode-and-sphinx.html
I needed to add
sql_query_pre = SET CHARACTER_SET_RESULTS = UTF8
sql_query_pre = SET NAMES UTF8
to my source definition... apparently the DB connection was not using UTF8 by default! WOOO HOOOO
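Why this can matter even with utf8_general_ci everywhere: the column collation says nothing about the character set of the connection the indexer reads through. A minimal PHP illustration, assuming the connection fell back to MySQL's latin1 (which MySQL implements as Windows-1252); the helper name `asLatin1Connection` is hypothetical and the sketch needs the mbstring extension.

```php
<?php
// Illustration under an ASSUMPTION: the indexer's MySQL connection fell back
// to latin1, which MySQL implements as Windows-1252 (cp1252). Needs mbstring.
function asLatin1Connection(string $utf8): string
{
    // Hypothetical helper: re-encode UTF-8 text the way a latin1 result set
    // would deliver it; characters cp1252 lacks (such as č) become '?'.
    return mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
}

foreach (['miška', 'optična'] as $word) {
    $bytes = asLatin1Connection($word);
    printf("%s -> %s (valid UTF-8: %s)\n",
        $word, bin2hex($bytes), mb_check_encoding($bytes, 'UTF-8') ? 'yes' : 'no');
}
```

Under that assumption, miška reaches the indexer as the bytes 6d 69 9a 6b 61, where the lone 0x9A byte is not valid UTF-8, and optična arrives as opti?na. With charset_type = utf-8, no indexed token can then equal miška or optična. Forcing the connection to UTF-8 with the two sql_query_pre lines removes that conversion.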
I would, but it won't let me :S 100 reputation is needed... Please post it yourself and I'll confirm it.