2015-09-04

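libstemmer does not work in Sphinx

I installed Sphinx on my Vagrant box running CentOS 6, and I am trying to install the Dutch libstemmer from Snowball. The installation completed successfully, but my tests go wrong.

I created 2 indexes with exactly the same data. My indexes are: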

index shop_products1 {
    type = rt
    dict = keywords
    min_prefix_len = 3
    rt_mem_limit = 2046M

    path = /var/lib/sphinxsearch/data/shop_products2

    morphology = libstemmer_nl, stem_en

    html_strip = 1
    html_index_attrs = img=alt,title; a=title;

    preopen = 1
    inplace_enable = 1
    index_exact_words = 1

    rt_field = name
    rt_field = brand
    rt_field = description
    rt_field = specifications
    rt_field = tags
    rt_field = ourtags
    rt_field = searchfield
    rt_field = shop
    rt_field = category

    rt_field = color
    rt_field = ourcolor
    rt_field = gender
    rt_field = material

    rt_field = ean
    rt_field = sku

    rt_attr_string = ean
    rt_attr_string = sku
    rt_attr_float = price
    rt_attr_float = discount
    rt_attr_uint = shopid
    rt_attr_uint = itemid
    rt_attr_uint = deleted
    rt_attr_uint = duplicate
    rt_attr_uint = brandid
    rt_attr_uint = duplicates
    rt_attr_timestamp = updated_at
}

index shop_products2 {
    type = rt
    dict = keywords
    min_prefix_len = 3
    rt_mem_limit = 2046M

    path = /var/lib/sphinxsearch/data/shop_products20

    html_strip = 1
    html_index_attrs = img=alt,title; a=title;

    preopen = 1
    inplace_enable = 1
    index_exact_words = 1

    rt_field = name
    rt_field = brand
    rt_field = description
    rt_field = specifications
    rt_field = tags
    rt_field = ourtags
    rt_field = searchfield
    rt_field = shop
    rt_field = category

    rt_field = color
    rt_field = ourcolor
    rt_field = gender
    rt_field = material

    rt_field = ean
    rt_field = sku

    rt_attr_string = ean
    rt_attr_string = sku
    rt_attr_float = price
    rt_attr_float = discount
    rt_attr_uint = shopid
    rt_attr_uint = itemid
    rt_attr_uint = deleted
    rt_attr_uint = duplicate
    rt_attr_uint = brandid
    rt_attr_uint = duplicates
    rt_attr_timestamp = updated_at
}

searchd {
    listen = 127.0.0.1:9306:mysql41
    log = /var/log/sphinxsearch/searchd.log
    workers = threads
    binlog_path = /var/lib/sphinxsearch/rt-binlog

    read_timeout = 5
    client_timeout = 200
    max_children = 0

    # 2 hours
    rt_flush_period = 7200
    pid_file = /var/run/searchd.pid
}

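When I search for, say, the Dutch word "afzuigkappen", it gives the same results as "afzuigkap".

Can someone give me some information on how to get this working? P.S. Sorry for my bad English.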

Answers

The Dutch Snowball stemmer stems afzuigkappen and afzuigkap differently:

afzuigkappen -> afzuigkapp 
afzuigkap -> afzuigkap 

So you should update the stemming algorithm to reach your goal. The algorithm is documented here.
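The two stems above can be reproduced with a small sketch of just one rule from the Dutch Snowball description: the "en" suffix is deleted when it lies in region R1 and follows a valid en-ending, and afterwards the ending is undoubled only for kk, dd or tt. This is not libstemmer itself; the function names (`r1_start`, `strip_en`) are hypothetical, and the prelude and all other steps of the algorithm are left out, so treat it as an illustration under those assumptions.

```python
# Simplified sketch of the "en" rule of the Dutch Snowball stemmer.
# Assumption: lowercase input, no accents; all other steps are omitted.
VOWELS = set("aeiouyè")


def r1_start(word):
    """Index where R1 begins: after the first non-vowel that follows a
    vowel, moved right so that at least 3 letters precede it."""
    for i in range(1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return max(i + 1, 3)
    return len(word)


def strip_en(word):
    """Delete a trailing 'en' if it is in R1 and preceded by a valid
    en-ending (a non-vowel, and not 'gem'), then undouble kk/dd/tt."""
    if not word.endswith("en"):
        return word
    stem = word[:-2]
    in_r1 = len(stem) >= r1_start(word)
    valid = bool(stem) and stem[-1] not in VOWELS and not stem.endswith("gem")
    if not (in_r1 and valid):
        return word
    # Only kk, dd and tt are undoubled; pp is left alone.
    if stem.endswith(("kk", "dd", "tt")):
        stem = stem[:-1]
    return stem


print(strip_en("afzuigkappen"))  # afzuigkapp
print(strip_en("afzuigkap"))     # afzuigkap
print(strip_en("bakken"))        # bak
```

Because "pp" is not in the undoubling list, the plural keeps its double p, which is why the two forms end up as different keywords.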

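OK, but there should at least be a difference in the results. Right now it looks to me as if it does nothing, because index 1 returns exactly the same as index 2.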

They will not give the same result, because the stems are different.

OK, I created some specific tests. These are the indexes I created:

index test1 {
    type = rt
    dict = keywords
    min_prefix_len = 3
    rt_mem_limit = 2046M

    morphology = libstemmer_nl, stem_en

    path = /var/lib/sphinxsearch/data/test1

    preopen = 1
    inplace_enable = 1
    index_exact_words = 1

    rt_field = name
    rt_attr_uint = shopid
    rt_attr_uint = itemid
}

index test2 {
    type = rt
    dict = keywords
    min_prefix_len = 3
    rt_mem_limit = 2046M

    path = /var/lib/sphinxsearch/data/test2

    preopen = 1
    inplace_enable = 1
    index_exact_words = 1

    rt_field = name
    rt_attr_uint = shopid
    rt_attr_uint = itemid
}

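I indexed a smaller database of football products; here are the Sphinx search results: http://imgur.com/n95Ue8v

As you can see, both indexes give the same output, with 53 records. If I search directly in MySQL with: select * from tests1 WHERE name LIKE '%keeper%' I get 360 results.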
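One possible contributor to the 53 versus 360 gap (an assumption, since the actual data is not shown) is that LIKE '%keeper%' matches the substring anywhere inside a name, whereas a plain full-text query for keeper matches whole keywords. The sketch below contrasts the two on made-up product names; `NAMES`, `like_substring` and `whole_keyword` are hypothetical and only emulate the matching behaviour, they do not talk to MySQL or Sphinx.

```python
import re

# Hypothetical product names; the real data is not shown in the post.
NAMES = [
    "Keeper handschoenen",
    "Keepershandschoenen junior",
    "Doelkeeper shirt",
    "Voetbal maat 5",
]


def like_substring(names, needle):
    """Emulates name LIKE '%needle%', assuming a case-insensitive collation."""
    needle = needle.lower()
    return [n for n in names if needle in n.lower()]


def whole_keyword(names, word):
    """Emulates a plain keyword match: the query must equal a whole token."""
    word = word.lower()
    return [n for n in names if word in re.findall(r"\w+", n.lower())]


print(len(like_substring(NAMES, "keeper")))  # 3
print(len(whole_keyword(NAMES, "keeper")))   # 1
```

If this is what happens here, the count difference by itself says nothing about whether the stemmer is active; comparing the two indexes on a word whose stem differs from its surface form would be a more telling test.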