2015-11-02 101 views
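I want to run full-text searches on email addresses in an entity that uses Hibernate Search. Does Hibernate Search not index email addresses?

Consider the following entity "Person" with an indexed field "email":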

Person.groovy

package com.example 

import javax.persistence.Entity 
import javax.persistence.GeneratedValue 
import javax.persistence.GenerationType 
import javax.persistence.Id 

import org.hibernate.search.annotations.Field 
import org.hibernate.search.annotations.Indexed 

@Entity 
@Indexed 
class Person { 
    @Id 
    @GeneratedValue(strategy=GenerationType.AUTO) 
    Long id 

    @Field 
    String email 
} 

And given the repository:

SearchRepository.groovy

package com.example 

import javax.persistence.EntityManager 

import org.apache.lucene.search.Query 
import org.hibernate.search.jpa.FullTextEntityManager 
import org.hibernate.search.jpa.Search 
import org.hibernate.search.query.dsl.QueryBuilder 
import org.springframework.beans.factory.annotation.Autowired 
import org.springframework.stereotype.Repository 

@Repository 
class SearchRepository { 

    @Autowired 
    EntityManager entityManager 

    FullTextEntityManager getFullTextEntityManager() {
        Search.getFullTextEntityManager(entityManager)
    }

    List<Person> findPeople(String searchText) {
        searchText = searchText.toLowerCase() + '*'
        QueryBuilder qb = fullTextEntityManager.searchFactory
            .buildQueryBuilder().forEntity(Person).get()
        Query query = qb
            .keyword()
            .wildcard()
            .onField('email')
            .matching(searchText)
            .createQuery()

        javax.persistence.Query jpaQuery =
            fullTextEntityManager.createFullTextQuery(query, Person)

        jpaQuery.resultList
    }
} 

Then the following test fails:

SearchWildcardTest.groovy

package com.example 

import javax.persistence.EntityManager 

import org.hibernate.search.jpa.FullTextEntityManager 
import org.hibernate.search.jpa.Search 
import org.junit.Test 
import org.junit.runner.RunWith 
import org.springframework.beans.factory.annotation.Autowired 
import org.springframework.boot.test.SpringApplicationConfiguration 
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner 
import org.springframework.transaction.annotation.Transactional 

@RunWith(SpringJUnit4ClassRunner) 
@SpringApplicationConfiguration(classes = HibernateSearchWildcardApplication) 
@Transactional 
class SearchWildcardTest { 

    @Autowired 
    SearchRepository searchRepo 

    @Autowired 
    PersonRepository personRepo 

    @Autowired 
    EntityManager em 

    FullTextEntityManager getFullTextEntityManager() {
        Search.getFullTextEntityManager(em)
    }

    @Test
    void findTeamsByNameWithWildcard() {
        Person person = personRepo.save new Person(email: 'foo@bar.com')

        fullTextEntityManager.createIndexer().startAndWait()
        fullTextEntityManager.flushToIndexes()

        List<Person> people = searchRepo.findPeople('foo@bar.com')

        assert people.contains(person) // this assertion fails! Why?
    }
} 

PersonRepository.groovy

package com.example 

import org.springframework.data.repository.CrudRepository 

interface PersonRepository extends CrudRepository<Person, Long>{ 
} 

build.gradle

buildscript {
    ext {
        springBootVersion = '1.2.7.RELEASE'
    }
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:${springBootVersion}")
        classpath('io.spring.gradle:dependency-management-plugin:0.5.2.RELEASE')
    }
}

apply plugin: 'groovy' 
apply plugin: 'eclipse' 
apply plugin: 'spring-boot' 
apply plugin: 'io.spring.dependency-management' 

jar { 
    baseName = 'hibernate-search-email' 
    version = '0.0.1-SNAPSHOT' 
} 
sourceCompatibility = 1.8 
targetCompatibility = 1.8 

repositories { 
    mavenCentral() 
} 

dependencies { 
    compile('org.springframework.boot:spring-boot-starter-data-jpa') 
    compile('org.codehaus.groovy:groovy') 
    compile('org.hibernate:hibernate-search:5.3.0.Final') 
    testCompile('com.h2database:h2') 
    testCompile('org.springframework.boot:spring-boot-starter-test') 
} 

task wrapper(type: Wrapper) { 
    gradleVersion = '2.8' 
} 

Here is what Luke shows for the generated Lucene index after running the test:

(screenshot of the Lucene index in Luke)

It looks to me as if the email address "foo@bar.com" is not stored whole in the index, but is split into the two strings "foo" and "bar.com".

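The "Getting started" guide on the official Hibernate Search website states:

[...] The standard tokenizer splits words at punctuation characters and hyphens while keeping email addresses and internet hostnames intact. It is a good general purpose tokenizer. [...]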

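I must be missing something here, but I cannot figure out what.

My questions:

  • Why doesn't my code index the complete email address?
  • How can I get the address indexed so that the test passes?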

Answer

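It seems the documentation does not correctly reflect a change in the underlying Lucene API.

[K]eeping email addresses and internet hostnames intact...

This used to be true for the legacy StandardTokenizer, but its behavior has since changed on the Lucene side. The old behavior can now be found in ClassicTokenizer.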
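
To illustrate why the wildcard query in the question comes back empty, here is a toy model in plain Java. It is not Lucene code, and the names `WildcardTermDemo` and `matchingTerms` are made up for this sketch. It only mimics the relevant behavior: a wildcard pattern is compared against each individual term in the index, and the search text is not run through the analyzer first. The term lists are the ones reported in the question (split) and the one you would expect once the address is kept intact.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: this is NOT Lucene. It mimics how a trailing-wildcard
// pattern is compared against each individual term stored in an index.
public class WildcardTermDemo {

    // Returns the index terms matched by a pattern that ends in a single '*'.
    public static List<String> matchingTerms(List<String> indexTerms, String pattern) {
        if (!pattern.endsWith("*")) {
            throw new IllegalArgumentException("pattern must end with '*'");
        }
        String prefix = pattern.substring(0, pattern.length() - 1);
        List<String> hits = new ArrayList<>();
        for (String term : indexTerms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Terms the question reports seeing in Luke with the default analyzer.
        List<String> splitTerms = Arrays.asList("foo", "bar.com");
        // Single term expected when the address is kept intact.
        List<String> intactTerms = Arrays.asList("foo@bar.com");

        // Built the same way findPeople() builds its search text.
        String pattern = "foo@bar.com".toLowerCase() + "*";

        System.out.println(matchingTerms(splitTerms, pattern));  // prints []
        System.out.println(matchingTerms(intactTerms, pattern)); // prints [foo@bar.com]
    }
}
```

Neither "foo" nor "bar.com" starts with "foo@bar.com", so nothing matches until the address is indexed as one term.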

So the following configuration should give you what you're after:

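// Lucene package names as shipped with Hibernate Search 5.x;
// they may differ with other Lucene versions.
import javax.persistence.Entity;

import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.standard.ClassicTokenizerFactory;
import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

// Java syntax. In Groovy, write the filters array with [ ] instead of { }.
@Entity
@Indexed
@AnalyzerDef(
    name = "emailanalyzer",
    // ClassicTokenizer keeps email addresses as a single token
    tokenizer = @TokenizerDef(factory = ClassicTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class)
    }
)
class Person {

    // ...

    @Field
    @Analyzer(definition = "emailanalyzer")
    String email;
}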

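Note that lowercasing is also applied with this configuration. We'll adjust the HSEARCH documentation accordingly, thanks for spotting this!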

Great, @Gunnar! This works for me, thanks a lot! – Riggs

Nice, glad to hear it! – Gunnar