Whoosh NestedChildren search does not return all results

I'm building a search index that must support nested hierarchies of data. For testing purposes, I'm using a very simple schema:

test_schema = Schema(
    name_ngrams=NGRAMWORDS(minsize=4, field_boost=1.2), 
    name=TEXT(stored=True), 
    id=ID(unique=True, stored=True), 
    type=TEXT 
)

For test data I'm using these:

test_data = [
    dict(
        name=u'The Dark Knight Returns',
        id=u'chapter_1',
        type=u'chapter'),
    dict(
        name=u'The Dark Knight Triumphant',
        id=u'chapter_2',
        type=u'chapter'),
    dict(
        name=u'Hunt The Dark Knight',
        id=u'chapter_3',
        type=u'chapter'),
    dict(
        name=u'The Dark Knight Falls',
        id=u'chapter_4',
        type=u'chapter')
]

parent = dict(
    name=u'The Dark Knight Returns',
    id=u'book_1',
    type=u'book')

I've added all 5 documents to the index like this:

with index_writer.group():
    # The parent document goes first, followed by its children
    index_writer.add_document(
        name_ngrams=parent['name'],
        name=parent['name'],
        id=parent['id'],
        type=parent['type']
    )
    for data in test_data:
        index_writer.add_document(
            name_ngrams=data['name'],
            name=data['name'],
            id=data['id'],
            type=data['type']
        )

So, to get all the chapters of a book, I wrote a function that uses a NestedChildren search:

import os

from whoosh import qparser
from whoosh.index import open_dir
from whoosh.qparser.dateparse import DateParserPlugin
from whoosh.query import And, NestedChildren, Term


def search_childs(query_string):
    os.chdir(settings.SEARCH_INDEX_PATH)
    # Initialize index
    index = open_dir(settings.SEARCH_INDEX_NAME, indexname='test')
    parser = qparser.MultifieldParser(
        ['name', 'type'],
        schema=index.schema)
    parser.add_plugin(qparser.FuzzyTermPlugin())
    parser.add_plugin(DateParserPlugin())

    myquery = parser.parse(query_string)

    # First, we need a query that matches all the documents in the "parent"
    # level we want of the hierarchy
    all_parents = And([parser.parse(query_string), Term('type', 'book')])

    # Then, we need a query that matches the children we want to find
    wanted_kids = And([parser.parse(query_string),
                       Term('type', 'chapter')])
    q = NestedChildren(all_parents, wanted_kids)
    print(q)

    with index.searcher() as searcher:
        # these results are the parents
        results = searcher.search(q)
        print("number of results: %d" % len(results))
        if len(results):
            for result in results:
                print(result.highlights('name'))
                print(result)
            return results

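But with my test data, if I search for "dark knight", I only get 3 results when there should be 4.

I don't know whether the missing result is being excluded because it has the same name as the book, but it simply doesn't appear in the search results.

I know that all the items are in the index, but I don't know what I'm missing here.

Any ideas?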

Source

2014-05-05 Hector Armando Vela Santos

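It turns out I was using NestedChildren incorrectly. Here is the answer I got from Matt Chaput on Google Groups:

"I'm making a search index which must support nested hierarchies of data."

The second argument to NestedChildren is not what you think it is.

TL;DR: You're using the wrong query type. Let me know what you're trying to do and I can tell you how to do it :)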

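About NestedChildren

(Note: I found a bug; see the end.)

NestedChildren is hard to understand, but hopefully I can explain it better.

NestedChildren is about searching for certain parents but getting their children as the hits.

The first argument is a query that matches all documents of the "parent" class (e.g. "type:book"). The second argument is a query that matches all documents of the parent class that match your search criteria (e.g. "type:book AND name:dark").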
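To make the roles of the two arguments concrete, here is a minimal plain-Python sketch of the behaviour described above. It is a toy model only, not Whoosh's implementation: the function name `nested_children`, the two predicate parameters, and the second book are all hypothetical and exist purely for illustration. It assumes documents are stored in index order, each parent immediately followed by its own children.

```python
def nested_children(docs, is_parent, is_wanted_parent):
    """Return the children of every wanted parent (toy model, not Whoosh code).

    `docs` is a flat list in index order in which each parent is
    immediately followed by its own children.
    """
    hits = []
    collecting = False
    for doc in docs:
        if is_parent(doc):
            # A new group starts here; collect its children only if wanted.
            collecting = is_wanted_parent(doc)
        elif collecting:
            hits.append(doc)
    return hits


docs = [
    {"id": "book_1", "type": "book", "name": "The Dark Knight Returns"},
    {"id": "chapter_1", "type": "chapter", "name": "The Dark Knight Returns"},
    {"id": "chapter_2", "type": "chapter", "name": "The Dark Knight Triumphant"},
    # Hypothetical second book, added only to show that its chapter is skipped.
    {"id": "book_2", "type": "book", "name": "Some Other Book"},
    {"id": "chapter_x", "type": "chapter", "name": "Some Other Chapter"},
]

child_ids = [
    doc["id"]
    for doc in nested_children(
        docs,
        # Plays the role of the first argument: every parent-level document.
        is_parent=lambda d: d["type"] == "book",
        # Plays the role of the second argument: only the parents we want.
        is_wanted_parent=lambda d: d["type"] == "book" and "dark" in d["name"].lower(),
    )
]
print(child_ids)
```

Note that both predicates look only at parent documents; the children are never tested, they are simply returned because of where they sit relative to a wanted parent.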

In your example, this would mean searching for a certain book but getting its chapters as the search results.

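This isn't very useful on its own, but you can combine it with queries on the children to build complex queries such as "show me chapters with 'hunted' in their names, in books with 'dark' in their names":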

from whoosh import query

# Find the children of books matching the book criterion
all_parents = query.Term("type", "book")
# The second argument must match only parent documents, so restrict it to books
wanted_parents = query.And([all_parents, query.Term("name", "dark")])
children_of_wanted_parents = query.NestedChildren(all_parents, wanted_parents)

# Find the children matching the chapter criterion
wanted_chapters = query.And([query.Term("type", "chapter"),
                             query.Term("name", "hunted")])

# The intersection of those two queries are the chapters we want
complex_query = query.And([children_of_wanted_parents,
                           wanted_chapters])

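Or at least, that's how it's supposed to work. But I just found a bug in the implementation of NestedChildren's skip_to() method that makes the above example not work :( :( :( (The bug is now fixed on Bitbucket; I'll have to do a new release.)

Cheers,

Matt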

Source

2014-05-07 20:47:51
