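Multi-level join in Solr

Hi, I have data in a 3-level tree structure. Can I get the root node using a Solr JOIN when the user searches for third-level nodes?

For example -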

PATIENT1 
     -> FirstName1 
     -> LastName1 
     -> DOCUMENTS1_1 
      -> document_type1_1 
      -> document_description1_1 
      -> document_value1_1 
      -> CODE_ITEMS1_1_1 
       -> Code_id1_1_1 
       -> code1_1_1 
      -> CODE_ITEMS1_1_2 
       -> Code_id1_1_2 
       -> code1_1_2 
     -> DOCUMENTS1_2 
      -> document_type1_2 
      -> document_description1_2 
      -> document_value1_2 
      -> CODE_ITEMS1_2_1 
       -> Code_id1_2_1 
       -> code1_2_1 
      -> CODE_ITEMS1_2_2 
       -> Code_id1_2_2 
       -> code1_2_2 
PATIENT2 
     -> FirstName2 
     -> LastName2 
     -> DOCUMENTS2_1 
      -> document_type2_1 
      -> document_description2_1 
      -> document_value2_1 
      -> CODE_ITEMS2_1_1 
       -> Code_id2_1_1 
       -> code2_1_1 
      -> CODE_ITEMS2_1_2 
       -> Code_id2_1_2 
       -> code2_1_2

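I want to search for a CODE_ITEM and return all patients whose code items match the search criteria. How can this be done? Is it possible to apply the join twice? The first join would give all documents for the code_item search, and the next join would give all patients.

Something like this in SQL -

select * from patients where docID IN (select DOCID from DOCUMENTS where CODEID IN (select CODEID from CODE_ITEMS where CODE LIKE '%SEARCH_TEXT%'))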

2012-07-27 user1185893

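I don't really know how Solr's join works internally, but knowing that multiple joins in an RDB are extremely inefficient on large data sets, I would probably end up writing my own org.apache.solr.handler.component.QueryComponent that, after doing the normal search, fetches the root parents (of course, this approach requires that each child doc holds a reference to its root patient).

If you choose to go down this path, I'll post some examples. I had a similar (more complex, ontology-related) problem in one of my previous Solr projects.

The simpler way to go (simpler for solving this particular problem, not as an overall approach) is to completely flatten this part of your schema, store all the information (documents and code items) in the parent patient, and just do a regular search. This is more in line with Solr: you have to look at a Solr schema differently, since it is nothing like your regular normalized RDB schema. Solr encourages data redundancy so that you can search very fast without joins.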
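As a rough illustration of the flattening idea, here is a minimal sketch in plain Java. The classes and the field names (patient_ID, first_name, document_type, code) are made up for this example and are not taken from any real schema; the point is only that one patient tree collapses into one flat record with multi-valued fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the three-level tree from the question.
class CodeItem {
    final String codeId;
    final String code;
    CodeItem(String codeId, String code) { this.codeId = codeId; this.code = code; }
}

class PatientDocument {
    final String type;
    final List<CodeItem> codeItems;
    PatientDocument(String type, List<CodeItem> codeItems) {
        this.type = type;
        this.codeItems = codeItems;
    }
}

class Patient {
    final String id;
    final String firstName;
    final String lastName;
    final List<PatientDocument> documents;
    Patient(String id, String firstName, String lastName, List<PatientDocument> documents) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
        this.documents = documents;
    }
}

class PatientFlattener {
    // Collapses one patient tree into a single flat record.
    // Values from the lower levels end up in multi-valued fields on the patient.
    static Map<String, Object> flatten(Patient p) {
        List<String> documentTypes = new ArrayList<>();
        List<String> codes = new ArrayList<>();
        for (PatientDocument d : p.documents) {
            documentTypes.add(d.type);
            for (CodeItem c : d.codeItems) {
                codes.add(c.code);
            }
        }
        Map<String, Object> flat = new LinkedHashMap<>();
        flat.put("patient_ID", p.id);             // hypothetical field names
        flat.put("first_name", p.firstName);
        flat.put("last_name", p.lastName);
        flat.put("document_type", documentTypes); // multi-valued
        flat.put("code", codes);                  // multi-valued
        return flat;
    }
}
```

Each entry of the resulting map would then be indexed as a (multi-valued) field of a single patient document, so a plain query on the code field returns patients directly. The trade-off is that you lose the information about which code belongs to which document.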

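The third approach would be to run some join tests on a representative data set and see how search performance is affected. In the end, it really depends on your whole setup and requirements (and on the test results, of course).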
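If you want to try the join route for such a test, the query could look roughly like the string built below. This is only a sketch: it assumes Solr's {!join from=... to=...} query parser is available in your version and that joins can be nested by using one join as the inner query of another, which you should verify yourself. All field names (code_document_id, document_id, document_patient_id, patient_ID, code) are hypothetical.

```java
// Builds a two-hop Solr join query string for the patient/document/code item tree.
// Every field name used here is an assumption made for this sketch.
class TwoHopJoinQuery {
    static String build(String codeText) {
        // Inner hop: code items matching the text -> the documents that own them.
        // "from" is the field on the code item that holds its document's ID,
        // "to" is the ID field on the document.
        String codeToDocument =
            "{!join from=code_document_id to=document_id}code:" + codeText;
        // Outer hop: those documents -> the patients that own them.
        return "{!join from=document_patient_id to=patient_ID}" + codeToDocument;
    }
}
```

The search text is appended as-is, so the caller has to escape any query syntax characters in it. The resulting string would be sent as the q parameter, and the response time can then be compared with the flattened variant on the same data.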

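EDIT 1: I did this a couple of years ago, so you'll have to figure out whether things have changed in the meantime.

1. Create a custom request handler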

To do a really clean job, I suggest you define your own request handler (in solrconfig.xml) by simply copying the whole section that starts with

<requestHandler name="/select" class="solr.SearchHandler"> ... ... </requestHandler>

Then change the name to something meaningful to your users, e.g. /searchPatients. Additionally, add this part inside it:

<arr name="components"> 
      <str>patients</str> 
      <str>facet</str> 
      <str>mlt</str> 
      <str>highlight</str> 
      <str>stats</str> 
      <str>debug</str> 
</arr>

2. Create a custom search component

Add to your solrconfig:

<searchComponent name="patients" class="org.apache.solr.handler.component.PatientQueryComponent"/>

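Create the PatientQueryComponent class:

The source below probably has errors, since I modified my original source in a text editor and posted it without testing, but the important thing is that you get the recipe rather than finished source, right? I threw out caching, lazy loading and the prepare method, and left only the basic logic. You'll have to see how performance is affected and then tweak the source as needed. My performance was fine, but I had a couple of million documents in total in my index.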

public class PatientQueryComponent extends SearchComponent { 
... 

    @Override 
    public void process(ResponseBuilder rb) throws IOException { 
     SolrQueryRequest req = rb.req; 
     SolrQueryResponse rsp = rb.rsp; 
     SolrParams params = req.getParams(); 
     if (!params.getBool(COMPONENT_NAME, true)) { 
      return; 
     } 
     SolrIndexSearcher searcher = req.getSearcher(); 
     // -1 as flag if not set. 
     long timeAllowed = (long)params.getInt(CommonParams.TIME_ALLOWED, -1); 

     DocList initialSearchList = null; 

     SolrIndexSearcher.QueryCommand cmd = rb.getQueryCommand(); 
     cmd.setTimeAllowed(timeAllowed); 
     cmd.setSupersetMaxDoc(UNLIMITED_MAX_COUNT); 

     // fire standard query 
     SolrIndexSearcher.QueryResult result = new SolrIndexSearcher.QueryResult(); 
     searcher.search(result, cmd); 

     initialSearchList = result.getDocList(); 

     // Set that will hold the unique patient IDs (needs java.util.Set and java.util.LinkedHashSet) 
     Set<String> patientIds = new LinkedHashSet<String>(); 

     DocIterator iterator = initialSearchList.iterator(); 
     int id; 

     // loop through search results 
     while(iterator.hasNext()) { 
      // add your if logic (doc type, ...) 
      id = iterator.nextDoc(); 
      // you can try lazy field loading here and load only the patientID field value into the doc 
      Document doc = searcher.doc(id); 
      String patientId = doc.get("patientID"); // field that's in the child doc and points to its root parent (the patient) 
      patientIds.add(patientId); 
     } 

     // Add all unique patient IDs to a TermsFilter 
     TermsFilter termsFilter = new TermsFilter(); 
     Term term; 

     for(String pid : patientIds){ 
      term = new Term("patient_ID", pid); // field that's unique (name) to patient and holds patientID 
      termsFilter.addTerm(term); 
     } 

     // get all patients whose ID is in TermsFilter 
     DocList patientsList = null;   
     patientsList = searcher.getDocList(new MatchAllDocsQuery(), searcher.convertFilter(termsFilter), null, 0, 1000); 

     long totalSize = initialSearchList.size() + patientsList.size(); 
     logger.info("Total: " + totalSize); 

     SolrDocumentList solrResultList = SolrPluginUtils.docListToSolrDocumentList(patientsList, searcher, null, null); 
     SolrDocumentList solrInitialList = SolrPluginUtils.docListToSolrDocumentList(initialSearchList, searcher, null, null); 

     // Add patients to the end of the list 
     for(SolrDocument parent : solrResultList){ 
      solrInitialList.add(parent); 
     } 

     // replace initial results in response 
     SolrPluginUtils.addOrReplaceResults(rsp, solrInitialList); 
     rsp.addToLog("hitsRef", patientsList.size()); 
     rb.setResult(result); 
    } 
}

2012-07-28 00:59:15

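Thanks, I already have search on 3 types of documents, so it would be best if I could normalize and store it. Can you tell me more about the query component? – user1185893 2012-07-29 03:13:07

OK, to see how it all fits together, first take a look at this: http://wiki.apache.org/solr/SolrRequestHandler – 2012-07-29 08:21:21

Then this: http://wiki.apache.org/solr/SearchHandler – 2012-07-29 08:22:01

Take a look at this post: http://blog.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html

You can actually do this in Solr 4.5.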
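For orientation, a block join query of this kind might be built as sketched below. This assumes each patient was indexed together with its documents and code items as one block, and that a field (called type here, a made-up name) marks the root patient documents; the code field is hypothetical as well. Treat it as a starting point to check against the linked post, not as a tested recipe.

```java
// Builds a Solr block join "parent" query string.
// parentFilter must match only the root documents of each block (here: patients);
// childQuery must match only child documents (here: code items).
class BlockJoinParentQuery {
    static String build(String parentFilter, String childQuery) {
        return "{!parent which=\"" + parentFilter + "\"}" + childQuery;
    }
}
```

For example, BlockJoinParentQuery.build("type:patient", "code:abc") yields {!parent which="type:patient"}code:abc, which would be sent as the q parameter.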

2013-12-29 14:29:35 Tomer