2016-07-26 112 views
0

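Best way to get the exact total number of search results in MarkLogic

When I try to get the total number of search results using search:estimate, I get wrong results. When I try to parse the total from search:search, it is also wrong, or I get a different total from one page to the next.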

How can I get the exact count for my search string?

--- XXXX Edited question ------

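My database consists of JSON documents, and these documents are hierarchical in structure. A sample is included at the end of this post. Sorry for pasting my entire JSON structure, but I think you get the idea.

I have created fields and field range indexes on certain elements, for example: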

concept_species /species 
concept_name /name 
concept_registrar /registrar/name 
concept_scientist /scientist/name 
concept_supplier /suppliers/name 
concept_entitySubType /entitySubType 
concept_entityType /entityType 
concept_createdDate /createdDate 
concept_project /project/name 
concept_moniker /moniker 

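When I search with a "constraint" on one of these, my xdmp:estimate is fine. But when my search string does not use any of these constraints, xdmp:estimate is off. My search results themselves are fine, though, and all the indexes seem fine. Why is this the case? Because of this I fell back to fn:count for the total number of search results.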
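One way to narrow this down is to run the same query through both functions and compare the numbers. The following is a minimal sketch, not the code from this question: the word query and the term "ATCC" are placeholders, and the real query would be whatever cts:query the search string parses to. `xdmp:estimate` is resolved from the indexes alone, while `fn:count` over `cts:search` counts the filtered results, so a gap between the two suggests the indexes cannot fully resolve the query.

```xquery
xquery version "1.0-ml";

(: Placeholder query (assumption): substitute the cts:query your search string parses to. :)
let $query := cts:word-query("ATCC", "case-insensitive")
return (
  (: Index-only count of matching fragments: fast, but may include false positives. :)
  xdmp:estimate(cts:search(fn:doc(), $query)),
  (: Filtered count: every candidate fragment is verified, so it is exact but slower. :)
  fn:count(cts:search(fn:doc(), $query))
)
```

If the two numbers agree for constrained queries but not for unconstrained ones, that points at the index settings used for plain word queries rather than at the range indexes.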

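This may not be relevant to the question, but I am adding it for completeness. I created a custom constraint that basically translates the constraint into a path in the JSON. For example, let's say the user wants to search for a supplier named "ATCC". Instead of making the user type in the entire path, I created a custom constraint whose syntax mirrors the JSON structure, and my constraint translates it into the actual JSON path. So in this case the search string looks like this: ((concept:suppliers.name:(ATCC)))), and my custom constraint concept translates it into the following cts:query: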

<cts:json-property-scope-query xmlns:cts="http://marklogic.com/cts"> 
    <cts:property>suppliers</cts:property> 
    <cts:json-property-scope-query> 
    <cts:property>name</cts:property> 
    <cts:word-query> 
     <cts:text xml:lang="en">ATCC</cts:text> 
     <cts:option>case-insensitive</cts:option> 
     <cts:option>punctuation-insensitive</cts:option> 
     <cts:option>whitespace-insensitive</cts:option> 
     <cts:option>wildcarded</cts:option> 
    </cts:word-query> 
    </cts:json-property-scope-query> 
</cts:json-property-scope-query> 

This is my JSON document structure:

{ 
    "moniker": "", 
    "entityType": "", 
    "entitySubType": "", 
    "abbvNumber": "", 
    "bioSafetyLevel": "", 
    "name": "", 
    "extCorpID": "", 
    "extLotID": "", 
    "selectAgent": "", 
    "comments": "", 
     "nucleotideSeq": { 
     "seq": "" 
     }, 
     "chains": [ 
     { 

      "chainType": "", 
      "name": "", 
      "plasmidLotID": "", 
      "stochiometry": 0, 
      "aminoAcids": [ 
      { 
       "sequence": "", 
       "predictedMatureSeqs": [ 
       { 
        "encodedChainName": "", 
        "encodedChainType": "", 
        "sequence": "", 
        "domains": [ 
        { 
         "allotype": "", 
         "domainType": "", 
         "entrezgeneID": "", 
         "geneSymbol": "", 
         "heavyChainIsoType": "", 
         "lightChainIsoType": "", 
         "name": "", 
         "regonizedAntigenFK": "", 
         "species": "", 
         "heavyChainIsoTypeMutation": "", 
         "antigens": [ 
         { 

          "antiIdiotypeType": "", 
          "antibodyAntigen": "", 
          "corporateID": "", 
          "description": "", 
          "entrezgeneID": "", 
          "geneSymbol": "", 
          "name": "", 
          "relatedProtein": "", 
          "sequence": "", 
          "species": "", 
          "type": "", 
          "externalID": "" 
         } 
         ] 
        } 
        ] 
       } 
       ], 
       "domains": [ 
       { 
        "allotype": "", 
        "domainType": "", 
        "entrezgeneID": "", 
        "geneSymbol": "", 
        "heavyChainIsoType": "", 
        "lightChainIsoType": "", 
        "name": "", 
        "regonizedAntigenFK": "", 
        "species": "", 
        "heavyChainIsoTypeMutation": "", 
        "antigens": [ 
        { 

         "antiIdiotypeType": "", 
         "antibodyAntigen": "", 
         "corporateID": "", 
         "description": "", 
         "entrezgeneID": "", 
         "geneSymbol": "", 
         "name": "", 
         "relatedProtein": "", 
         "sequence": "", 
         "species": "", 
         "type": "", 
         "externalID": "" 
        } 
        ] 
       } 
       ] 
      } 
      ], 
      "constructs": [ 
      { 
       "plasmidID": "", 
       "precursorAminoAcidSeq": "" 
      } 
      ] 
     } 
     ], 
     "supplier": { 
     "name": "", 
     "productID": "", 
     "atccCatalogNumber": "", 
     "lotID": "" 
     }, 
     "preparation": { 
     "type": "", 
     "lotIDs": [ 
      "" 
     ], 
     "amminoAcidDerivatization": "", 
     "chemicalConjugations": [ 
      { 
      "name": "", 
      "dar": "" 
      } 
     ], 
     "peptidateTreatment": "", 
     "proteinTreatment": "", 
     "purification": "", 
     "expressionSystem": "", 
     "empty": false 
     } 
    }, 
    "project": { 

     "name": "", 
     "status": "" 
    }, 
    "registrar": { 
     "username": "", 
     "email": "", 
     "name": "", 
     "upi": "", 
     "admin": false, 
     "curator": false, 
     "approvedUser": false 
    }, 
    "scientist": { 
     "username": "", 
     "email": "", 
     "name": "", 
     "upi": "", 
     "admin": false, 
     "curator": false, 
     "approvedUser": false 
    }, 
    "notebook": { 

     "elnPage": "", 
     "upi": "", 
     "location": "", 
     "subpage": "" 
    }, 
    "growthFS": { 

     "mediumUsed": "", 
     "otherComponents": "", 
     "percentCO2": 0, 
     "percentHumudity": 0, 
     "percentSerum": 0, 
     "selectionMarker": "", 
     "spinnerPlateSpeed": 0, 
     "temp": 0, 
     "drugResistance": "", 
     "growthConditions": "", 
     "passageNumber": "" 
    }, 
    "origin": { 

     "dateOfTransfection": "", 
     "hcAntibodyIsotype": "", 
     "lcAntibodyIsotype": "", 
     "parentCellLineLotID": "", 
     "parentChildRel": "", 
     "parentTissueSpecies": "", 
     "strain": "", 
     "tissueSource": "", 
     "celllineMemID": "", 
     "dateFrozen": "", 
     "strFingerprint": "", 
     "plasmidLotIDs": [ 
     "" 
     ] 
    }, 
    "miscellaneous": { 

     "expHostType": "", 
     "selEukaryote": "", 
     "selProkaryote": "", 
     "buffer": "", 
     "enotoxinLevel": "", 
     "enotoxinUnit": "", 
     "enotoxinMethod": "", 
     "concentrationLevel": "", 
     "concentrationUnit": "", 
     "concentrationMethod": "", 
     "mixture": "", 
     "proteinMw": 0 
    }, 
    "nucleotideSeq": { 
     "seq": "" 
    }, 
    "preparation": { 

     "type": "", 
     "lotIDs": [ 
     "" 
     ], 
     "amminoAcidDerivatization": "", 
     "chemicalConjugations": [ 
     { 
      "name": "", 
      "dar": "" 
     } 
     ], 
     "peptidateTreatment": "", 
     "proteinTreatment": "", 
     "purification": "", 
     "expressionSystem": "", 
     "empty": false 
    }, 
    "adc": { 

     "dars": [ 
     { 
      "value": 0, 
      "method": "", 
      "precision": "", 
      "empty": false 
     } 
     ], 
     "aggregations": [ 
     { 
      "percentAggMethod": "", 
      "percentAggValue": 0 
     } 
     ] 
    }, 
    "createdBy": "", 
    "createdDate": "", 
    "modifiedBy": "", 
    "modifiedDate": "", 
    "alternateName": "", 
    "chains": [ 
     { 

     "chainType": "", 
     "name": "", 
     "plasmidLotID": "", 
     "stochiometry": 0, 
     "aminoAcids": [ 
      { 
      "sequence": "", 
      "predictedMatureSeqs": [ 
       { 

       "avgMolWt": 0, 
       "encodedChainName": "", 
       "encodedChainType": "", 
       "length": 0, 
       "sequence": "", 
       "domains": [ 
        { 

        "allotype": "", 
        "domainType": "", 
        "domainEnd": 0, 
        "entrezgeneID": "", 
        "geneSymbol": "", 
        "heavyChainIsoType": "", 
        "lightChainIsoType": "", 
        "name": "", 
        "regonizedAntigenFK": "", 
        "species": "", 
        "domainStart": 0, 
        "heavyChainIsoTypeMutation": "", 
        "antigens": [ 
         { 

         "antiIdiotypeType": "", 
         "antibodyAntigen": "", 
         "corporateID": "", 
         "description": "", 
         "entrezgeneID": "", 
         "geneSymbol": "", 
         "name": "", 
         "relatedProtein": "", 
         "sequence": "", 
         "species": "", 
         "type": "", 
         "externalID": "" 
         } 
        ] 
        } 
       ] 
       } 
      ], 
      "domains": [ 
       { 

       "allotype": "", 
       "domainType": "", 
       "domainEnd": 0, 
       "entrezgeneID": "", 
       "geneSymbol": "", 
       "heavyChainIsoType": "", 
       "lightChainIsoType": "", 
       "name": "", 
       "regonizedAntigenFK": "", 
       "species": "", 
       "domainStart": 0, 
       "heavyChainIsoTypeMutation": "", 
       "antigens": [ 
        { 

        "antiIdiotypeType": "", 
        "antibodyAntigen": "", 
        "corporateID": "", 
        "description": "", 
        "entrezgeneID": "", 
        "geneSymbol": "", 
        "name": "", 
        "relatedProtein": "", 
        "sequence": "", 
        "species": "", 
        "type": "", 
        "externalID": "" 
        } 
       ] 
       } 
      ] 
      } 
     ], 
     "constructs": [ 
      { 
      "plasmidID": "", 
      "precursorAminoAcidSeq": "" 
      } 
     ] 
     } 
    ], 
    "orfs": [ 
     { 

     "orfEnd": 0, 
     "intronsPresent": "", 
     "orfStart": 0, 
     "promoters": [ 
      "" 
     ], 
     "aminoAcids": [ 
      { 
      "sequence": "", 
      "predictedMatureSeqs": [ 
       { 
       "encodedChainName": "", 
       "encodedChainType": "", 
       "length": 0, 
       "sequence": "", 
       "domains": [ 
        { 

        "allotype": "", 
        "domainType": "", 
        "domainEnd": 0, 
        "entrezgeneID": "", 
        "geneSymbol": "", 
        "heavyChainIsoType": "", 
        "lightChainIsoType": "", 
        "name": "", 
        "regonizedAntigenFK": "", 
        "species": "", 
        "domainStart": 0, 
        "heavyChainIsoTypeMutation": "", 
        "antigens": [ 
         { 

         "antiIdiotypeType": "", 
         "antibodyAntigen": "", 
         "corporateID": "", 
         "description": "", 
         "entrezgeneID": "", 
         "geneSymbol": "", 
         "name": "", 
         "relatedProtein": "", 
         "sequence": "", 
         "species": "", 
         "type": "", 
         "externalID": "" 
         } 
        ] 
        } 
       ] 
       } 
      ], 
      "domains": [ 
       { 
       "allotype": "", 
       "domainType": "", 
       "domainEnd": 0, 
       "entrezgeneID": "", 
       "geneSymbol": "", 
       "heavyChainIsoType": "", 
       "lightChainIsoType": "", 
       "name": "", 
       "regonizedAntigenFK": "", 
       "species": "", 
       "domainStart": 0, 
       "heavyChainIsoTypeMutation": "", 
       "antigens": [ 
        { 

        "antiIdiotypeType": "", 
        "antibodyAntigen": "", 
        "corporateID": "", 
        "description": "", 
        "entrezgeneID": "", 
        "geneSymbol": "", 
        "name": "", 
        "relatedProtein": "", 
        "sequence": "", 
        "species": "", 
        "type": "", 
        "externalID": "" 
        } 
       ] 
       } 
      ] 
      } 
     ], 
     "ncSeq": { 

      "seq": "" 
     }, 
     "label": "", 
     "note": "" 
     } 
    ], 
    "antigens": [ 
     { 

     "antiIdiotypeType": "", 
     "antibodyAntigen": "", 
     "corporateID": "", 
     "description": "", 
     "entrezgeneID": "", 
     "geneSymbol": "", 
     "name": "", 
     "relatedProtein": "", 
     "sequence": "", 
     "species": "", 
     "type": "", 
     "externalID": "" 
     } 
    ], 
    "immunogens": [ 
     { 

     "type": "", 
     "name": "", 
     "entrezgeneID": "", 
     "geneSymbol": "", 
     "corporateID": "", 
     "species": "", 
     "lotID": "", 
     "sequence": "" 
     } 
    ], 
    "suppliers": [ 
     { 

     "name": "", 
     "productID": "", 
     "atccCatalogNumber": "", 
     "lotID": "" 
     } 
    ], 
    "domains": [ 
     { 

     "allotype": "", 
     "domainType": "", 
     "domainEnd": 0, 
     "entrezgeneID": "", 
     "geneSymbol": "", 
     "heavyChainIsoType": "", 
     "lightChainIsoType": "", 
     "name": "", 
     "regonizedAntigenFK": "", 
     "species": "", 
     "domainStart": 0, 
     "heavyChainIsoTypeMutation": "", 
     "antigens": [ 
      { 

      "antiIdiotypeType": "", 
      "antibodyAntigen": "", 
      "corporateID": "", 
      "description": "", 
      "entrezgeneID": "", 
      "geneSymbol": "", 
      "name": "", 
      "relatedProtein": "", 
      "sequence": "", 
      "species": "", 
      "type": "", 
      "externalID": "" 
      } 
     ] 
     } 
} 
+1

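You may need to restructure your documents so that they are one-to-one with your search queries and expressions. However, without sample XML and queries, it is impossible to make a recommendation. – wst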

+0

I updated my question with more detail about my document structure and how I perform the search – Ravi

Answers

0

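This is all about fragments, and the fact that the numbers you see are estimates based on fragments. If you are not seeing what you expect, there are a few options (changing the documents, fragment roots/parents, filtered searches, etc.). But, as wst mentioned, give an example and then people will be able to give you more direct guidance.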
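To see where an index-based number and the real results diverge, one option is to list the documents that an unfiltered search returns but a filtered search rejects. This is only a diagnostic sketch for a small test database, under the assumption that each document is a single fragment; the query is a placeholder and should be replaced with the one being investigated.

```xquery
xquery version "1.0-ml";

(: Placeholder query (assumption): replace with the cts:query under investigation. :)
let $query := cts:word-query("ATCC", "case-insensitive")

(: URIs selected from the indexes alone, without verifying each match. :)
let $unfiltered := cts:search(fn:doc(), $query, "unfiltered") ! xdmp:node-uri(.)

(: URIs that survive filtering, i.e. the documents that really match. :)
let $filtered := cts:search(fn:doc(), $query, "filtered") ! xdmp:node-uri(.)

(: Whatever is left over is a false positive from index resolution. :)
return $unfiltered[fn:not(. = $filtered)]
```

An empty result means the indexes already resolve this query exactly, so the estimate and the count should agree for it.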

+0

I updated my question with more details about my document structure and how I perform the search – Ravi

0

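I am taking a performance hit, but I was able to use fn:count.

My queries are resolved through search:search with a custom constraint, so in my case all I needed to do was the following: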

fn:count(cts:search(fn:doc(), cts:query(search:parse($q, $options)))) 
+0

The custom constraint is not the cause of the difference between the estimate and fn:count. Also, the solution you propose does not scale well. –

+1

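Also keep in mind filtered vs. unfiltered searches. If you get your query and all your indexes right so that you can run unfiltered searches, your queries will run faster, and I believe your search:estimate and search:search totals will be accurate. –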

+1

[Using fn:count vs. xdmp:estimate](http://docs.marklogic.com/guide/search-dev/count_estimate) - essential reading for anyone considering this approach. –

1

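Sam Mefford provided a better answer in his comment: "Also keep in mind filtered vs. unfiltered searches. If you get your query and all your indexes right so that you can run unfiltered searches, your queries will run faster, and I believe your search:estimate and search:search totals will be accurate."

fn:count() is never optimal. Use it only for counting small sequences, document sets, result sets, and so on. Filtered searches are also significantly slower than unfiltered searches. If you tune your indexes, you can run unfiltered searches and get accurate counts back from search:estimate, xdmp:estimate, and search:search pagination.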
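As a rough illustration of that advice, the sketch below asks search:search to run unfiltered and reads the total from the response. The options shown are an assumption, not the asker's actual options: they contain only the unfiltered setting, and the custom constraint from the question would have to be added for a query such as concept:suppliers.name to parse the same way.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Placeholder search string (assumption). :)
let $q := "ATCC"

(: Minimal options: pass "unfiltered" through to the underlying cts:search. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <search-option>unfiltered</search-option>
  </options>

(: First page of ten results; the total is reported on the response element. :)
let $response := search:search($q, $options, 1, 10)
return fn:data($response/@total)
```

With unfiltered searches the total and the pages are both index-resolved, so the total should stay the same from page to page. Whether the results are also correct depends on the indexes being able to resolve the query exactly.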

0

If you don't have any fragmentation strategy defined, xdmp:estimate should give the correct result. It will be much faster than fn:count. You can rewrite the same code as:

xdmp:estimate(cts:search(fn:doc(), cts:query(search:parse($q, $options))))
