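(Cloudant DB Java API) Performing a LIKE search with SELECT DISTINCT and a WHERE clause

2015-12-03

I have a dataset similar to this one. Basically, it consists of the different pages of Word documents, with each record giving the page number and the full text of that page.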

{ 
    "_id": "4b36u6vwkZH16H5vmc24sBfuZk0CRqfP", 
    "_rev": "1-r5WQDAJPPuUP0oLapZrMiMRd6rOaTIz9", 
    "FILE_NAME": "sample.doc", 
    "PAGE_NUM": 1, 
    "PAGE_FULLTEXT": "hello world"
}, 
{ 
    "_id": "nDIKw5JUWFWVD8m7HEODMa1vNI5gFEXS", 
    "_rev": "1-nEp7zsuaneJj2AInyPpeBWDNP90ZGpWQ", 
    "FILE_NAME": "sample.doc", 
    "PAGE_NUM": 2, 
    "PAGE_FULLTEXT": "this is john doe"
}, 
{ 
    "_id": "vCTlNbNk3X893FkWSYnn87L9j371taYZ", 
    "_rev": "1-oJPspiBHRPeT99m8VPV9qoDTTBoJ9tVK", 
    "FILE_NAME": "sample-2.doc", 
    "PAGE_NUM": 1, 
    "PAGE_FULLTEXT": "this is another document"
}, 
{ 
    "_id": "2FSDuaEa5bYtP2l7lEgMnqMnqsZpMJUs", 
    "_rev": "1-ZQRkvfMluu0NQWYH2FUATuXy9uNtOGyk", 
    "FILE_NAME": "sample-2.doc", 
    "PAGE_NUM": 2, 
    "PAGE_FULLTEXT": "page 2 of sample-2.doc"
}, 
{ 
    "_id": "RET7G6hUU9zSplgW7FIXWKwIVex2NEmI", 
    "_rev": "1-mlryGv830RNllPwFT7JDDvJoKXuvxAXD", 
    "FILE_NAME": "sample-3.doc", 
    "PAGE_NUM": 1, 
    "PAGE_FULLTEXT": "hello lionel"
}, 
{ 
    "_id": "VBL6BJBevcvUc6EsJ68bAjHuGRJ6zvMt", 
    "_rev": "1-fPIJQHKCB2WitR74l1X8I6TOBMhMeCWF", 
    "FILE_NAME": "sample-3.doc", 
    "PAGE_NUM": 2, 
    "PAGE_FULLTEXT": "page hello 2 of sample-3.doc"
} 

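So far, I have been able to run a SELECT DISTINCT-style count query by following one of the answers to the post How do I do the SQL equivalent of "DISTINCT" in CouchDB?

My question now is: how can I search through the dataset and then group the matches by FILE_NAME? In SQL this would be something like SELECT DISTINCT FILE_NAME WHERE PAGE_FULLTEXT LIKE '%hello%'.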
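To pin down the intended result, here is a minimal in-memory sketch of what that SQL would return for the sample documents above. It is plain JavaScript run outside Cloudant; the `pages` array is simply a copy of the sample documents without `_id`/`_rev`, and `distinctFileNamesLike` is a hypothetical helper name.

```javascript
// In-memory copy of the sample pages (only the fields the query needs).
const pages = [
  { FILE_NAME: "sample.doc", PAGE_NUM: 1, PAGE_FULLTEXT: "hello world" },
  { FILE_NAME: "sample.doc", PAGE_NUM: 2, PAGE_FULLTEXT: "this is john doe" },
  { FILE_NAME: "sample-2.doc", PAGE_NUM: 1, PAGE_FULLTEXT: "this is another document" },
  { FILE_NAME: "sample-2.doc", PAGE_NUM: 2, PAGE_FULLTEXT: "page 2 of sample-2.doc" },
  { FILE_NAME: "sample-3.doc", PAGE_NUM: 1, PAGE_FULLTEXT: "hello lionel" },
  { FILE_NAME: "sample-3.doc", PAGE_NUM: 2, PAGE_FULLTEXT: "page hello 2 of sample-3.doc" },
];

// Emulates SELECT DISTINCT FILE_NAME ... WHERE PAGE_FULLTEXT LIKE '%hello%':
// keep pages containing the substring (case-sensitive here), then keep each
// FILE_NAME once. A Set preserves first-seen order.
function distinctFileNamesLike(docs, substring) {
  const names = new Set();
  for (const doc of docs) {
    if (doc.PAGE_FULLTEXT.includes(substring)) {
      names.add(doc.FILE_NAME);
    }
  }
  return [...names];
}

console.log(distinctFileNamesLike(pages, "hello"));
// → [ 'sample.doc', 'sample-3.doc' ]
```

Three pages match, but two of them belong to sample-3.doc, so the DISTINCT step leaves two file names.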

Answer

The usual CouchDB equivalent of DISTINCT is to use a MapReduce view and query it with group_level=1 or group=true.
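A sketch of such a view, assuming a design document `_design/files` with a view `by_file_name` (both hypothetical names). The map function is stored as a string, as CouchDB and Cloudant expect, and the built-in `_count` reducer combined with `group=true` yields one row per distinct FILE_NAME. The `groupCount` helper is not a Cloudant API; it only emulates that grouping locally to show the shape of the result.

```javascript
// Hypothetical design document; view functions are stored as strings.
const designDoc = {
  _id: "_design/files",
  views: {
    by_file_name: {
      map: "function (doc) { if (doc.FILE_NAME) { emit(doc.FILE_NAME, null); } }",
      reduce: "_count", // built-in reducer: number of rows per key
    },
  },
};

// Query at read time (relative to the database URL):
//   GET /_design/files/_view/by_file_name?group=true
// Each row has the form { key: <FILE_NAME>, value: <page count> }.

// Local stand-in for map + _count + group=true, for illustration only.
// Rows come back in first-seen order here; Cloudant sorts them by key.
function groupCount(docs, mapSource) {
  const counts = new Map();
  const emit = (key) => counts.set(key, (counts.get(key) || 0) + 1);
  // Build the map function with `emit` in scope, as the view server would.
  const map = new Function("emit", `return (${mapSource});`)(emit);
  docs.forEach((doc) => map(doc));
  return [...counts].map(([key, value]) => ({ key, value }));
}

const docs = [
  { FILE_NAME: "sample.doc", PAGE_NUM: 1 },
  { FILE_NAME: "sample.doc", PAGE_NUM: 2 },
  { FILE_NAME: "sample-2.doc", PAGE_NUM: 1 },
  { FILE_NAME: "sample-2.doc", PAGE_NUM: 2 },
  { FILE_NAME: "sample-3.doc", PAGE_NUM: 1 },
  { FILE_NAME: "sample-3.doc", PAGE_NUM: 2 },
];

console.log(groupCount(docs, designDoc.views.by_file_name.map));
```

For the sample data this gives three rows (sample.doc, sample-2.doc, sample-3.doc), each with value 2. Note that the view filters nothing by PAGE_FULLTEXT; the keys are every file name in the database.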

But most of your question is about the WHERE PAGE_FULLTEXT like "%hello%" part. As you have found, MapReduce views are not well suited to this kind of fuzzy matching.

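Fortunately, Cloudant has Cloudant Search, which lets you build full-text indexes. A Cloudant Search index is defined by a function (as with MapReduce) that calls index() to specify which fields to index. At its simplest, using your sample data, the index function is: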

function(doc) { 
    index("default", doc.PAGE_FULLTEXT); 
} 

This indexes each document's PAGE_FULLTEXT into the default field.
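To get back to the FILE_NAME grouping, one option (an assumption on top of the answer, not part of it) is to also index FILE_NAME with `{ "store": true }`, so that each search result row carries the file name in its `fields`. The design-document and index names `pages` and `by_text` are hypothetical, and `previewIndex` is a local stand-in for Cloudant's `index()` that only lists what a document would contribute.

```javascript
// Index function source; Cloudant stores it as a string in the design document.
const indexSource = `function (doc) {
  if (doc.PAGE_FULLTEXT) {
    index("default", doc.PAGE_FULLTEXT);
  }
  if (doc.FILE_NAME) {
    // store: true returns FILE_NAME in each search result row's "fields"
    index("FILE_NAME", doc.FILE_NAME, { "store": true });
  }
}`;

// Hypothetical search design document ("pages" / "by_text" are placeholders).
const searchDesignDoc = {
  _id: "_design/pages",
  indexes: {
    by_text: { analyzer: "standard", index: indexSource },
  },
};

// Local stand-in for Cloudant's index(): records the fields a doc would add.
function previewIndex(doc, source) {
  const fields = [];
  const index = (name, value, options = {}) =>
    fields.push({ name, value, store: Boolean(options.store) });
  new Function("index", `return (${source});`)(index)(doc);
  return fields;
}

console.log(previewIndex(
  { FILE_NAME: "sample.doc", PAGE_FULLTEXT: "hello world" },
  searchDesignDoc.indexes.by_text.index,
));
```

The `if` guards skip documents (such as design documents) that lack the fields, rather than indexing `undefined`.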

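Once the index is built, you can query it with /_design/yourdesigndoc/_search/yourindexname?q=hello+world (relative to your database URL) to get the documents that best match the string "hello world".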
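A client-side sketch of the remaining DISTINCT step, assuming FILE_NAME was indexed with `{ "store": true }` so it appears in each row's `fields`. `searchPath` and `distinctFileNames` are hypothetical helpers, `pages`/`by_text` are placeholder names, and `exampleResponse` is a made-up response for `q=hello` built from the sample pages, not real Cloudant output.

```javascript
// Builds the query path from the answer, relative to the database URL.
function searchPath(ddoc, indexName, query) {
  const params = new URLSearchParams({ q: query }); // encodes spaces as "+"
  return `/_design/${ddoc}/_search/${indexName}?${params}`;
}

// Collapses search rows to distinct FILE_NAME values (the DISTINCT step).
// Assumes FILE_NAME is a stored field, so it is present in row.fields.
function distinctFileNames(response) {
  const names = new Set();
  for (const row of response.rows) {
    if (row.fields && row.fields.FILE_NAME !== undefined) {
      names.add(row.fields.FILE_NAME);
    }
  }
  return [...names];
}

// Hypothetical response for q=hello; real rows also carry an "order" array,
// and the top level includes total_rows and bookmark.
const exampleResponse = {
  rows: [
    { id: "4b36u6vwkZH16H5vmc24sBfuZk0CRqfP", fields: { FILE_NAME: "sample.doc" } },
    { id: "RET7G6hUU9zSplgW7FIXWKwIVex2NEmI", fields: { FILE_NAME: "sample-3.doc" } },
    { id: "VBL6BJBevcvUc6EsJ68bAjHuGRJ6zvMt", fields: { FILE_NAME: "sample-3.doc" } },
  ],
};

console.log(searchPath("pages", "by_text", "hello world"));
// → /_design/pages/_search/by_text?q=hello+world
console.log(distinctFileNames(exampleResponse));
// → [ 'sample.doc', 'sample-3.doc' ]
```

Two caveats: search results come back a page at a time (with a bookmark for the next page), so the de-duplication has to span every page you fetch; and the standard analyzer matches whole words, so this behaves like a word search rather than a literal LIKE '%hello%' (a query for "hello" would not match "othello").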

That's great. Definitely what I was looking for. Thanks. – Jigs