2017-02-27 91 views
2

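In Neo4j, is it possible to find all nodes whose relationships are a superset of another node's relationships?

Given the following contrived database: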

CREATE (a:Content {id:'A'}),
    (b:Content {id:'B'}),
    (c:Content {id:'C'}),
    (d:Content {id:'D'}),
    (ab:Container {id:'AB'}),
    (ab2:Container {id:'AB2'}),
    (abc:Container {id:'ABC'}),
    (abcd:Container {id:'ABCD'}),
    (ab)-[:CONTAINS]->(a),
    (ab)-[:CONTAINS]->(b),
    (ab2)-[:CONTAINS]->(a),
    (ab2)-[:CONTAINS]->(b),
    (abc)-[:CONTAINS]->(a),
    (abc)-[:CONTAINS]->(b),
    (abc)-[:CONTAINS]->(c),
    (abcd)-[:CONTAINS]->(a),
    (abcd)-[:CONTAINS]->(b),
    (abcd)-[:CONTAINS]->(c),
    (abcd)-[:CONTAINS]->(d)

Is there a query that can detect all pairs of Container nodes in which one CONTAINS a superset of, or exactly the same, Content nodes as the other Container node?

For my example database, I would like the query to return:

(ABCD) is a superset of (ABC), (AB), and (AB2) 
(ABC) is a superset of (AB), and (AB2) 
(AB) and (AB2) contain the same nodes 

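If Cypher is not suitable for this but another query language is, or if Neo4j is not suitable for this but another database is, I would appreciate input on that as well.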

Visualization of the sample database

Query performance of the answers (as of 2017-02-28T21:56Z)

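I don't yet have enough experience with Neo4j or graph database queries to analyze the performance of the answers, and I haven't yet built my larger data set for a more meaningful comparison, but I thought I would run each query with the PROFILE command and list its database-hit cost. I have omitted timing data because I could not get it to be consistent or meaningful on such a small data set.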

  • stdob--: 129 total db hits
  • Dave Bennett: 46 total db hits
  • InverseFalcon: 27 total db hits

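Both Dave Bennett's and stdob--'s answers appear to give me the results I asked for, thank you. I have upvoted both, and I will accept one once I have tried them on the larger data set, since I have to choose a single answer. – Gregyski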

About how many Container nodes are in the large data set? – InverseFalcon

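I haven't assembled it yet (that will take some work, which is on my agenda now that I know I have viable tools for the later computation). However, 70,000 containers seems like a realistic estimate. The contents of each container range from a few to a few hundred, but the average is probably around 30. – Gregyski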
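
For a sense of scale, the estimate in this comment can be turned into a count of the container pairs an all-pairs comparison would visit. The following is a rough Python sketch that uses only the 70,000 figure quoted above; `pair_counts` is an illustrative name, not something from the thread.

```python
def pair_counts(n_containers):
    """Return (ordered, unordered) counts of pairs of distinct containers."""
    ordered = n_containers * (n_containers - 1)
    return ordered, ordered // 2


# 70,000 is the asker's estimate for the large data set.
ordered, unordered = pair_counts(70_000)
print(f"{ordered:,} ordered pairs, {unordered:,} unordered pairs")
```

With about 30 contents per container on average, each of those comparisons also has to look at a few dozen content ids, so any cheap filter that discards pairs before the containment test is likely to matter at this size.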

Answers

2

Here is a first attempt. I am sure it could use some refinement, but it should get you going.

// find the containers and their contents 
match (n:Container)-[:CONTAINS]->(c:Content) 

// group the contents per container 
with n as container, collect(c.id) as contents 

// combine the containers and their contents
with collect(container{.id, contents: contents}) as containers 

// loop through the list of containers 
with containers, size(containers) as container_size 
unwind range(0, container_size -1) as i 
unwind range(0, container_size -1) as j 

// for each container pair compare the contents 
with containers, i, j 
where i <> j 
and all(content IN containers[j].contents WHERE content in containers[i].contents) 
with containers[i].id as superset, containers[j].id as subset 
return superset, collect(subset) as subsets 
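
To see what rows this kind of pairwise comparison produces on the sample data, the same logic can be modelled outside the database. This is a minimal Python sketch, not part of the original answer: the `containers` dictionary is a hand-written copy of the sample graph, and `superset_rows` is a hypothetical helper name.

```python
# Hand-copied sample data: container id -> ids of the Content nodes it CONTAINS.
containers = {
    "AB": ["A", "B"],
    "AB2": ["A", "B"],
    "ABC": ["A", "B", "C"],
    "ABCD": ["A", "B", "C", "D"],
}


def superset_rows(containers):
    """For every ordered pair (i, j) with i != j, keep the pair when every
    content of j is also in i, then group the subsets per superset."""
    items = list(containers.items())
    rows = {}
    for i, (sup_id, sup_contents) in enumerate(items):
        for j, (sub_id, sub_contents) in enumerate(items):
            if i != j and all(c in sup_contents for c in sub_contents):
                rows.setdefault(sup_id, []).append(sub_id)
    return rows


if __name__ == "__main__":
    for superset, subsets in superset_rows(containers).items():
        print(superset, subsets)
```

In this model, containers with identical contents (AB and AB2) each list the other as a subset, so equal sets show up in both directions instead of being reported once as "the same".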

3

// Get contents for each container 
MATCH (SS:Container)-[:CONTAINS]->(CT:Content) 
     WITH SS, 
      collect(distinct CT) as CTS 
// Get all containers other than SS
MATCH (T:Container) 
     WHERE T <> SS 
// For each container get their content 
MATCH (T)-[:CONTAINS]->(CT:Content) 
     // Test if nested
     WITH SS, 
     CTS, 
     T, 
     ALL(ct in collect(distinct CT) WHERE ct in CTS) as test 
     WHERE test = true 
RETURN SS, collect(T) 
2

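The approach I would use, after getting the containers and their collected contents, is to filter down which containers get compared with each other by the count of their contents, and then run apoc.coll.containsAll() from APOC Procedures to keep only the superset/equal sets. Finally, you can compare the content counts to determine whether a pair is a superset or an equal set, and then collect.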

Something like this:

match (con:Container)-[:CONTAINS]->(content) 
with con, collect(content) as contents 
with collect({con:con, contents:contents, size:size(contents)}) as all 
unwind all as first 
unwind all as second 
with first, second 
where first <> second and first.size >= second.size 
with first, second 
where apoc.coll.containsAll(first.contents, second.contents) 
with first, 
case when first.size = second.size and id(first.con) < id(second.con) then second end as same, 
case when first.size > second.size then second end as superset 
with first.con as container, collect(same.con) as sameAs, collect(superset.con) as supersetOf 
where size(sameAs) > 0 or size(supersetOf) > 0 
return container, sameAs, supersetOf 
order by size(supersetOf) desc, size(sameAs) desc 
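
The steps described above (size filter first, then a full containment test, then classifying by size) can be traced on the sample data without a database. This is a hedged Python sketch and not the author's code: the `node_id` values are made-up stand-ins for Neo4j's internal ids, and `classify` is a hypothetical name.

```python
# Hand-copied sample data; node_id is an assumed stand-in for Neo4j's internal
# node id, used only so that each pair of equal sets is reported once.
containers = [
    {"con": "AB", "node_id": 0, "contents": {"A", "B"}},
    {"con": "AB2", "node_id": 1, "contents": {"A", "B"}},
    {"con": "ABC", "node_id": 2, "contents": {"A", "B", "C"}},
    {"con": "ABCD", "node_id": 3, "contents": {"A", "B", "C", "D"}},
]


def classify(containers):
    """Return {container: (same_as, superset_of)} for containers with a match."""
    result = {}
    for first in containers:
        same_as, superset_of = [], []
        for second in containers:
            # Cheap size check: a smaller set can never contain a larger one.
            if first is second or len(first["contents"]) < len(second["contents"]):
                continue
            # Full containment test (the role apoc.coll.containsAll plays in the query).
            if not first["contents"] >= second["contents"]:
                continue
            if len(first["contents"]) > len(second["contents"]):
                superset_of.append(second["con"])
            elif first["node_id"] < second["node_id"]:
                # Equal sets: keep only one direction of the pair.
                same_as.append(second["con"])
        if same_as or superset_of:
            result[first["con"]] = (same_as, superset_of)
    return result


if __name__ == "__main__":
    for con, (same_as, superset_of) in classify(containers).items():
        print(con, "same as", same_as, "superset of", superset_of)
```

The sketch does not reproduce the final ordering step of the query; it only shows how the id comparison keeps AB2 from reporting AB a second time.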