2017-01-29 69 views
1

Neo4J Cypher: filter out variable-length paths by a condition. How can I impose restrictions on variable-length paths?

I have a query that finds all possible paths from some start node:

CREATE INDEX ON :NODE(id) 
MATCH all_paths_from_Start = (start:Person)-[:FRIENDSHIP*1..20]->(person:Person) 
WHERE start.id = 128 AND start.country <> "Uganda" 
RETURN all_paths_from_Start;

Now I want to filter out all paths in which at least two people have the same country. How can I do that?

Answers

3

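1) Get the array of countries along the path, possibly with duplicates: REDUCE

2) Remove the duplicates and compare the sizes of the two arrays: UNWIND + COLLECT(DISTINCT...)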

MATCH path = (start:Person)-[:FRIENDSHIP*1..20]->(person:Person) 
     WHERE start.id = 128 AND start.country <> "Uganda" 
WITH path, 
    REDUCE(acc=[], n IN NODES(path) | acc + n.country) AS countries 
    UNWIND countries AS country 
WITH path, 
    countries, COLLECT(DISTINCT country) AS distinctCountries 
    WHERE SIZE(countries) = SIZE(distinctCountries) 
RETURN path 

P.S. REDUCE can be replaced by EXTRACT (thanks to Gabor Szarnyas):

MATCH path = (start:Person)-[:FRIENDSHIP*1..20]->(person:Person) 
     WHERE start.id = 128 AND start.country <> "Uganda" 
WITH path, 
    EXTRACT(n IN NODES(path) | n.country) AS countries 
    UNWIND countries AS country 
WITH path, 
    countries, COLLECT(DISTINCT country) AS distinctCountries 
    WHERE SIZE(countries) = SIZE(distinctCountries) 
RETURN path 

P.P.S. Thanks again to Gabor Szarnyas for another idea for simplifying the query:

MATCH path = (start:Person)-[:FRIENDSHIP*1..20]->(person:Person) 
     WHERE start.id = 128 AND start.country <> "Uganda" 
WITH path 
    UNWIND NODES(path) AS person 
WITH path, 
    COLLECT(DISTINCT person.country) as distinctCountries 
    WHERE LENGTH(path) + 1 = SIZE(distinctCountries) 
RETURN path 
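For readers on newer Neo4j versions: as far as I know, EXTRACT was removed in Neo4j 4.0, and a list comprehension does the same job. The following is only a sketch of the EXTRACT variant rewritten that way, under the same assumptions as the queries above (Person nodes with id and country properties); it has not been run against real data. Note that COLLECT skips null values, so a node without a country would make the two sizes differ and the path would be dropped.

```cypher
// Sketch: same idea as the EXTRACT query, using a list comprehension instead.
MATCH path = (start:Person)-[:FRIENDSHIP*1..20]->(person:Person)
WHERE start.id = 128 AND start.country <> "Uganda"
// Countries of all nodes on the path, duplicates included
WITH path, [n IN nodes(path) | n.country] AS countries
UNWIND countries AS country
// Group by path and keep each country only once
WITH path, countries, collect(DISTINCT country) AS distinctCountries
// Equal sizes mean that no country occurs twice on the path
WHERE size(countries) = size(distinctCountries)
RETURN path
```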
+0

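Thanks for your answer! Is it the best option from a performance point of view? I will have a very large dataset, with up to millions of nodes. Is there any way to make the query faster?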

+1

@VolodymyrBakhmatiuk The heaviest part of the query is the first match: `MATCH path = (start:Person)-[:FRIENDSHIP*1..20]->(person:Person) WHERE start.id = 128 AND start.country <> "Uganda"`. It is not yet entirely clear how to improve it...

+1

@VolodymyrBakhmatiuk Testing the paths is the easiest part...
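One small thing that can be checked on the performance side: the index in the question is created on the label NODE, while the query looks up start by the label Person, so that index would not be used for the lookup. The sketch below assumes the id property lives on Person nodes and uses the same index syntax as the question. It only helps with finding the start node; the expansion over up to 20 FRIENDSHIP hops remains the expensive part.

```cypher
// Assumption: the id property is stored on :Person nodes, so index that label.
// Run this as its own statement, separately from the query below.
CREATE INDEX ON :Person(id);

// EXPLAIN shows the plan without executing the query; the plan should
// start with an index seek on start rather than a label scan.
EXPLAIN
MATCH path = (start:Person {id: 128})-[:FRIENDSHIP*1..20]->(person:Person)
WHERE start.country <> "Uganda"
RETURN path;
```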

2

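One solution is to get the nodes of the path and, for each person on the path, extract the number of persons from the same country (which we determine by filtering on the same country). The path has persons from unique countries if, for every person, there is only a single person from that country on the path (the person himself/herself).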

MATCH p = (start:Person {id: 128})-[:FRIENDSHIP*1..20]->(person:Person) 
WHERE start.country <> "Uganda" 
WITH p, nodes(p) AS persons 
WITH p, extract(p1 IN persons | size(filter(p2 IN persons WHERE p1.country = p2.country))) AS personsFromSameCountry 
WHERE size(filter(p3 IN personsFromSameCountry WHERE p3 > 1)) = 0 
RETURN p 

The query is syntactically correct, but I have not tested it on any data.

Note that I moved the id = 128 condition into the pattern and shortened the all_paths_from_Start variable to p.
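The same check can probably be written more compactly with the all() predicate and a list comprehension, which also avoids extract and filter (both removed in Neo4j 4.0, as far as I know). This is a sketch under the same assumptions as the query above and has likewise not been tested on data.

```cypher
// Sketch: keep a path only if every person on it is the sole person
// from his/her country on that path.
MATCH p = (start:Person {id: 128})-[:FRIENDSHIP*1..20]->(person:Person)
WHERE start.country <> "Uganda"
WITH p, nodes(p) AS persons
// For each p1, count the persons on the path sharing its country;
// a count of exactly 1 means only p1 itself matches.
WHERE all(p1 IN persons
          WHERE size([p2 IN persons WHERE p2.country = p1.country]) = 1)
RETURN p
```

A person without a country property never matches itself here (comparing null with null does not yield true), so such a path is dropped.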