我想建議另一種通用的解決方案,也避免了笛卡爾乘積作爲@InverseFalcon指出。
讓我們確實通過更快的查找創建索引,並插入一些測試數據開始:
CREATE CONSTRAINT ON (n:Node) ASSERT n.treeNumber IS UNIQUE;
CREATE (n:Node {treeNumber: 'A01.111.230'})
CREATE (n:Node {treeNumber: 'A01.111'})
CREATE (n:Node {treeNumber: 'A01'})
這時我們就需要掃描所有節點作爲潛在的父母,並尋找與的的treeNumber
開始孩子父(STARTS WITH
可以使用索引),沒有點中的treeNumber
(即直接子)的「剩女」,而不是分裂,加盟等:
MATCH (p:Node), (c:Node)
WHERE c.treeNumber STARTS WITH p.treeNumber
AND p <> c
AND NOT substring(c.treeNumber, length(p.treeNumber) + 1) CONTAINS '.'
RETURN p, c
我用一個簡單的RETURN
替換了創建關係的目的,但您可以簡單地將其替換爲CREATE UNIQUE
或MERGE
。
其實,我們可以擺脫p <> c
謂詞和的+ 1
通過預先計算的實際前綴的長度應符合的:
MATCH (p:Node)
WITH p, p.treeNumber + '.' AS parentNumber
MATCH (c:Node)
WHERE c.treeNumber STARTS WITH parentNumber
AND NOT substring(c.treeNumber, length(parentNumber)) CONTAINS '.'
RETURN p, c
然而,剖析該查詢表明,該指數是不使用,且有是笛卡爾乘積(所以我們有一個爲O(n^2)算法):
Compiler CYPHER 3.0
Planner COST
Runtime INTERPRETED
+--------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+--------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| +ProduceResults | 2 | 2 | 0 | c, p | p, c |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| +Filter | 2 | 2 | 26 | c, p, parentNumber | NOT(Contains(SubstringFunction(c.treeNumber,length(parentNumber),None),{ AUTOSTRING1})) AND StartsWith(c.treeNumber,parentNumber) |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| +Apply | 2 | 9 | 0 | p, parentNumber -- c | |
| |\ +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| | +NodeByLabelScan | 9 | 9 | 12 | c | :Node |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| +Projection | 3 | 3 | 3 | parentNumber -- p | p; Add(p.treeNumber,{ AUTOSTRING0}) |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| +NodeByLabelScan | 3 | 3 | 4 | p | :Node |
+--------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
Total database accesses: 45
但是,如果我們簡單增加一個提示,像這樣
MATCH (p:Node)
WITH p, p.treeNumber + '.' AS parentNumber
MATCH (c:Node)
USING INDEX c:Node(treeNumber)
WHERE c.treeNumber STARTS WITH parentNumber
AND NOT substring(c.treeNumber, length(parentNumber)) CONTAINS '.'
RETURN p, c
它不使用索引,我們必須像一個O(N *的log(n))算法(的log(n)的索引查找):
Compiler CYPHER 3.0
Planner COST
Runtime INTERPRETED
+-------------------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-------------------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| +ProduceResults | 2 | 2 | 0 | c, p | p, c |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| +Filter | 2 | 2 | 6 | c, p, parentNumber | NOT(Contains(SubstringFunction(c.treeNumber,length(parentNumber),None),{ AUTOSTRING1})) |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| +Apply | 2 | 3 | 0 | p, parentNumber -- c | |
| |\ +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| | +NodeUniqueIndexSeekByRange | 9 | 3 | 6 | c | :Node(treeNumber STARTS WITH parentNumber) |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| +Projection | 3 | 3 | 3 | parentNumber -- p | p; Add(p.treeNumber,{ AUTOSTRING0}) |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| +NodeByLabelScan | 3 | 3 | 4 | p | :Node |
+-------------------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
Total database accesses: 19
注意引進WITH
步創建前綴年初的時候,我沒騙了一下,因爲我注意到它提高了執行計劃和DB訪問過
MATCH (p:Node), (c:Node)
USING INDEX c:Node(treeNumber)
WHERE c.treeNumber STARTS WITH p.treeNumber
AND p <> c
AND NOT substring(c.treeNumber, length(p.treeNumber) + 1) CONTAINS '.'
RETURN p, c
它具有以下執行計劃:
Compiler CYPHER 3.0
Planner RULE
Runtime INTERPRETED
+--------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+
| Operator | Rows | DB Hits | Variables | Other |
+--------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+
| +Filter | 2 | 9 | c, p | NOT(p == c) AND NOT(Contains(SubstringFunction(c.treeNumber,Add(length(p.treeNumber),{ AUTOINT0}),None),{ AUTOSTRING1})) |
| | +------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+
| +SchemaIndex | 6 | 12 | c -- p | PrefixSeekRangeExpression(p.treeNumber); :Node(treeNumber) |
| | +------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+
| +NodeByLabel | 3 | 4 | p | :Node |
+--------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+
Total database accesses: 25
最後,備案,我寫的原始查詢的執行計劃(即沒有提示)是:
Compiler CYPHER 3.0
Planner COST
Runtime INTERPRETED
+--------------------+----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+--------------------+----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +ProduceResults | 2 | 2 | 0 | c, p | p, c |
| | +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +Filter | 2 | 2 | 21 | c, p | NOT(p == c) AND StartsWith(c.treeNumber,p.treeNumber) AND NOT(Contains(SubstringFunction(c.treeNumber,Add(length(p.treeNumber),{ AUTOINT0}),None),{ AUTOSTRING1})) |
| | +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +CartesianProduct | 9 | 9 | 0 | p -- c | |
| |\ +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| | +NodeByLabelScan | 3 | 9 | 12 | c | :Node |
| | +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +NodeByLabelScan | 3 | 3 | 4 | p | :Node |
+--------------------+----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Total database accesses: 37
這不是最糟糕的一個:沒有提示,但與預先計算的前綴是!這就是爲什麼你應該總是測量。
這可能會執行笛卡爾產品,使得這是一個n^2操作。另外,你的描述有一個例子'A01.111.23',它打破了你假設每個treeNumbers的數字子集都有3個數字的假設。不過,這不是一個糟糕的問題。 – InverseFalcon