2016-09-13 184 views
1

我已經從CSV文件導入數據,並創造了很多節點,所有這些都是與其他節點基於同一數據集內一個「樹編號的」分級制度:創建分層數據(樹)結構的Neo4j使用「樹鍵」

例如,樹編號A01.111節點是節點A01的直接孩子,並與樹編號A01.111.230節點是一個可怕的ct的子節點A01.111

我正在嘗試做什麼是在其他節點的直接子節點之間創建獨特的關係。例如節點A01.111.230應該只有有一個「IS_CHILD_OF」關係,節點A01.111

我曾嘗試幾件事情,例如:

MATCH (n:Node), (n2:Node) 
WHERE (n2.treeNumber STARTS WITH n.treeNumber) 
AND (n <> n2) 
AND NOT ((n2)-[:IS_CHILD_OF]->()) 
CREATE UNIQUE (n2)-[:IS_CHILD_OF]->(n); 

這個例子的結果創造獨特的「IS_CHILD_OF」的關係,但不是節點的直接父。相反,節點A01.111.230將與節點A01有關。

回答

0

我想我只是想出了一個解決方案! (如果有人有一個更優雅的一個請不要交)

我剛剛意識到「樹編號」編碼系統總是使用點之間的3位數字,即A01.111.230C02.100,因此如果一個節點是另一個節點的直接子節點,它應該是「樹號」不僅以父節點的樹號開頭,它也應該是4個字符長度(一個字符用於點'。'和3字符爲數字值)。

因此我的解決方案,似乎做的工作是:

MATCH (n:Node), (n2:Node) 
WHERE (n2.treeNumber STARTS WITH n.treeNumber) 
AND (length(n2.treeNumber) = (length(n.treeNumber) + 4)) 
CREATE UNIQUE (n2)-[:IS_CHILD_OF]->(n); 
+0

這可能會執行笛卡爾產品,使得這是一個n^2操作。另外,你的描述有一個例子'A01.111.23',它打破了你假設每個treeNumbers的數字子集都有3個數字的假設。不過,這不是一個糟糕的問題。 – InverseFalcon

0

對於您的要求STARTS WITH將無法​​正常工作,因爲A01.111.23確實與A01除了開機啓動與A01.111

treeNumber是由幾個部分「」作爲分隔符。我們不要對個別部分的最大/最小可能字符長度做任何假設。我們需要的是比較每個節點的treeNumber的最後部分和正在測試的潛在子節點的最後部分。

MATCH (n1:Node), (n2:Node) 
WHERE split(n2.treeNumber,'.')[0..-1] = split(n1.treeNumber,'.') 
CREATE UNIQUE (n2)-[:IS_CHILD_OF]->(n1); 

split()功能分割字符串,在給定的分離器的每一次出現,爲字符串(部分)的列表:可以按如下方式實現這一目標使用Cypher支架的split()功能。在這種情況下,分隔符是'。'拆分任何treeNumber。我們可以使用語法list[{startIndex}..{endIndex}]在密碼中選擇一個列表的子集。用於反向查找的負指標是允許的,例如上述查詢中使用的指標。

該解決方案應概括所有可能的treeNumber值,採用當前的格式,而不考慮零件數量和單個零件長度。

+0

儘管我同意通用解決方案更可取,但我認爲您的查詢可能不會利用treeNumber上的索引或約束,也可能正在執行笛卡爾產品。小數據集不是問題,但可能是一個較大集合的問題,或者它將運行多次。 – InverseFalcon

1

我想我們可以在查詢上改進一點。首先,確保您在Node.treeNumber上具有唯一的約束或索引,因爲您需要在此查詢中改進您的父節點查找。

接下來,讓我們匹配子節點,不包括根節點(假設根的treeNumber中沒有.s)和已經處理並且已經有關係的節點。

然後我們將通過treeNumber使用我們的索引找到每個節點的父節點,並創建該關係。這假定一個子treeNumber總是有4個以上的字符,包括點。

MATCH (child:Node) 
WHERE child.treeNumber CONTAINS '.' 
AND NOT EXISTS((child)-[:IS_CHILD_OF]->()) 
WITH child, SUBSTRING(child.treeNumber, 0, SIZE(child.treeNumber)-4) as parentNumber 
MATCH (parent:Node) 
WHERE parent.treeNumber = parentNumber 
CREATE UNIQUE (child)-[:IS_CHILD_OF]->(parent) 

我覺得這個查詢避免了笛卡爾乘積,你可以從其他的答案弄的,應在O(n)的(有人糾正我,如果我錯了)。

編輯

在在treeNumbers號碼中的各個子集不拘泥於3(在你的描述,實際上,與「A01.111.23」)的情況下,那麼你需要導出parentNumber的不同手段。 Neo4j在這裏有點弱,因爲它缺少indexOf()函數和join()函數來反轉split()。您可能需要安裝APOC Procedures library以允許訪問join()函數。

處理與在treeNumber的數字子集數字變量計數的情況下查詢成爲本:

MATCH (child:Node) 
WHERE child.treeNumber CONTAINS '.' 
AND NOT EXISTS((child)-[:IS_CHILD_OF]->()) 
WITH child, SPLIT(child.treeNumber, '.') as splitNumber 
CALL apoc.text.join(splitNumber[0..-1], '.') YIELD value AS parentNumber 
WITH child, parentNumber 
MATCH (parent:Node) 
WHERE parent.treeNumber = parentNumber 
CREATE UNIQUE (child)-[:IS_CHILD_OF]->(parent) 
3

我想建議另一種通用的解決方案,也避免了笛卡爾乘積作爲@InverseFalcon指出。

讓我們確實通過更快的查找創建索引,並插入一些測試數據開始:

CREATE CONSTRAINT ON (n:Node) ASSERT n.treeNumber IS UNIQUE; 
CREATE (n:Node {treeNumber: 'A01.111.230'}) 
CREATE (n:Node {treeNumber: 'A01.111'}) 
CREATE (n:Node {treeNumber: 'A01'}) 

這時我們就需要掃描所有節點作爲潛在的父母,並尋找與的的treeNumber開始孩子父(STARTS WITH可以使用索引),沒有點中的treeNumber(即直接子)的「剩女」,而不是分裂,加盟等:

MATCH (p:Node), (c:Node) 
WHERE c.treeNumber STARTS WITH p.treeNumber 
    AND p <> c 
    AND NOT substring(c.treeNumber, length(p.treeNumber) + 1) CONTAINS '.' 
RETURN p, c 

我用一個簡單的RETURN替換了創建關係的目的,但您可以簡單地將其替換爲CREATE UNIQUEMERGE

其實,我們可以擺脫p <> c謂詞+ 1通過預先計算的實際前綴的長度應符合的:

MATCH (p:Node) 
WITH p, p.treeNumber + '.' AS parentNumber 
MATCH (c:Node) 
WHERE c.treeNumber STARTS WITH parentNumber 
    AND NOT substring(c.treeNumber, length(parentNumber)) CONTAINS '.' 
RETURN p, c 

然而,剖析該查詢表明,該指數是不使用,且有笛卡爾乘積(所以我們有一個爲O(n^2)算法):

Compiler CYPHER 3.0 

Planner COST 

Runtime INTERPRETED 

+--------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+ 
| Operator   | Estimated Rows | Rows | DB Hits | Variables   | Other                                | 
+--------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+ 
| +ProduceResults |    2 | 2 |  0 | c, p     | p, c                                | 
| |     +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+ 
| +Filter   |    2 | 2 |  26 | c, p, parentNumber | NOT(Contains(SubstringFunction(c.treeNumber,length(parentNumber),None),{ AUTOSTRING1})) AND StartsWith(c.treeNumber,parentNumber) | 
| |     +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+ 
| +Apply    |    2 | 9 |  0 | p, parentNumber -- c |                                 | 
| |\     +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+ 
| | +NodeByLabelScan |    9 | 9 |  12 | c     | :Node                                | 
| |     +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+ 
| +Projection  |    3 | 3 |  3 | parentNumber -- p | p; Add(p.treeNumber,{ AUTOSTRING0})                        | 
| |     +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+ 
| +NodeByLabelScan |    3 | 3 |  4 | p     | :Node                                | 
+--------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+ 

Total database accesses: 45 

但是,如果我們簡單增加一個提示,像這樣

MATCH (p:Node) 
WITH p, p.treeNumber + '.' AS parentNumber 
MATCH (c:Node) 
USING INDEX c:Node(treeNumber) 
WHERE c.treeNumber STARTS WITH parentNumber 
    AND NOT substring(c.treeNumber, length(parentNumber)) CONTAINS '.' 
RETURN p, c 

它不使用索引,我們必須像一個O(N *的log(n))算法(的log(n)的索引查找):

Compiler CYPHER 3.0 

Planner COST 

Runtime INTERPRETED 

+-------------------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+ 
| Operator      | Estimated Rows | Rows | DB Hits | Variables   | Other                     | 
+-------------------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+ 
| +ProduceResults    |    2 | 2 |  0 | c, p     | p, c                      | 
| |        +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+ 
| +Filter      |    2 | 2 |  6 | c, p, parentNumber | NOT(Contains(SubstringFunction(c.treeNumber,length(parentNumber),None),{ AUTOSTRING1})) | 
| |        +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+ 
| +Apply      |    2 | 3 |  0 | p, parentNumber -- c |                       | 
| |\       +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+ 
| | +NodeUniqueIndexSeekByRange |    9 | 3 |  6 | c     | :Node(treeNumber STARTS WITH parentNumber)            | 
| |        +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+ 
| +Projection     |    3 | 3 |  3 | parentNumber -- p | p; Add(p.treeNumber,{ AUTOSTRING0})              | 
| |        +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+ 
| +NodeByLabelScan    |    3 | 3 |  4 | p     | :Node                     | 
+-------------------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+ 

Total database accesses: 19 

注意引進WITH步創建前綴年初的時候,我沒騙了一下,因爲我注意到它提高了執行計劃和DB訪問過

MATCH (p:Node), (c:Node) 
USING INDEX c:Node(treeNumber) 
WHERE c.treeNumber STARTS WITH p.treeNumber 
    AND p <> c 
    AND NOT substring(c.treeNumber, length(p.treeNumber) + 1) CONTAINS '.' 
RETURN p, c 

它具有以下執行計劃:

Compiler CYPHER 3.0 

Planner RULE 

Runtime INTERPRETED 

+--------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+ 
| Operator  | Rows | DB Hits | Variables | Other                              | 
+--------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+ 
| +Filter  | 2 |  9 | c, p  | NOT(p == c) AND NOT(Contains(SubstringFunction(c.treeNumber,Add(length(p.treeNumber),{ AUTOINT0}),None),{ AUTOSTRING1})) | 
| |   +------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+ 
| +SchemaIndex | 6 |  12 | c -- p | PrefixSeekRangeExpression(p.treeNumber); :Node(treeNumber)                 | 
| |   +------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+ 
| +NodeByLabel | 3 |  4 | p   | :Node                              | 
+--------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+ 

Total database accesses: 25 

最後,備案,我寫的原始查詢的執行計劃(即沒有提示)是:

Compiler CYPHER 3.0 

Planner COST 

Runtime INTERPRETED 

+--------------------+----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
| Operator   | Estimated Rows | Rows | DB Hits | Variables | Other                                        | 
+--------------------+----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
| +ProduceResults |    2 | 2 |  0 | c, p  | p, c                                         | 
| |     +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
| +Filter   |    2 | 2 |  21 | c, p  | NOT(p == c) AND StartsWith(c.treeNumber,p.treeNumber) AND NOT(Contains(SubstringFunction(c.treeNumber,Add(length(p.treeNumber),{ AUTOINT0}),None),{ AUTOSTRING1})) | 
| |     +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
| +CartesianProduct |    9 | 9 |  0 | p -- c |                                          | 
| |\     +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
| | +NodeByLabelScan |    3 | 9 |  12 | c   | :Node                                        | 
| |     +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
| +NodeByLabelScan |    3 | 3 |  4 | p   | :Node                                        | 
+--------------------+----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 

Total database accesses: 37 

這不是最糟糕的一個:沒有提示,但與預先計算的前綴是!這就是爲什麼你應該總是測量。