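Strange neo4j Cypher behaviour
I am new to neo4j and the Cypher query language.
My node/relationship data set basically looks like this:
- I have about 27,000 User nodes in the database
- I have about 8,000 Question nodes in the database
- Question nodes can be answered by users: (User)-[:ANSWERED]->(Question)
- Some Question nodes trigger properties for users, so there are relationships like (User)-[:HAS_PROPERTY]->(Property)
- In addition, some Question nodes require certain properties before they can be answered, so there are relationships like (Question)-[:REQUIRES]->(Property)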
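To make the model above concrete, here is a minimal sketch of a tiny sample graph. Only the labels, the relationship types ANSWERED, HAS_PROPERTY and REQUIRES, and the `code` and `priority` properties come from the queries in this post. The `name` property, the CONTAINS relationship type and the concrete values are assumptions made purely for illustration.

```cypher
// Hypothetical sample data for experimenting with the queries in this post.
// `name` and :CONTAINS are invented; the post only uses an untyped
// relationship from (:ActiveQuestions) to each question.
CREATE (u:User {code: 'xyz'}),
       (p:Property {name: 'example-property'}),
       (active:ActiveQuestions),
       (q1:Question {priority: 10}),
       (q2:Question {priority: 5}),
       (q3:Question {priority: 1}),
       (active)-[:CONTAINS]->(q1),
       (active)-[:CONTAINS]->(q2),
       (active)-[:CONTAINS]->(q3),
       (u)-[:ANSWERED]->(q1),      // q1 is already answered
       (u)-[:HAS_PROPERTY]->(p),   // the user holds the property ...
       (q2)-[:REQUIRES]->(p)       // ... that q2 requires; q3 requires nothing
```

On this data, q1 is already answered, so a query for unanswered and answerable questions would be expected to keep only q2 and q3.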
My query is all about finding 50 questions that a specific user has not answered yet.
After fiddling around for a while, I came up with the following query:
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q)
WITH q, user
WHERE a IS NULL
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
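The above query gives me the expected rows and is quite fast (about 150 ms), which is great.
What I don't understand is this:
When I change the second line of the query to use the user variable instead of doing a label lookup, the query becomes very slow, especially for users who have already answered many or even all questions.
So the following query is much slower: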
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
OPTIONAL MATCH (user)-[a:ANSWERED]->(q)
WITH q, user
WHERE a IS NULL
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
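Why is that? I really don't understand it. I would have expected the query to be cheaper when it reuses the already matched user as the basis for the second optional match.
While researching Cypher performance, I found many articles telling me to avoid optional matches wherever possible. So my very first query looked like this: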
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
MATCH (q) WHERE NOT (q)<-[:ANSWERED]->(user)
WITH q, user
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
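Same problem here. The above query is much slower than the first one, roughly 20 to 30 times slower.
Finally, I would like to ask whether I am missing something, and whether there is a better way to achieve my goal.
Any help would be greatly appreciated.
Regards,
Alex
EDIT
Here are some profiling details:
Using the following query: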
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q)
WITH q, user
WHERE a IS NULL
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
Cypher version: CYPHER 2.2, planner: COST. 26979 total db hits in 169 ms.
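For anyone who wants to reproduce figures like these, a minimal sketch follows: in Cypher 2.2 a query can be prefixed with PROFILE, which executes it and reports the plan with db hits per operator (EXPLAIN shows the plan without executing). The shortened query and the `unanswered` alias are hypothetical and only isolate the "not answered" step.

```cypher
// PROFILE runs the query and reports db hits for each plan operator.
// This is a cut-down, illustrative variant, not one of the queries measured above.
PROFILE
MATCH (user:User {code: 'xyz'})
MATCH (:ActiveQuestions)-[]->(q:Question)
WHERE NOT (user)-[:ANSWERED]->(q)
RETURN count(q) AS unanswered
```

Comparing the per-operator db hits of two variants is usually more telling than comparing the totals alone.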
Using the query suggested by Michael Hunger:
MATCH (user:User {code: 'abc'})
MATCH (:ActiveQuestions)-[]->(q:Question)
WHERE NOT (user)-[:ANSWERED]->(q)
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
Cypher version: CYPHER 2.2, planner: COST. 2337573 total db hits in 2622 ms.
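So my current query is faster and considerably cheaper.
The reason I titled this post "Strange neo4j Cypher behaviour" is that I really don't understand what happens when I change the second line of my fairly fast query from: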
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q)
to:
OPTIONAL MATCH (user)-[a:ANSWERED]->(q)
which seems simpler and more logical to me. I then get the following:
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
WHERE NOT (user)-[:ANSWERED]->(q)
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
Cypher version: CYPHER 2.2, planner: COST. 2337573 total db hits in 2391 ms.
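So I get exactly the same number of db hits as with the slow query mentioned before.
Does anyone have an explanation for this?
It also makes no difference when I change the first line
from: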
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
to:
MATCH (user:User {code: 'xyz'})
MATCH (:ActiveQuestions)-[]->(q:Question)
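So I basically have two questions:
First, why is reusing the already defined user node variable (user) so much slower than repeating the lookup
(user:User {code: 'xyz'})
in the second line of the query?
Second, my second line uses the quasi-equivalent of an outer join. Contrary to all the advice I have received, this is much faster than using
MATCH (q) WHERE NOT (q)<-[:ANSWERED]->(user)
I assumed the latter also performs an outer join, but that does not seem to be the case.
EDIT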
After some further profiling I came up with a slightly cheaper query. See the profiling details below:
Using the following Cypher query:
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q)
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q)
WITH q, user
WHERE a IS NULL
OPTIONAL MATCH (q)-[r:REQUIRES]->(p)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(p)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
Cypher version: CYPHER 2.2, planner: COST. 21669 total db hits in 120 ms.
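So I basically got rid of the explicit node labels (:Question) and (:Property), which sounds logical to me because the explicit label checks are no longer needed. This saved me about 5,300 db hits (26,979 down to 21,669).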
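The last query also shortens the filter from `rCount = 0 or rCount = hCount` to `rCount = hCount`. This looks safe: a question with no REQUIRES relationships cannot match the longer pattern either, so rCount = 0 implies hCount = 0. The sketch below checks both filters on a few literal count pairs; the pairs and the column aliases are made up for illustration.

```cypher
// Literal (rCount, hCount) pairs: no requirements, all met, one unmet.
UNWIND [[0, 0], [2, 2], [2, 1]] AS pair
WITH pair[0] AS rCount, pair[1] AS hCount
RETURN rCount,
       hCount,
       (rCount = 0 OR rCount = hCount) AS originalFilter,
       (rCount = hCount) AS shortenedFilter
```

Both filter columns agree on every row, so dropping the `rCount = 0` branch does not change which questions are returned.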
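What else could be tuned in this query?
Hi Michael, I have already tried using the WHERE NOT clause. As I mentioned in my initial post, it actually makes the query about 20 times slower. I did a lot of profiling before arriving at the first query in my initial post, and it is the fastest I have found. I will post some details about the profiling information. – n3bul4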