2015-10-04

Strange Neo4j Cypher behavior

I am new to Neo4j and the Cypher query language.

My node/relationship dataset basically looks like this:

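  1. I have about 27,000 User nodes in the database.
  2. I have about 8,000 Question nodes in the database.
  3. Question nodes can be answered: (User)-[:ANSWERED]->(Question)
  4. Some Question nodes trigger properties for the user, so there are relationships like (User)-[:HAS_PROPERTY]->(Property)
  5. In addition, some Question nodes require certain properties before they can be answered, so there are relationships like (Question)-[:REQUIRES]->(Property)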
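
To make the model above concrete, here is a minimal sketch of a tiny sample graph. The property values, the `id` and `name` properties, and the `:IS_ACTIVE` relationship type are assumptions made for illustration; the queries in this post only show an untyped relationship from an `:ActiveQuestions` node to each active question.

```cypher
// Hypothetical sample data: one user, two active questions, one property.
// 'id', 'name' and the :IS_ACTIVE type are illustrative assumptions.
CREATE (u:User {code: 'xyz'}),
       (aq:ActiveQuestions),
       (q1:Question {id: 1, priority: 10}),
       (q2:Question {id: 2, priority: 5}),
       (p:Property {name: 'example'}),
       (aq)-[:IS_ACTIVE]->(q1),
       (aq)-[:IS_ACTIVE]->(q2),
       (u)-[:ANSWERED]->(q1),      // q1 is already answered by the user
       (u)-[:HAS_PROPERTY]->(p),   // the user holds the property ...
       (q2)-[:REQUIRES]->(p)       // ... that q2 requires
```

With this data, q2 is the only question that is unanswered and whose required properties the user holds.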

My query is all about finding the questions that a specific user has not yet answered, returning at most 50 questions.

After struggling with it for a while, I came up with the following query:

MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question) 
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q) 
WITH q, user 
WHERE a IS NULL 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = 0 or rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 

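The query above gives me the expected rows and is quite fast (about 150 ms), which is great.

What I don't understand is this:

When I change the second line of the query to use the user variable instead of doing a label lookup, the query becomes very slow, especially for users who have already answered many or even all of the questions.

So the following query is much slower: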

MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question) 
OPTIONAL MATCH (user)-[a:ANSWERED]->(q) 
WITH q, user 
WHERE a IS NULL 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = 0 or rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 

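Why is that? I really don't understand it. In fact, I thought the query would be cheaper when it reuses the already matched user as the basis for the second OPTIONAL MATCH.

While researching Cypher performance, I found many articles telling me to avoid OPTIONAL MATCH whenever possible. So my first query looked like this: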

MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question) 
MATCH (q) WHERE NOT (q)<-[:ANSWERED]->(user) 
WITH q, user 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = 0 or rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 

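Same problem here. The query above is much slower than the first one, roughly 20 to 30 times slower.

Finally, I would like to ask whether I am missing something, and whether there is a better way to achieve my goal.

Any help would be greatly appreciated.

Regards,

Alex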

Edit

Here are some profiling details:

Using the following query:

MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question) 
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q) 
WITH q, user 
WHERE a IS NULL 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = 0 or rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 

Cypher version: CYPHER 2.2, planner: COST. 26979 total db hits in 169 ms. 

Using the query suggested by Michael Hunger:

MATCH (user:User {code: 'abc'}) 
MATCH (:ActiveQuestions)-[]->(q:Question) 
WHERE NOT (user)-[:ANSWERED]->(q) 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = 0 or rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 

Cypher version: CYPHER 2.2, planner: COST. 2337573 total db hits in 2622 ms. 

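So my current query is faster and more efficient.

The reason I titled this post "Strange Neo4j Cypher behavior" is that I really don't understand what happens when I change the second line of my fairly fast query from: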

OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q) 

to:

OPTIONAL MATCH (user)-[a:ANSWERED]->(q) 

which would seem simpler and more logical to me. I get the following:

MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question) 
WHERE NOT (user)-[:ANSWERED]->(q) 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = 0 or rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 

Cypher version: CYPHER 2.2, planner: COST. 2337573 total db hits in 2391 ms. 

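So I get exactly the same number of db hits as with the slow query mentioned earlier.

Does anyone have an explanation for this?

It also makes no difference when I change the first line

from: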

MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question) 

to:

MATCH (user:User {code: 'xyz'}) 
MATCH (:ActiveQuestions)-[]->(q:Question) 

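So I basically have two questions:

  1. Why is reusing the already defined user node variable (user) so much slower than using (user:User {code: 'xyz'}) again?

  2. The second line of my query uses a quasi-equivalent of an outer join. Contrary to all the advice I have come across, this is much faster than using MATCH (q) WHERE NOT (q)<-[:ANSWERED]->(user). I thought the latter was also doing an outer join, but that does not seem to be the case.

Edit

After some further profiling, I came up with a slightly cheaper query. See the profiling details below:

Using the following Cypher query: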

MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q) 
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q) 
WITH q, user 
WHERE a IS NULL 
OPTIONAL MATCH (q)-[r:REQUIRES]->(p) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(p)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 

Cypher version: CYPHER 2.2, planner: COST. 21669 total db hits in 120 ms. 

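So I basically got rid of the explicit node labels (:Question) and (:Property), which sounds logical to me because the explicit label checks are no longer needed. This saved me about 5,300 db hits.

What else could be tuned in this query?

Answers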

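With the second match you span a lot of rows, which you have to collapse again. So change the first WITH to WITH DISTINCT q, user, or to an aggregation such as WITH q, user, count(*) AS answers. That brings your cardinality back down.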
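
As a rough sketch of the aggregation variant, the first part of the query could look like the following. Note one deliberate deviation that is an assumption on my part: it uses count(a) rather than count(*), because count(a) skips nulls and is therefore 0 exactly when the OPTIONAL MATCH found no ANSWERED relationship. Whether this is actually faster on the real data has not been measured here.

```cypher
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
OPTIONAL MATCH (user)-[a:ANSWERED]->(q)
// Collapse to one row per (q, user); count(a) ignores null values of a
WITH q, user, count(a) AS answers
// Keep only questions this user has not answered
WHERE answers = 0
RETURN q ORDER BY q.priority DESC LIMIT 50
```

The REQUIRES / HAS_PROPERTY checks from the original query are left out to keep the sketch short; they would follow the WHERE clause unchanged.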

And I think this already spans a lot of rows: (:ActiveQuestions)-[]->(q:Question)

If you run the query with PROFILE, you should see how much data is accessed.
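
For example, prefixing a query with the PROFILE keyword executes it and reports the rows and db hits for each operator in the plan. The snippet below profiles only the first part of the query, as an illustration:

```cypher
// PROFILE runs the query and shows per-operator rows and db hits
PROFILE
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
OPTIONAL MATCH (user)-[a:ANSWERED]->(q)
RETURN count(*)
```

Comparing the plan for this variant with the plan for the (:User {code: 'xyz'}) variant should show which operator accounts for the extra db hits.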

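In general, I would try changing your OPTIONAL MATCH into a WHERE condition and see how it goes.

Btw., you could label the active questions as :ActiveQuestion; then no extra relationship is needed. I also added a rel-type.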

MATCH (user:User {code: 'xyz'}) 
MATCH (:ActiveQuestions)-[:IS_ACTIVE]->(q:Question) 
WHERE NOT (user)-[:ANSWERED]->(q) 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property) 
WITH q, user, count(r) as rCount 
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user) 
WITH q, rCount, count(h) as hCount 
WHERE rCount = 0 or rCount = hCount 
RETURN q ORDER BY q.priority DESC LIMIT 50 
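
The labeling idea mentioned above could be sketched as two separate statements. The :ActiveQuestion label comes from the suggestion; the migration step is an assumption about how one might apply it to existing data. The REQUIRES / HAS_PROPERTY checks are omitted for brevity, and performance has not been verified.

```cypher
// Statement 1 (one-off): tag every currently active question with a label
MATCH (:ActiveQuestions)-[]->(q:Question)
SET q:ActiveQuestion;

// Statement 2: unanswered active questions, found via the label
// instead of traversing from the :ActiveQuestions node
MATCH (user:User {code: 'xyz'})
MATCH (q:ActiveQuestion)
WHERE NOT (user)-[:ANSWERED]->(q)
RETURN q ORDER BY q.priority DESC LIMIT 50
```

The label would have to be removed again (REMOVE q:ActiveQuestion) whenever a question stops being active.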

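Hi Michael, I had already tried the WHERE NOT clause, as mentioned in my original post; it actually makes the query about 20 times slower. I did a lot of profiling before arriving at the first query in my original post, which is the fastest one I have found. I will post some details of the profiling information. – n3bul4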