2017-01-04

Finding the top n "friends" with a common interest

I created a node for each of 25 seminars and a node for each of 70 clients.

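Seminars take place several times a month in no particular order, and each occurrence has only 5 clients, who can be any of the 70. This is how I currently record each occurrence of a seminar and who attended it: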

MATCH (c1:Client {id: cid}), ..., (c5:Client {id: cid}), (s:Seminar {id: sid}) 
WITH c1, c2, c3, c4, c5, s
CREATE UNIQUE (c1)-[:ATTENDED {event_id: eid}]->(s) 
... 
CREATE UNIQUE (c5)-[:ATTENDED {event_id: eid}]->(s) 
WITH c1, c2, c3, c4, c5, s 
MERGE (c1)-[x:WITH]-(c2) 
ON MATCH SET x.count = x.count + 1 
ON CREATE SET x.count = 1 
...repeat for c1 & c3, c1 & c4, c1 & c5 
WITH c2, c3, c4, c5 
...repeat c2 & c3, c2 & c4, c2 & c5 
WITH c3, c4, c5 
...repeat for c3 & c4, c3 & c5... 
WITH c4, c5 
MERGE (c4)-[x:WITH]-(c5) 
ON MATCH SET x.count = x.count + 1 
ON CREATE SET x.count = 1; 

New event:

(x:Seminar {event_id: xid})

I want to "target" the top 5 clients who have attended various seminars together, where:

(:Client)-[r:WITH]-(:Client) WHERE r.count >= 1

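The goal is to bring together the clients who are "most familiar" with each other. How do I write this query? Do I have enough information (relationships and properties)? Is there a better way to add the event data?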

Answer

Let me suggest an alternative way of building your :WITH relationships.

MATCH (c:Client)-[:ATTENDED]->(:Seminar)<-[:ATTENDED]-(co:Client) 
WITH c, co, COUNT(co) as timesWith 
MERGE (c)-[r:WITH]-(co) 
SET r.count = timesWith 

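This gives you one row for each client paired with each client they attended a seminar with, along with the number of seminars the two attended together, and it saves (or updates) that count on your :WITH relationships.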
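As a way to sanity-check the counts the query should produce, here is a minimal plain-Python sketch of the same pair-counting idea. It assumes the attendance data is available as a list of attendee-id lists, one per seminar occurrence; the `count_pairs` helper and the sample `events` are hypothetical and not part of the original graph.

```python
from collections import Counter
from itertools import combinations


def count_pairs(events):
    """Count how many events each unordered pair of clients attended together."""
    pair_counts = Counter()
    for attendees in events:
        # sorted(set(...)) makes each unordered pair appear once per event,
        # always keyed as (smaller_id, larger_id)
        for a, b in combinations(sorted(set(attendees)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical sample data: each inner list is one seminar occurrence
events = [
    [1, 2, 3, 4, 5],
    [1, 2, 6, 7, 8],
    [2, 3, 9, 10, 11],
]
pair_counts = count_pairs(events)
print(pair_counts[(1, 2)])  # 2
print(pair_counts[(2, 3)])  # 2
print(pair_counts[(1, 6)])  # 1
```

Each value here plays the role of the `count` property on a :WITH relationship.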

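If you can pass a collection of IDs to your query as a parameter, you can create the relationships between the clients and the seminar in a single query, all at once rather than one client at a time: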

MATCH (c:Client), (s:Seminar {id: sid}) 
WHERE c.id IN $attendeeIDs
MERGE (c)-[:ATTENDED]->(s) 
// and then you can run the query above to update WITH relationships if necessary 

As for the rest of what you want, that is a fairly tricky query, and I'm not sure you have made clear what your approach should be.

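Are you looking for a group of 5 :Client nodes such that the sum of the :WITH counts among them is larger than for any other group of 5 clients? That kind of query has to test every combination of 5 clients and perform the calculation for each one, so we have to be careful to generate combinations rather than permutations. Even so, it will be a very expensive query, because the number of ways to choose 5 out of 70 is C(70,5) = 12,103,014. That is a lot of rows to build up, and a lot of work to run on each row.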

// first match on a combination of 5; id inequalities prevent permutations 
MATCH (c1:Client), (c2:Client), (c3:Client), (c4:Client), (c5:Client) 
WHERE id(c1) < id(c2) < id(c3) < id(c4) < id(c5) 
WITH c1, c2, c3, c4, c5, [id(c1),id(c2),id(c3),id(c4),id(c5)] as ids 
// find all possible :WITH relationships between each set of 5 
OPTIONAL MATCH (a)-[r:WITH]-(b) 
WHERE id(a) in ids AND id(b) in ids 
WITH c1,c2,c3,c4,c5, SUM(r.count) as togetherness 
ORDER BY togetherness DESC 
RETURN c1,c2,c3,c4,c5 
LIMIT 1 
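To put the cost in perspective, the figures can be checked with a couple of lines of Python (this sketch assumes Python 3.8 or later for `math.comb` and `math.perm`). It shows how many rows the id inequalities save compared with matching every ordered arrangement of 5 clients.

```python
from math import comb, perm

clients, group_size = 70, 5

# Unordered groups of 5: what the id(c1) < id(c2) < ... filter leaves
combinations_count = comb(clients, group_size)
# Ordered arrangements of 5 distinct clients: what we would get without it
permutations_count = perm(clients, group_size)

print(combinations_count)                        # 12103014
print(permutations_count)                        # 1452361680
print(permutations_count // combinations_count)  # 120, i.e. 5! orderings per group
```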

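There are ways to make this more efficient. Instead of looking at all :Client nodes, you could first take the top n :Clients by number of seminars attended, and then run a similar query on just those.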

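Here is what it would look like if you first selected the top 15 clients by seminar attendance, and then looked for the group of 5 among those 15 that has been together the most: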

MATCH (c:Client) 
WITH c, SIZE((c)-[:ATTENDED]->(:Seminar)) as attendance 
ORDER BY attendance DESC 
WITH c 
LIMIT 15 
WITH COLLECT(id(c)) as ids 
// first match on a combination of 5; id inequalities prevent permutations 
MATCH (c1:Client), (c2:Client), (c3:Client), (c4:Client), (c5:Client) 
WHERE id(c1) in ids AND id(c2) in ids AND id(c3) in ids AND id(c4) in ids AND id(c5) in ids
AND id(c1) < id(c2) < id(c3) < id(c4) < id(c5) 
WITH c1, c2, c3, c4, c5, [id(c1),id(c2),id(c3),id(c4),id(c5)] as ids 
// find all possible :WITH relationships between each set of 5 
OPTIONAL MATCH (a)-[r:WITH]-(b) 
WHERE id(a) in ids AND id(b) in ids 
WITH c1,c2,c3,c4,c5, SUM(r.count) as togetherness 
ORDER BY togetherness DESC 
RETURN c1,c2,c3,c4,c5 
LIMIT 1
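Restricting the candidates to 15 leaves only C(15,5) = 3,003 groups to score. The same search can be sketched in plain Python to check the logic outside the database. This is an illustration under assumptions, not the query itself: `best_group`, `attendance` (client id to number of seminars attended) and `pair_counts` (sorted id pair to :WITH count) are hypothetical names, and the sample data is made up.

```python
from itertools import combinations


def best_group(pair_counts, attendance, top_n=15, group_size=5):
    """Return (group, togetherness) for the best group among the top_n attendees."""
    # Keep only the top_n clients by number of seminars attended
    top = sorted(attendance, key=attendance.get, reverse=True)[:top_n]
    best, best_score = None, -1
    # combinations() yields each unordered group once, like the id inequalities
    for group in combinations(sorted(top), group_size):
        # Sum the pairwise counts inside the group; missing pairs count as 0
        score = sum(pair_counts.get(pair, 0) for pair in combinations(group, 2))
        if score > best_score:
            best, best_score = group, score
    return best, best_score


# Hypothetical sample data, using smaller numbers to keep the example readable
attendance = {1: 4, 2: 4, 3: 3, 4: 3, 5: 3, 6: 1, 7: 1}
pair_counts = {(1, 2): 3, (1, 3): 2, (2, 3): 2, (3, 4): 1, (4, 5): 2, (1, 6): 1}
group, score = best_group(pair_counts, attendance, top_n=6, group_size=3)
print(group, score)  # (1, 2, 3) 7
```

The Cypher version matches each :WITH relationship in both directions, so its sums come out doubled compared with this sketch; the ranking of the groups is the same.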