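Neo4j data model for searching documents, keywords, and stems

My goal is to run two different kinds of search over documents using Neo4j. I'll use recipes (documents) as my example. Say I have a list of ingredients (keywords) on hand (milk, butter, flour, salt, sugar, eggs, ...), and I have a number of recipes in my database, each with its ingredients attached. I want to enter my list and get two different results. The first is the recipes that come closest to including all the ingredients I entered. The second is a combination of recipes that together include all my ingredients.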

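Given: milk, butter, flour, salt, sugar, eggs

The search results for the first case might be:

1) Sugar cookies

2) Butter cookies

A result for the second case might be:

1) Flat bread and Gogel-Mogel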

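I am reading recipes and inserting them into Neo4j, extracting ingredients from the ingredient list at the top of each recipe, but also from the recipe instructions. I want to weight these two sources differently, maybe 60/40 in favor of the ingredient list.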
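One way to read the 60/40 split is as a blend of the two per-ingredient weights. The following is a minimal sketch in Python, outside the Neo4j model itself; the names `combined_weight` and `LIST_SHARE` are hypothetical, and treating a missing weight as 0 is an assumption.

```python
# Assumed share given to the ingredient-list weight; the rest goes
# to the weight taken from the recipe instructions.
LIST_SHARE = 0.6


def combined_weight(list_weight, instr_weight, list_share=LIST_SHARE):
    """Blend the two weights of one ingredient in one recipe.

    A missing weight (None) is treated as 0, which is an assumption.
    """
    lw = list_weight if list_weight is not None else 0.0
    iw = instr_weight if instr_weight is not None else 0.0
    return list_share * lw + (1.0 - list_share) * iw


if __name__ == "__main__":
    # Salted butter in "Sugar cookies": list weight .7, instruction weight .2
    print(round(combined_weight(0.7, 0.2), 3))
    # Vanilla in "Sugar cookies" appears only in the ingredient list
    print(round(combined_weight(0.1, None), 3))
```

The same blend could be applied per relationship before summing, or to the two sums afterwards; both give the same total because the blend is linear.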

I also want to stem each ingredient, in case people enter similar words.

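I am struggling to come up with a good data model in Neo4j. I plan to have users enter ingredients in English; I will stem them in the backend and use the stems for the search.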
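The backend step could look roughly like the sketch below. It is an illustration only: `naive_stem` is a toy rule that strips a plural "s", standing in for a real stemming algorithm, and both function names are hypothetical.

```python
def naive_stem(word):
    """Toy stemmer: lowercase, trim, and drop a trailing plural 's'.

    This is a stand-in for a real stemming algorithm, not a replacement.
    """
    w = word.strip().lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    return w


def normalize_terms(raw):
    """Split comma-separated user input into unique stems, keeping input order."""
    stems = []
    for part in raw.split(","):
        stem = naive_stem(part)
        if stem and stem not in stems:
            stems.append(stem)
    return stems


if __name__ == "__main__":
    print(normalize_terms("Milk, butter, Eggs, eggs ,  sugar"))
```

The resulting list is what would be passed to the query as search terms, so that the user's "eggs" lines up with a stored stem such as "egg".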

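My first idea was: (image: neo4j data model 1) This is intuitive to me, but finding all the recipes takes a long time.

Next, maybe something like this: (image: neo4j data model 2)

It gets the recipes directly from the stems, but I would need to go through the relationships by recipe IDs (right?) to get the actual ingredients.

Third, maybe combine them like this? (image: neo4j data model 3) But that has a lot of duplication.

Here are some Cypher statements to create the first idea: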

//Create 4 recipes 
create (r1:Recipe {rid:'1', title:'Sugar cookies'}), (r2:Recipe {rid:'2', title:'Butter cookies'}), 
(r3:Recipe {rid:'3', title:'Flat bread'}), (r4:Recipe {rid:'4', title:'Gogel-Mogel'}) 

//Adding some ingredients 
merge (i1:Ingredient {ingredient:"salted butter"}) 
merge (i2:Ingredient {ingredient:"white sugar"}) 
merge (i3:Ingredient {ingredient:"brown sugar"}) 
merge (i4:Ingredient {ingredient:"all purpose flour"}) 
merge (i5:Ingredient {ingredient:"iodized salt"}) 
merge (i6:Ingredient {ingredient:"eggs"}) 
merge (i7:Ingredient {ingredient:"milk"}) 
merge (i8:Ingredient {ingredient:"powdered sugar"}) 
merge (i9:Ingredient {ingredient:"wheat flour"}) 
merge (i10:Ingredient {ingredient:"bananas"}) 
merge (i11:Ingredient {ingredient:"chocolate chips"}) 
merge (i12:Ingredient {ingredient:"raisins"}) 
merge (i13:Ingredient {ingredient:"unsalted butter"}) 
merge (i14:Ingredient {ingredient:"wheat flour"}) 
merge (i15:Ingredient {ingredient:"himalayan salt"}) 
merge (i16:Ingredient {ingredient:"chocolate bars"}) 
merge (i17:Ingredient {ingredient:"vanilla flavoring"}) 
merge (i18:Ingredient {ingredient:"vanilla"}) 

//Stems added to each ingredient 
merge (i1)<-[:STEM_OF]-(s1:Stem {stem:"butter"}) 
merge (i2)<-[:STEM_OF]-(s2:Stem {stem:"sugar"}) 
merge (i3)<-[:STEM_OF]-(s2) 
merge (i4)<-[:STEM_OF]-(s4:Stem {stem:"flour"}) 
merge (i5)<-[:STEM_OF]-(s5:Stem {stem:"salt"}) 
merge (i6)<-[:STEM_OF]-(s6:Stem {stem:"egg"}) 
merge (i7)<-[:STEM_OF]-(s7:Stem {stem:"milk"}) 
merge (i8)<-[:STEM_OF]-(s2) 
merge (i9)<-[:STEM_OF]-(s4) 
merge (i10)<-[:STEM_OF]-(s10:Stem {stem:"banana"}) 

merge (i11)<-[:STEM_OF]-(s11:Stem {stem:"chocolate"}) 
merge (i12)<-[:STEM_OF]-(s12:Stem {stem:"raisin"}) 
merge (i13)<-[:STEM_OF]-(s1) 
merge (i14)<-[:STEM_OF]-(s4) 
merge (i15)<-[:STEM_OF]-(s5) 
merge (i16)<-[:STEM_OF]-(s11) 
merge (i17)<-[:STEM_OF]-(s13:Stem {stem:"vanilla"}) 
merge (i18)<-[:STEM_OF]-(s13) 


create (r1)<-[:INGREDIENTS_LIST{weight:.7}]-(i1) 
create (r1)<-[:INGREDIENTS_LIST{weight:.6}]-(i2)  
create (r1)<-[:INGREDIENTS_LIST{weight:.5}]-(i4) 
create (r1)<-[:INGREDIENTS_LIST{weight:.4}]-(i5) 
create (r1)<-[:INGREDIENTS_LIST{weight:.4}]-(i6) 
create (r1)<-[:INGREDIENTS_LIST{weight:.2}]-(i7) 
create (r1)<-[:INGREDIENTS_LIST{weight:.1}]-(i18) 

create (r2)<-[:INGREDIENTS_LIST{weight:.7}]-(i1) 
create (r2)<-[:INGREDIENTS_LIST{weight:.6}]-(i3)  
create (r2)<-[:INGREDIENTS_LIST{weight:.5}]-(i4) 
create (r2)<-[:INGREDIENTS_LIST{weight:.4}]-(i5) 
create (r2)<-[:INGREDIENTS_LIST{weight:.3}]-(i6) 
create (r2)<-[:INGREDIENTS_LIST{weight:.2}]-(i7) 
create (r2)<-[:INGREDIENTS_LIST{weight:.1}]-(i18) 

create (r3)<-[:INGREDIENTS_LIST{weight:.7}]-(i1) 
create (r3)<-[:INGREDIENTS_LIST{weight:.6}]-(i5) 
create (r3)<-[:INGREDIENTS_LIST{weight:.5}]-(i7) 
create (r3)<-[:INGREDIENTS_LIST{weight:.4}]-(i9) 

create (r4)<-[:INGREDIENTS_LIST{weight:.6}]-(i2) 
create (r4)<-[:INGREDIENTS_LIST{weight:.5}]-(i6) 



create (r1)<-[:INGREDIENTS_INSTR{weight:.2}]-(i1) 
create (r1)<-[:INGREDIENTS_INSTR{weight:.2}]-(i2) 
create (r1)<-[:INGREDIENTS_INSTR{weight:.2}]-(i4) 
create (r1)<-[:INGREDIENTS_INSTR{weight:.2}]-(i5) 
create (r1)<-[:INGREDIENTS_INSTR{weight:.1}]-(i6) 
create (r1)<-[:INGREDIENTS_INSTR{weight:.1}]-(i7) 


create (r2)<-[:INGREDIENTS_INSTR{weight:.3}]-(i1) 
create (r2)<-[:INGREDIENTS_INSTR{weight:.2}]-(i3) 
create (r2)<-[:INGREDIENTS_INSTR{weight:.2}]-(i4) 
create (r2)<-[:INGREDIENTS_INSTR{weight:.2}]-(i5) 
create (r2)<-[:INGREDIENTS_INSTR{weight:.2}]-(i6) 
create (r2)<-[:INGREDIENTS_INSTR{weight:.1}]-(i7) 


create (r3)<-[:INGREDIENTS_INSTR{weight:.3}]-(i1) 
create (r3)<-[:INGREDIENTS_INSTR{weight:.3}]-(i5) 
create (r3)<-[:INGREDIENTS_INSTR{weight:.1}]-(i7) 
create (r3)<-[:INGREDIENTS_INSTR{weight:.1}]-(i9) 

create (r4)<-[:INGREDIENTS_INSTR{weight:.3}]-(i2) 
create (r4)<-[:INGREDIENTS_INSTR{weight:.3}]-(i6)

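Here is a link to a Neo4j console with the statements above: http://console.neo4j.org/?id=3o8y44

How much does Neo4j care about multiple relationships? Also, I can write the query for a single ingredient, but how would I write a query that returns recipes given multiple ingredients?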

Edit: Thank you, Michael! That got me further. I was able to expand on your answer:

WITH split("egg, sugar, chocolate, milk, flour, salt",", ") as terms
UNWIND terms as term
MATCH (stem:Stem {stem:term})-[:STEM_OF]->(ingredient:Ingredient)-[lst:INGREDIENTS_LIST]->(r:Recipe)
WITH r, size(terms) - count(distinct stem) as notCovered, sum(lst.weight) as weight, collect(distinct stem.stem) as matched
RETURN r, notCovered, matched, weight
ORDER BY notCovered ASC, weight DESC

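and got the list of ingredients and the weight. How do I change the query to also show the weight of the :INGREDIENTS_INSTR relationship, so that I can use both weights in a calculation? [lst:INGREDIENTS_LIST|INGREDIENTS_INSTR] is not what I want.

Edit:

This seems to work; is it correct?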

WITH split("egg, sugar, chocolate, milk, flour, salt",", ") as terms
UNWIND terms as term
MATCH (stem:Stem {stem:term})-[:STEM_OF]->(ingredient:Ingredient)-[lstl:INGREDIENTS_LIST]->(r:Recipe)<-[lsti:INGREDIENTS_INSTR]-(ingredient:Ingredient)
WITH r, size(terms) - count(distinct stem) as notCovered, sum(lsti.weight) as wi, sum(lstl.weight) as wl, collect(distinct stem.stem) as matched
RETURN r, notCovered, matched, wl+wi
ORDER BY notCovered ASC, wl+wi DESC
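One thing to check in a query like this: the pattern requires both relationships, so an ingredient that has only an INGREDIENTS_LIST relationship to a recipe (vanilla in "Sugar cookies" in the sample data) would not match at all. The in-memory sketch below illustrates the alternative of treating a missing instruction weight as 0. It uses a subset of the relationships created above; the dictionary and function names are hypothetical, and ingredients that appear only in the instructions are deliberately ignored.

```python
from collections import defaultdict

# Subset of the sample data: ingredient -> stem
STEM_OF = {"salted butter": "butter", "white sugar": "sugar",
           "eggs": "egg", "vanilla": "vanilla"}

# (recipe, ingredient) -> weight, per relationship type
LIST_W = {("Sugar cookies", "salted butter"): 0.7,
          ("Sugar cookies", "white sugar"): 0.6,
          ("Sugar cookies", "eggs"): 0.4,
          ("Sugar cookies", "vanilla"): 0.1,
          ("Gogel-Mogel", "white sugar"): 0.6,
          ("Gogel-Mogel", "eggs"): 0.5}
INSTR_W = {("Sugar cookies", "salted butter"): 0.2,
           ("Sugar cookies", "white sugar"): 0.2,
           ("Sugar cookies", "eggs"): 0.1,
           ("Gogel-Mogel", "white sugar"): 0.3,
           ("Gogel-Mogel", "eggs"): 0.3}


def rank(terms):
    """Return (recipe, notCovered, weight) rows, best match first."""
    matched = defaultdict(set)
    wl = defaultdict(float)
    wi = defaultdict(float)
    for (recipe, ingredient), weight in LIST_W.items():
        stem = STEM_OF[ingredient]
        if stem in terms:
            matched[recipe].add(stem)
            wl[recipe] += weight
            # A missing instruction weight counts as 0 instead of dropping the row
            wi[recipe] += INSTR_W.get((recipe, ingredient), 0.0)
    rows = [(r, len(terms) - len(m), round(wl[r] + wi[r], 2))
            for r, m in matched.items()]
    # Fewest uncovered terms first, then highest combined weight
    return sorted(rows, key=lambda row: (row[1], -row[2]))


if __name__ == "__main__":
    for row in rank(["sugar", "egg", "vanilla"]):
        print(row)
```

In Cypher, the usual way to get the same effect is to make the second relationship optional rather than part of the required pattern.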

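Also, could you help with the second query? Given a list of ingredients, it should return a combination of recipes that together include the given ingredients. Thanks again!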
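Finding a combination of recipes that together include all terms is a set-cover problem. The sketch below uses the common greedy heuristic in Python, working on stems per recipe derived from the INGREDIENTS_LIST relationships above. It is an approximation that does not always find the smallest combination, and the names `RECIPES` and `greedy_cover` are hypothetical.

```python
# Recipe -> set of stems, derived from the sample INGREDIENTS_LIST data
RECIPES = {
    "Sugar cookies": {"butter", "sugar", "flour", "salt", "egg", "milk", "vanilla"},
    "Butter cookies": {"butter", "sugar", "flour", "salt", "egg", "milk", "vanilla"},
    "Flat bread": {"butter", "salt", "milk", "flour"},
    "Gogel-Mogel": {"sugar", "egg"},
}


def greedy_cover(terms, recipes):
    """Pick recipes one at a time, each covering the most still-missing terms.

    Returns (chosen recipe names, terms that no recipe covers).
    """
    remaining = set(terms)
    chosen = []
    while remaining:
        # sorted() makes ties deterministic (alphabetical order wins)
        best = max(sorted(recipes), key=lambda name: len(recipes[name] & remaining))
        gained = recipes[best] & remaining
        if not gained:
            break  # nothing left that any recipe can cover
        chosen.append(best)
        remaining -= gained
    return chosen, remaining


if __name__ == "__main__":
    terms = ["milk", "butter", "flour", "salt", "sugar", "egg"]
    print(greedy_cover(terms, RECIPES))
    # Restrict the candidates to see a real combination
    subset = {name: RECIPES[name] for name in ("Flat bread", "Gogel-Mogel")}
    print(greedy_cover(terms, subset))
```

On the sample data a single recipe already covers all six stems, so a combination such as Flat bread plus Gogel-Mogel only shows up when the candidates are restricted, for example by excluding recipes already returned by the first search.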

2017-09-13 Oleg

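I would go with your version 1).

Don't worry about the extra hops. You would put the information about amount/weight on the relationship between the recipe and the actual ingredient.

You can have multiple relationships.

Here is an example query. Requiring every ingredient would not work with your dataset, as you have no recipe that has all the ingredients, so it ranks recipes by how many of the terms they leave uncovered: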

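// Terms must already be stemmed to match Stem.stem
// (as written, "eggs" does not match the stem "egg" in the sample data)
WITH split("milk, butter, flour, salt, sugar, eggs",", ") as terms
UNWIND terms as term
MATCH (stem:Stem {stem:term})-[:STEM_OF]->(ingredient:Ingredient)-->(r:Recipe)
// Count the terms for which the recipe has no matching stem
WITH r, size(terms) - count(distinct stem) as notCovered
RETURN r ORDER BY notCovered ASC LIMIT 2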

+-----------------------------------------+
| r                                       |
+-----------------------------------------+
| Node[0]{rid:"1",title:"Sugar cookies"}  |
| Node[1]{rid:"2",title:"Butter cookies"} |
+-----------------------------------------+
2 rows

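The following would be an optimization for large datasets:

For the query, you would first find all the stems, and then the recipes attached to the most selective one (lowest degree).

Then check the remaining ingredients against each of those recipes.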
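The strategy can be sketched in memory as follows, in Python rather than Cypher. The index from stem to recipes is built by hand from the sample data, selectivity is approximated here by the number of recipes per stem rather than by node degree, and the names `INDEX` and `closest_recipes` are hypothetical.

```python
# Stem -> recipes that use an ingredient with that stem (sample data)
INDEX = {
    "butter": {"Sugar cookies", "Butter cookies", "Flat bread"},
    "sugar": {"Sugar cookies", "Butter cookies", "Gogel-Mogel"},
    "flour": {"Sugar cookies", "Butter cookies", "Flat bread"},
    "salt": {"Sugar cookies", "Butter cookies", "Flat bread"},
    "egg": {"Sugar cookies", "Butter cookies", "Gogel-Mogel"},
    "milk": {"Sugar cookies", "Butter cookies", "Flat bread"},
}


def closest_recipes(terms, stem_to_recipes, limit=2):
    """Start from the most selective stem, then check the rest per recipe."""
    known = [t for t in terms if t in stem_to_recipes]
    if not known:
        return []
    # Most selective first: the stem attached to the fewest recipes
    known.sort(key=lambda t: len(stem_to_recipes[t]))
    first, rest = known[0], known[1:]
    scored = []
    for recipe in sorted(stem_to_recipes[first]):
        covered = sum(1 for other in rest if recipe in stem_to_recipes[other])
        # The first stem is covered by construction, hence the "- 1"
        scored.append((len(terms) - 1 - covered, recipe))
    scored.sort()
    return scored[:limit]


if __name__ == "__main__":
    terms = ["milk", "butter", "flour", "salt", "sugar", "egg"]
    for not_covered, recipe in closest_recipes(terms, INDEX):
        print(recipe, not_covered)
```

Note the trade-off: only recipes that contain the most selective stem are considered as candidates, so a recipe that matches many of the other terms but not that one is never scored.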

// Terms must already be stemmed to match Stem.stem
WITH split("milk, butter, flour, salt, sugar, eggs",", ") as terms
MATCH (stem:Stem) WHERE stem.stem IN terms
// most selective stem first
WITH stem, terms ORDER BY size((stem)-[:STEM_OF]->()) ASC
WITH terms, collect(stem) as stems
WITH head(stems) as first, tail(stems) as rest, terms
MATCH (first)-[:STEM_OF]->(ingredient:Ingredient)-->(r:Recipe)
// a recipe can be reached via several ingredients or relationship types
WITH DISTINCT r, rest, terms
// count how many of the remaining stems this recipe also covers
WITH r, terms, size([other IN rest WHERE (other)-[:STEM_OF]->(:Ingredient)-->(r)]) as covered
WITH r, size(terms) - 1 - covered as notCovered
RETURN r ORDER BY notCovered ASC LIMIT 2

2017-09-18 23:00:38

Is part of the answer missing at the end? After the colon? – Oleg

Edited for Q1, will do Q2 later. –
