I am trying to import the lastfm360K dataset into Neo4j with Python, but the import hangs. I first inserted all the user nodes without any problems, using the code below:
import re
from datetime import datetime
from elasticsearch import Elasticsearch
import certifi
from neo4j.v1 import GraphDatabase, basic_auth

driver = GraphDatabase.driver("bolt://localhost", auth=basic_auth("neo4j", "pass"))
session = driver.session()
with open("/Users/inanc/Documents/Software/lastfm-dataset-360K/usersha1-profile.tsv" , 'r') as userFile:
    #first_line = userFile.readline()
    linenum = 0
    for line in userFile:
        linenum = linenum + 1
        if linenum % 1000 == 0:
            print(linenum)
        lineStrip = line.rstrip().split("\t")
        tempDict = {}
        tempDict["user_id"] = lineStrip[0]
        if len(lineStrip) > 1:
            tempDict["gender"] = lineStrip[1]
            if lineStrip[2] != "":
                tempDict["age"] = int(lineStrip[2])
            tempDict["country"] = lineStrip[3]
            tempDict["signup"] = lineStrip[4]
        session.run("CREATE (a:Person {dict})", {"dict": tempDict})
session.close()
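As a side note on the parsing step above: `len(lineStrip) > 1` does not guarantee that indices 2 to 4 exist, because `rstrip()` also strips trailing tabs, so a row with empty trailing columns yields fewer fields. A minimal sketch of a more defensive parser follows. The column order is taken from the indices used in the code above; the names `USER_FIELDS` and `parse_user_line` are hypothetical, and unlike the original it skips empty values instead of storing empty strings.

```python
# Column order assumed from the indices used in the question's code.
USER_FIELDS = ("user_id", "gender", "age", "country", "signup")


def parse_user_line(line):
    """Build a property dict from one TSV line, tolerating short rows."""
    values = line.rstrip("\r\n").split("\t")
    props = {}
    # zip() stops at the shorter sequence, so missing columns are ignored.
    for name, value in zip(USER_FIELDS, values):
        if value == "":
            continue  # skip empty fields rather than storing ""
        props[name] = int(value) if name == "age" else value
    return props
```

The resulting dict could be passed as the `dict` parameter exactly as `tempDict` is above.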
Then I wanted to add the artist nodes and their relationships to the users, as follows:
import re
from datetime import datetime
from elasticsearch import Elasticsearch
import certifi
from neo4j.v1 import GraphDatabase, basic_auth

driver = GraphDatabase.driver("bolt://localhost", auth=basic_auth("neo4j", "pass"))
session = driver.session()
linenum = 0
with open("/Users/inanc/Documents/Software/lastfm-dataset-360K/usersha1-artmbid-artname-plays.tsv" , 'r') as songFile:
    for line in songFile:
        linenum = linenum + 1
        if linenum % 10000 == 0:
            print(linenum)
        lineStrip = line.rstrip().split("\t")
        if len(lineStrip) == 4:
            #print(line)
            user_id = lineStrip[0]
            musicbrainz_artistid = lineStrip[1]
            artist_name = lineStrip[2]
            plays = 1
            if lineStrip[3] != "":
                plays = int(lineStrip[3])
            session.run("MERGE (a:Artist {artist_name: {artist_name}})", {"artist_name": artist_name})
            session.run("MATCH (p:Person {user_id: {user_id}}), (a:Artist {artist_name: {artist_name}}) CREATE (p)-[:LIKES {times: {plays}}]->(a)", {"user_id": user_id, "artist_name": artist_name, "plays": plays})
session.close()
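It starts without any errors (although it is very slow and takes hours), but after some point (for example, after a few million lines) it hangs. Even while my Python script is hung, I can still query the database through the browser.
My only constraints are: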
create constraint on (p:Person) assert p.user_id is unique;
create constraint on (a:Artist) assert a.artist_name is unique;
I am using a MacBook with 8 GB of memory and Neo4j 3.0.7. I am also using the Python driver officially supported by Neo4j.
Any help would be appreciated!
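One common way to speed up an import like the second script is to stop sending two auto-commit statements per line and instead send rows in batches, one explicit transaction per batch, with `UNWIND` expanding the batch on the server. The following is only a sketch under assumptions, not necessarily the fix suggested in this thread: the names `BATCH_QUERY`, `parse_play_line`, `batches` and `import_plays` are hypothetical, the batch size of 10000 is arbitrary, and `session` is assumed to behave like a `neo4j.v1` session offering `begin_transaction()`, with the returned transaction offering `run()` and `commit()`.

```python
from itertools import islice

# The two statements from the question folded into one query that handles
# a whole list of rows. {rows} follows the parameter syntax used in the post.
BATCH_QUERY = (
    "UNWIND {rows} AS row "
    "MERGE (a:Artist {artist_name: row.artist_name}) "
    "WITH a, row "
    "MATCH (p:Person {user_id: row.user_id}) "
    "CREATE (p)-[:LIKES {times: row.plays}]->(a)"
)


def parse_play_line(line):
    """Return a row dict for a well-formed 4-column line, else None."""
    fields = line.rstrip().split("\t")
    if len(fields) != 4:
        return None
    user_id, _mbid, artist_name, plays = fields
    return {
        "user_id": user_id,
        "artist_name": artist_name,
        "plays": int(plays) if plays != "" else 1,
    }


def batches(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def import_plays(session, lines, batch_size=10000):
    """Write parsed rows using one explicit transaction per batch."""
    rows = (row for row in map(parse_play_line, lines) if row is not None)
    total = 0
    for chunk in batches(rows, batch_size):
        tx = session.begin_transaction()
        tx.run(BATCH_QUERY, {"rows": chunk})
        tx.commit()  # commit per batch so no single transaction grows unbounded
        total += len(chunk)
    return total
```

With the `driver` from the scripts above, this might be called as `import_plays(driver.session(), open(path))`, where `path` points at the plays file. Because the function only relies on `begin_transaction()`, `run()` and `commit()`, it can be exercised with a stub session before pointing it at a real database.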
Omg, you're right! It works like a charm! It did not hang, and it took only 28 minutes to insert all the nodes and relationships. Thanks a lot!