2016-07-19 75 views

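OrientDB - CSV import - performance of importing edges from CSV

I want to import two CSV files into an OrientDB database. The first contains the vertices, with 1 million records. The second contains the edges, with 590,000 records.

I have two JSON configuration files for the import:

Vertices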

{
    "source": { "file": { "path": "../csvs/metodo01/pesquisador.csv" } },
    "extractor": { "row": {} },
    "transformers": [
        { "csv": {} },
        { "vertex": { "class": "Pesquisador" } }
    ],
    "loader": {
        "orientdb": {
            "dbURL": "remote:localhost/dbCemMilM01",
            "dbType": "graph",
            "batchCommit": 1000,
            "classes": [
                { "name": "Pesquisador", "extends": "V" }
            ],
            "indexes": [
                { "class": "Pesquisador", "fields": ["psq_id:integer"], "type": "UNIQUE" }
            ]
        }
    }
}

Edges

{
    "config": {
        "log": "info",
        "parallel": false
    },
    "source": {
        "file": {
            "path": "../csvs/metodo01/a10.csv"
        }
    },
    "extractor": {
        "row": {}
    },
    "transformers": [
        {
            "csv": {
                "separator": ",",
                "columnsOnFirstLine": true,
                "columns": [
                    "psq_id_from:integer",
                    "pub_id_to:integer",
                    "ordem:integer"
                ]
            }
        },
        {
            "command": {
                "command": "create edge PUBLICOU from (select from Pesquisador where psq_id = ${input.psq_id_from}) to (select from Publicacao where pub_id = ${input.pub_id_to}) set ordem = ${input.ordem} ",
                "output": "edge"
            }
        }
    ],
    "loader": {
        "orientdb": {
            "dbURL": "remote:localhost/dbUmMilhaoM01",
            "dbType": "graph",
            "standardElementConstraints": false,
            "batchCommit": 1000,
            "classes": [
                { "name": "PUBLICOU", "extends": "E" }
            ]
        }
    }
}

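During this process, OrientDB suggests using an index to speed things up.

How do I do that?

The command is simply: create edge PUBLICOU from (select from Pesquisador where psq_id = ${input.psq_id_from}) to (select from Publicacao where pub_id = ${input.pub_id_to}) set ordem = ${input.ordem}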


Have you seen the official documentation on indexes? http://orientdb.com/docs/last/Indexes.html

Answers

0

To speed up edge creation, you probably need indexes on the properties Pesquisador.psq_id (which you already have) and Publicacao.pub_id.

Ivan
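
As a rough sketch of what that could look like, the missing index on Publicacao.pub_id can be declared in the loader section of whichever ETL configuration imports the Publicacao vertices, mirroring the way the Pesquisador index is declared in the question. The Publicacao import file is not shown in the question, so everything here is an assumption: the database URL is copied from the edge configuration, pub_id is assumed to be an integer (as in the edge CSV's pub_id_to:integer column), and UNIQUE only fits if no two publications share a pub_id (otherwise NOTUNIQUE would be the type to use).

```json
{
    "loader": {
        "orientdb": {
            "dbURL": "remote:localhost/dbUmMilhaoM01",
            "dbType": "graph",
            "batchCommit": 1000,
            "classes": [
                { "name": "Publicacao", "extends": "V" }
            ],
            "indexes": [
                { "class": "Publicacao", "fields": ["pub_id:integer"], "type": "UNIQUE" }
            ]
        }
    }
}
```

With both indexes in place, each of the two sub-selects in the create edge command can be resolved by an index lookup rather than a scan of the whole class, and those sub-selects run once per CSV row.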

0

You can declare indexes directly in the ETL configuration. Here is an example taken from the DBpedia importer:

"orientdb": {
    "dbURL": "plocal:/temp/databases/dbpedia",
    "dbUser": "importer",
    "dbPassword": "IMP",
    "dbAutoCreate": true,
    "tx": false,
    "batchCommit": 1000,
    "wal": false,
    "dbType": "graph",
    "classes": [
        { "name": "Person", "extends": "V" },
        { "name": "Customer", "extends": "Person", "clusters": 8 }
    ],
    "indexes": [
        { "class": "V", "fields": ["URI:string"], "type": "UNIQUE" },
        { "class": "Person", "fields": ["town:string"], "type": "NOTUNIQUE",
          "metadata": { "ignoreNullValues": false }
        }
    ]
}

For more information, see: http://orientdb.com/docs/2.2/Loader.html

0

To speed up the loading process, my suggestion is to work in plocal mode and then move the created database to a standalone OrientDB server.
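
A minimal sketch of the loader section of the edge import switched to plocal, assuming the database directory lives at the hypothetical path /path/to/databases/dbUmMilhaoM01 and already contains the imported vertices. The "tx" and "wal" settings are borrowed from the DBpedia loader example in the other answer; they are not something the question's configuration uses, and turning the write-ahead log off trades crash safety for speed, so treat them as optional.

```json
{
    "loader": {
        "orientdb": {
            "dbURL": "plocal:/path/to/databases/dbUmMilhaoM01",
            "dbType": "graph",
            "standardElementConstraints": false,
            "tx": false,
            "wal": false,
            "batchCommit": 1000,
            "classes": [
                { "name": "PUBLICOU", "extends": "E" }
            ]
        }
    }
}
```

A plocal database is opened directly by the ETL process, so the OrientDB server should not have the same database open while the import runs. Once the import finishes, the database directory can be placed under the server's databases folder and served from there.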