2016-01-04 37 views
2

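How to improve retrieval query performance in ArangoDB 2.7

I am a beginner with Python and ArangoDB. I have stored data in ArangoDB in a single collection named "DSP", in a format like this: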

{"data": [{"content": "Java", "type": "string", "name": "name", "key": 1}, {"content": "D:/Java", "type": "string", "name": "location", "key": 1}, {"content": "File folder", "type": "string", "name": "type", "key": 1}, {"content": 1896038645, "type": "int", "name": "size", "key": 1}, {"content": 7, "type": "string", "name": "child_folder_count", "key": 1}, {"content": 7, "type": "string", "name": "child_file_count", "key": 1}, {"content": "parse_dir.py", "type": "string", "name": "name", "key": 101}, {"content": "D:/Java/parse_dir.py", "type": "string", "name": "location", "key": 101}, {"content": "py", "type": "string", "name": "mime-type", "key": 101}, {"content": 4032, "type": "string", "name": "size", "key": 101}, {"content": "Wed Dec 30 21:36:32 2015", "type": "string", "name": "created_date", "key": 101}, {"content": "Wed Dec 30 21:42:38 2015", "type": "string", "name": "modified_date", "key": 101}, {"content": "result.json", "type": "string", "name": "name", "key": 102}, {"content": "D:/Java/result.json", "type": "string", "name": "location", "key": 102}, {"content": "json", "type": "string", "name": "mime-type", "key": 102}, {"content": 1134450, "type": "string", "name": "size", "key": 102}, {"content": "Wed Dec 30 21:36:45 2015", "type": "string", "name": "created_date", "key": 102}, {"content": "Wed Dec 30 21:36:45 2015", "type": "string", "name": "modified_date", "key": 102}, {"content": "rmi1.rar", "type": "string", "name": "name", "key": 103}, {"content": "D:/Java/rmi1.rar", "type": "string", "name": "location", "key": 103}, {"content": "rar", "type": "string", "name": "mime-type", "key": 103}, {"content": 165116, "type": "string", "name": "size", "key": 103}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 103}, {"content": "Tue Aug 30 16:18:34 2011", "type": "string", "name": "modified_date", "key": 103}, {"content": "servlet.rar", "type": "string", "name": "name", "key": 104}, {"content": 
"D:/Java/servlet.rar", "type": "string", "name": "location", "key": 104}, {"content": "rar", "type": "string", "name": "mime-type", "key": 104}, {"content": 782, "type": "string", "name": "size", "key": 104}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 104}, {"content": "Tue Aug 30 16:18:30 2011", "type": "string", "name": "modified_date", "key": 104}, {"content": "crawler projects", "type": "string", "name": "name", "key": 2}, {"content": "D:/Java/crawler projects", "type": "string", "name": "location", "key": 2}, {"content": "File folder", "type": "string", "name": "type", "key": 2}, {"content": 1886842316, "type": "int", "name": "size", "key": 2}, {"content": 5, "type": "string", "name": "child_folder_count", "key": 2}, {"content": 5, "type": "string", "name": "child_file_count", "key": 2}, {"content": ".metadata", "type": "string", "name": "name", "key": 3}, {"content": "D:/Java/crawler projects/.metadata", "type": "string", "name": "location", "key": 3}, {"content": "File folder", "type": "string", "name": "type", "key": 3}, {"content": 10131546, "type": "int", "name": "size", "key": 3}, {"content": 2, "type": "string", "name": "child_folder_count", "key": 3}, {"content": 2, "type": "string", "name": "child_file_count", "key": 3}, {"content": ".lock", "type": "string", "name": "name", "key": 301}, {"content": "D:/Java/crawler projects/.metadata/.lock", "type": "string", "name": "location", "key": 301}, {"content": "", "type": "string", "name": "mime-type", "key": 301}, {"content": 0, "type": "string", "name": "size", "key": 301}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 301}, {"content": "Mon May 30 12:21:45 2011", "type": "string", "name": "modified_date", "key": 301}, {"content": ".log", "type": "string", "name": "name", "key": 302}, {"content": "D:/Java/crawler projects/.metadata/.log", "type": "string", "name": "location", "key": 302}, {"content": "", "type": 
"string", "name": "mime-type", "key": 302}, {"content": 598, "type": "string", "name": "size", "key": 302}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 302}, {"content": "Mon May 30 15:29:18 2011", "type": "string", "name": "modified_date", "key": 302}, {"content": "version.ini", "type": "string", "name": "name", "key": 303}, {"content": "D:/Java/crawler projects/.metadata/version.ini", "type": "string", "name": "location", "key": 303}, {"content": "ini", "type": "string", "name": "mime-type", "key": 303}, {"content": 26, "type": "string", "name": "size", "key": 303}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 303}, {"content": "Mon May 30 15:29:18 2011", "type": "string", "name": "modified_date", "key": 303}, {"content": ".mylyn", "type": "string", "name": "name", "key": 4}, {"content": "D:/Java/crawler projects/.metadata/.mylyn", "type": "string", "name": "location", "key": 4}, {"content": "File folder", "type": "string", "name": "type", "key": 4}, {"content": 920, "type": "int", "name": "size", "key": 4}, {"content": 1, "type": "string", "name": "child_folder_count", "key": 4}, {"content": 1, "type": "string", "name": "child_file_count", "key": 4}, {"content": ".tasks.xml.zip", "type": "string", "name": "name", "key": 401}, {"content": "D:/Java/crawler projects/.metadata/.mylyn/.tasks.xml.zip", "type": "string", "name": "location", "key": 401}, {"content": "zip", "type": "string", "name": "mime-type", "key": 401}, {"content": 250, "type": "string", "name": "size", "key": 401}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 401}, {"content": "Mon May 30 12:23:18 2011", "type": "string", "name": "modified_date", "key": 401}, {"content": "repositories.xml.zip", "type": "string", "name": "name", "key": 402}, {"content": "D:/Java/crawler projects/.metadata/.mylyn/repositories.xml.zip", "type": "string", "name": "location", "key": 402}, 
{"content": "zip", "type": "string", "name": "mime-type", "key": 402}, {"content": 420, "type": "string", "name": "size", "key": 402}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 402}, {"content": "Mon May 30 12:23:18 2011", "type": "string", "name": "modified_date", "key": 402}, {"content": "tasks.xml.zip", "type": "string", "name": "name", "key": 403}, {"content": "D:/Java/crawler projects/.metadata/.mylyn/tasks.xml.zip", "type": "string", "name": "location", "key": 403}, {"content": "zip", "type": "string", "name": "mime-type", "key": 403}, {"content": 250, "type": "string", "name": "size", "key": 403}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 403}, {"content": "Mon May 30 15:31:16 2011", "type": "string", "name": "modified_date", "key": 403}, {"content": "contexts", "type": "string", "name": "name", "key": 5}, {"content": "D:/Java/crawler projects/.metadata/.mylyn/contexts", "type": "string", "name": "location", "key": 5}, {"content": "File folder", "type": "string", "name": "type", "key": 5}, {"content": 0, "type": "int", "name": "size", "key": 5}, {"content": 0, "type": "string", "name": "child_folder_count", "key": 5}] 

I have stored about 100 JSON files of approximately 15 MB. My query is:

for k in 
    (for t in DSP return [t.data]) 
     for z in k 
      for p in z 
       filter p.name == "name" || 
        p.content == "pdf" || 
        p.content == "xml" || 
        p.name == "Book" 
       return p 

As the stored JSON data grows, or as I add more filter conditions, the query takes more than a minute, and sometimes the browser stops responding.

I ran this experiment on an Intel Core i3 at 2.4 GHz with 4 GB RAM and a 160 GB SATA hard disk.

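First, how can I improve the query performance? Do I need to change the storage structure, or the syntax of the query? Second, how can I perform a join operation over multiple entries that share the same key, for example "retrieve the names of the xml documents"?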

Answer

3

There should be a few ways to improve the performance of the query:

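  • Selecting all documents from the DSP collection in a subquery and then iterating over the result (for k in (for t in DSP return [t.data]) for z in k for p in z filter p.name == "name" ...) is likely less efficient than working on the documents directly. Try replacing the four FOR loops and the subquery with just FOR k IN DSP FOR p IN k.data FILTER p.name == "name" ....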

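  • If you look at the explain output of the query, it will show that no index is used. If the collection holds many documents and the query should retrieve only a few of them, an index will help performance. I suggest one array index on data[*].name and another on data[*].content. You can set them up like this: db.DSP.ensureIndex({ type: "hash", fields: [ "data[*].name" ] }); db.DSP.ensureIndex({ type: "hash", fields: [ "data[*].content" ] });. Note: indexes of this type require ArangoDB 2.8. With these indexes, the query can also be simplified to: FOR p in DSP FILTER "name" IN p.data[*].name || "Book" IN p.data[*].name || "pdf" IN p.data[*].content.... Note that the indexes only help to quickly find the documents that contain the searched data, not the parts of those documents that contain it.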

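  • Adjusting the document structure may also help. Your current structure seems to hold multiple content/name values per document, e.g. [ {"content": "Java", "type": "string", "name": "name", "key": 1}, {"content": "D:/Java", "type": "string", "name": "location", "key": 1} ]. It looks like each document has only a single data attribute, which is an array of these structures. Instead of using this structure, you could try saving each array value as a separate document. For example, {"content": "Java", "type": "string", "name": "name", "key": 1} would become a document of its own, {"content": "D:/Java", "type": "string", "name": "location", "key": 1} would become another document, and so on. This seems sensible because your substructures already have a key attribute, and several array values seem to refer to the same key value. This transformation would split potentially very large documents into smaller chunks, which not only makes AQL run faster (because less data needs to be unpacked when accessing a document) but also lets you get rid of all the nested loops and return exactly the relevant inner array values in the result.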
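The restructuring described in the last point can be sketched in plain Python, for example as a preprocessing step before the data is stored. This is only an illustration under assumptions: `split_entries` is a hypothetical helper name, the sample is a shortened copy of the data in the question, and nothing here calls an ArangoDB API.

```python
def split_entries(doc):
    """Return one small document per entry of doc's 'data' array.

    Hypothetical helper for illustration; each entry is copied so the
    new documents do not share state with the original one.
    """
    return [dict(entry) for entry in doc.get("data", [])]


# Shortened sample in the shape shown in the question.
big_doc = {
    "data": [
        {"content": "Java", "type": "string", "name": "name", "key": 1},
        {"content": "D:/Java", "type": "string", "name": "location", "key": 1},
        {"content": "parse_dir.py", "type": "string", "name": "name", "key": 101},
    ]
}

small_docs = split_entries(big_doc)
for small_doc in small_docs:
    print(small_doc)
```

Each resulting dictionary could then be stored as its own document, so a query no longer has to unpack one large array to reach a single entry.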

If you adjust the document structure, the query can then be greatly simplified to FOR p IN DSP FILTER "name" IN p.data[*].name || "Book" IN p.data[*].name || "pdf" IN p.data[*].content ... RETURN p, which should be fast if indexes are used.
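The question also asks how to join entries that share the same key, for example to retrieve the names of files of a given type. Assuming each former array entry has become its own document, the pairing logic can be sketched in plain Python as below. This illustrates the idea only; it is not AQL and not an ArangoDB API. `names_with_mime_type` is a hypothetical helper, and the sample entries are taken from the data in the question, which contains zip files but no xml files.

```python
from collections import defaultdict


def names_with_mime_type(docs, mime_type):
    """Return the 'name' content of every key whose 'mime-type' matches.

    Hypothetical helper: entries are grouped by 'key', then the 'name'
    entry and the 'mime-type' entry of each group are paired up.
    """
    by_key = defaultdict(dict)
    for doc in docs:
        by_key[doc["key"]][doc["name"]] = doc["content"]
    return [
        fields["name"]
        for fields in by_key.values()
        if fields.get("mime-type") == mime_type and "name" in fields
    ]


# One document per former array entry (values copied from the question).
docs = [
    {"content": "version.ini", "type": "string", "name": "name", "key": 303},
    {"content": "ini", "type": "string", "name": "mime-type", "key": 303},
    {"content": ".tasks.xml.zip", "type": "string", "name": "name", "key": 401},
    {"content": "zip", "type": "string", "name": "mime-type", "key": 401},
    {"content": "repositories.xml.zip", "type": "string", "name": "name", "key": 402},
    {"content": "zip", "type": "string", "name": "mime-type", "key": 402},
]

zip_names = names_with_mime_type(docs, "zip")
print(zip_names)
```

Passing "xml" instead of "zip" would answer the example from the question. Inside the database the same pairing would be written as a query over the key attribute, where an index on key would probably help.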

+0

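Thanks, stj. I am on ArangoDB version 2.7, and it does not allow creating multiple hash indexes on the same collection. When I used a query of the form FOR k IN DSP FOR p IN k.data FILTER p.name == "modified_date" || p.type == "string" RETURN p, the time was cut in half.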
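For readers more at home in Python than in AQL, the original four-loop query and the two-loop form used in this comment can be mirrored with list comprehensions. This is a rough analogy under assumptions: `four_loops` and `two_loops` are hypothetical names, the filter is reduced to a single condition, and the sketch only shows that both forms select the same entries. It says nothing about timing inside ArangoDB.

```python
def four_loops(collection):
    # Mirrors: for k in (for t in DSP return [t.data]) for z in k for p in z ...
    # The subquery wraps every data array in an extra one-element list.
    subquery = [[t["data"]] for t in collection]
    return [p for k in subquery for z in k for p in z if p["name"] == "name"]


def two_loops(collection):
    # Mirrors: FOR k IN DSP FOR p IN k.data FILTER p.name == "name" RETURN p
    return [p for k in collection for p in k["data"] if p["name"] == "name"]


# Two small documents in the shape shown in the question.
dsp = [
    {"data": [
        {"content": "Java", "type": "string", "name": "name", "key": 1},
        {"content": "D:/Java", "type": "string", "name": "location", "key": 1},
    ]},
    {"data": [
        {"content": "rmi1.rar", "type": "string", "name": "name", "key": 103},
        {"content": "rar", "type": "string", "name": "mime-type", "key": 103},
    ]},
]

print(four_loops(dsp) == two_loops(dsp))
```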