2015-01-21 30 views

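Query rule for filtering on non-indexed attributes

I observe a huge runtime difference between these two AQL statements on a data set of about 20 million records: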

FOR e IN EAll 
    FILTER e.lastname == "Kmp" // <-- skip-index 
    FILTER e.lastpaff != ""  // <-- no index 
RETURN e 
// runs in less than a second 

FOR e IN EAll 
    FILTER e.lastpaff != ""  // <-- no index 
    FILTER e.lastname == "Kmp" // <-- skip-index 
RETURN e 
// needs about a minute to execute. 

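Apart from being indexed or not, the two conditions differ greatly in selectivity: the indexed attribute is highly selective, whereas the non-indexed attribute filters out only about 50% of the records.

Is it possible that there is no optimizer rule for this yet? I am currently using ArangoDB 2.4.0.

Details:

There is a skiplist index on the indexed attribute (which appears to be used in execution plan 1). Here are the execution plans; only the order of the filters differs between them: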

FAST QUERY: 

    arangosh [Uni]> stmt.explain() 
    { 
     "plan" : { 
     "nodes" : [ 
      { 
      "type" : "SingletonNode", 
      "dependencies" : [ ], 
      "id" : 1, 
      "estimatedCost" : 1, 
      "estimatedNrItems" : 1 
      }, 
      { 
      "type" : "IndexRangeNode", 
      "dependencies" : [ 
       1 
      ], 
      "id" : 8, 
      "estimatedCost" : 170463.32, 
      "estimatedNrItems" : 170462, 
      "database" : "Uni", 
      "collection" : "EAll", 
      "outVariable" : { 
       "id" : 0, 
       "name" : "i" 
      }, 
      "ranges" : [ 
       [ 
       { 
        "variable" : "i", 
        "attr" : "lastname", 
        "lowConst" : { 
        "bound" : "Kmp", 
        "include" : true, 
        "isConstant" : true 
        }, 
        "highConst" : { 
        "bound" : "Kmp", 
        "include" : true, 
        "isConstant" : true 
        }, 
        "lows" : [ ], 
        "highs" : [ ], 
        "valid" : true, 
        "equality" : true 
       } 
       ] 
      ], 
      "index" : { 
       "type" : "skiplist", 
       "id" : "13295598550318", 
       "unique" : false, 
       "fields" : [ 
       "lastname" 
       ] 
      }, 
      "reverse" : false 
      }, 
      { 
      "type" : "CalculationNode", 
      "dependencies" : [ 
       8 
      ], 
      "id" : 5, 
      "estimatedCost" : 340925.32, 
      "estimatedNrItems" : 170462, 
      "expression" : { 
       "type" : "compare !=", 
       "subNodes" : [ 
       { 
        "type" : "attribute access", 
        "name" : "lastpaff", 
        "subNodes" : [ 
        { 
         "type" : "reference", 
         "name" : "i", 
         "id" : 0 
        } 
        ] 
       }, 
       { 
        "type" : "value", 
        "value" : "" 
       } 
       ] 
      }, 
      "outVariable" : { 
       "id" : 2, 
       "name" : "2" 
      }, 
      "canThrow" : false 
      }, 
      { 
      "type" : "FilterNode", 
      "dependencies" : [ 
       5 
      ], 
      "id" : 6, 
      "estimatedCost" : 511387.32, 
      "estimatedNrItems" : 170462, 
      "inVariable" : { 
       "id" : 2, 
       "name" : "2" 
      } 
      }, 
      { 
      "type" : "ReturnNode", 
      "dependencies" : [ 
       6 
      ], 
      "id" : 7, 
      "estimatedCost" : 681849.3200000001, 
      "estimatedNrItems" : 170462, 
      "inVariable" : { 
       "id" : 0, 
       "name" : "i" 
      } 
      } 
     ], 
     "rules" : [ 
      "move-calculations-up", 
      "move-filters-up", 
      "move-calculations-up-2", 
      "move-filters-up-2", 
      "use-index-range", 
      "remove-filter-covered-by-index" 
     ], 
     "collections" : [ 
      { 
      "name" : "EAll", 
      "type" : "read" 
      } 
     ], 
     "variables" : [ 
      { 
      "id" : 0, 
      "name" : "i" 
      }, 
      { 
      "id" : 1, 
      "name" : "1" 
      }, 
      { 
      "id" : 2, 
      "name" : "2" 
      } 
     ], 
     "estimatedCost" : 681849.3200000001, 
     "estimatedNrItems" : 170462 
     }, 
     "warnings" : [ ], 
     "stats" : { 
     "rulesExecuted" : 19, 
     "rulesSkipped" : 0, 
     "plansCreated" : 1 
     } 
    } 



SLOW QUERY:

    arangosh [Uni]> stmt.explain() 
    { 
     "plan" : { 
     "nodes" : [ 
      { 
      "type" : "SingletonNode", 
      "dependencies" : [ ], 
      "id" : 1, 
      "estimatedCost" : 1, 
      "estimatedNrItems" : 1 
      }, 
      { 
      "type" : "EnumerateCollectionNode", 
      "dependencies" : [ 
       1 
      ], 
      "id" : 2, 
      "estimatedCost" : 17046233, 
      "estimatedNrItems" : 17046232, 
      "database" : "Uni", 
      "collection" : "EAll", 
      "outVariable" : { 
       "id" : 0, 
       "name" : "i" 
      }, 
      "random" : false 
      }, 
      { 
      "type" : "CalculationNode", 
      "dependencies" : [ 
       2 
      ], 
      "id" : 3, 
      "estimatedCost" : 34092465, 
      "estimatedNrItems" : 17046232, 
      "expression" : { 
       "type" : "compare !=", 
       "subNodes" : [ 
       { 
        "type" : "attribute access", 
        "name" : "lastpaff", 
        "subNodes" : [ 
        { 
         "type" : "reference", 
         "name" : "i", 
         "id" : 0 
        } 
        ] 
       }, 
       { 
        "type" : "value", 
        "value" : "" 
       } 
       ] 
      }, 
      "outVariable" : { 
       "id" : 1, 
       "name" : "1" 
      }, 
      "canThrow" : false 
      }, 
      { 
      "type" : "FilterNode", 
      "dependencies" : [ 
       3 
      ], 
      "id" : 4, 
      "estimatedCost" : 51138697, 
      "estimatedNrItems" : 17046232, 
      "inVariable" : { 
       "id" : 1, 
       "name" : "1" 
      } 
      }, 
      { 
      "type" : "CalculationNode", 
      "dependencies" : [ 
       4 
      ], 
      "id" : 5, 
      "estimatedCost" : 68184929, 
      "estimatedNrItems" : 17046232, 
      "expression" : { 
       "type" : "compare ==", 
       "subNodes" : [ 
       { 
        "type" : "attribute access", 
        "name" : "lastname", 
        "subNodes" : [ 
        { 
         "type" : "reference", 
         "name" : "i", 
         "id" : 0 
        } 
        ] 
       }, 
       { 
        "type" : "value", 
        "value" : "Kmp" 
       } 
       ] 
      }, 
      "outVariable" : { 
       "id" : 2, 
       "name" : "2" 
      }, 
      "canThrow" : false 
      }, 
      { 
      "type" : "FilterNode", 
      "dependencies" : [ 
       5 
      ], 
      "id" : 6, 
      "estimatedCost" : 85231161, 
      "estimatedNrItems" : 17046232, 
      "inVariable" : { 
       "id" : 2, 
       "name" : "2" 
      } 
      }, 
      { 
      "type" : "ReturnNode", 
      "dependencies" : [ 
       6 
      ], 
      "id" : 7, 
      "estimatedCost" : 102277393, 
      "estimatedNrItems" : 17046232, 
      "inVariable" : { 
       "id" : 0, 
       "name" : "i" 
      } 
      } 
     ], 
     "rules" : [ 
      "move-calculations-up", 
      "move-filters-up", 
      "move-calculations-up-2", 
      "move-filters-up-2" 
     ], 
     "collections" : [ 
      { 
      "name" : "EAll", 
      "type" : "read" 
      } 
     ], 
     "variables" : [ 
      { 
      "id" : 0, 
      "name" : "i" 
      }, 
      { 
      "id" : 1, 
      "name" : "1" 
      }, 
      { 
      "id" : 2, 
      "name" : "2" 
      } 
     ], 
     "estimatedCost" : 102277393, 
     "estimatedNrItems" : 17046232 
     }, 
     "warnings" : [ ], 
     "stats" : { 
     "rulesExecuted" : 19, 
     "rulesSkipped" : 0, 
     "plansCreated" : 1 
     } 
    } 
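
When comparing two plans like the ones above, it can help to reduce each `explain()` result to the few fields that matter here. The following is only a minimal sketch and not part of ArangoDB: `summarizePlan` is a hypothetical helper name, the two sample objects are trimmed-down copies of the plans above, and the check for `IndexRangeNode` assumes the 2.4 plan layout shown in this question.

```javascript
// Hypothetical helper: condense the object returned by stmt.explain()
// (layout as in the ArangoDB 2.4 plans above) into a short summary.
function summarizePlan(explainResult) {
  var plan = explainResult.plan;
  var nodeTypes = plan.nodes.map(function (node) {
    return node.type;
  });
  return {
    nodeTypes: nodeTypes,
    // Assumption: in these 2.4 plans, index access appears as an IndexRangeNode.
    usesIndex: nodeTypes.indexOf("IndexRangeNode") !== -1,
    rules: plan.rules,
    estimatedCost: plan.estimatedCost,
    estimatedNrItems: plan.estimatedNrItems
  };
}

// Trimmed-down stand-ins for the two plans above (only the fields used here).
var fastPlan = {
  plan: {
    nodes: [
      { type: "SingletonNode" },
      { type: "IndexRangeNode" },
      { type: "CalculationNode" },
      { type: "FilterNode" },
      { type: "ReturnNode" }
    ],
    rules: [
      "move-calculations-up", "move-filters-up",
      "move-calculations-up-2", "move-filters-up-2",
      "use-index-range", "remove-filter-covered-by-index"
    ],
    estimatedCost: 681849.3200000001,
    estimatedNrItems: 170462
  }
};

var slowPlan = {
  plan: {
    nodes: [
      { type: "SingletonNode" },
      { type: "EnumerateCollectionNode" },
      { type: "CalculationNode" },
      { type: "FilterNode" },
      { type: "CalculationNode" },
      { type: "FilterNode" },
      { type: "ReturnNode" }
    ],
    rules: [
      "move-calculations-up", "move-filters-up",
      "move-calculations-up-2", "move-filters-up-2"
    ],
    estimatedCost: 102277393,
    estimatedNrItems: 17046232
  }
};

console.log(summarizePlan(fastPlan).nodeTypes.join(" -> "));
console.log(summarizePlan(slowPlan).nodeTypes.join(" -> "));
```

In arangosh the same function could be applied directly to the result of `stmt.explain()`. The summary makes the difference visible at a glance: the slow plan has no index node and the `use-index-range` rule is missing from its rule list.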

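I am struggling to reproduce the issue in 2.4.0. I have tried the queries above as well as a variant in which the two conditions are AND-combined in the same `FILTER`. Can you provide the type of index that was created for the collection, plus the execution plans for the two queries above? At least the interesting parts, i.e. whether or not they use an index. That would help. – stj 2015-01-21 14:32:22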


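I cannot reproduce the issue in 2.4.1. It may depend on the index definition. More information about that or about the execution plans would help. – stj 2015-01-21 14:44:41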


I have updated the description and inserted the execution plans... – 2015-01-22 07:36:52

Answer


Indeed, conditions like the following disable the use of the index, even though an index could be used:

FILTER doc.indexedAttribute != ... FILTER doc.indexedAttribute == ... 

Interestingly, the index is used when the two conditions are put into the same FILTER condition and combined with &&:

FILTER doc.indexedAttribute != ... && doc.indexedAttribute == ... 

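Although the two statements are logically equivalent, they trigger slightly different code paths. The former AND-combines the ranges of two existing FILTERs, whereas the latter produces the ranges from a single FILTER. The code path that AND-combines FILTER ranges was overly defensive: it rejected both sides even if only one side (in this case the one with the non-equality operator) could not be used for an index scan.

This has been fixed in the 2.4 branch, and the fix will be included in 2.4.2. Until then, the workaround is to merge the two FILTER statements into one.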
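
Applied to the slow query from the question, the workaround could look like the sketch below. It only builds the AQL string in plain JavaScript; the variable name `mergedQuery` is made up for illustration, and the commented-out arangosh calls assume a shell connected to the database from the question. Whether the index is actually picked on a given version is best confirmed with `explain()`.

```javascript
// The two FILTER statements of the slow query, merged into a single FILTER
// whose conditions are combined with &&.
var mergedQuery = [
  "FOR e IN EAll",
  "  FILTER e.lastpaff != \"\" && e.lastname == \"Kmp\"",
  "  RETURN e"
].join("\n");

console.log(mergedQuery);

// In arangosh, the plan can then be inspected the same way as in the question:
//   var stmt = db._createStatement({ query: mergedQuery });
//   stmt.explain();
```

If the resulting plan contains an `IndexRangeNode` on `lastname`, the workaround has taken effect.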


Sounds very good! I am looking forward to 2.4.2, and thanks again for your excellent help here! – 2015-01-23 11:03:34


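I have also put together an explain mechanism that shows execution plans in a more compact format. I hope this will make analyzing future queries a bit easier: http://jsteemann.github.io/blog/2015/01/23/explaining-aql-queries-the-fancy-way/ – stj 2015-01-23 12:32:50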