2017-07-25 17 views

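Parsing T-SQL to extract part of the WHERE clause

I have a large SQL database that contains 'curves'. Each curve has an ID (CurveID). I am trying to determine the main users of each curve, and whether it is used at all. To help with this, the DBA has provided a log of all statements executed against the database.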

The statements can be quite complex. What I want to do is extract which curves are being queried.

An example statement follows:

WITH G AS (SELECT [Timevalue] FROM [mc].[GranularityLookup] 
WHERE [TimeValue] BETWEEN '19-Jul-2017 00:00' AND '30-Sep-2017 00:00' 
AND [1 Hr] = 1), 
D AS (SELECT [CurveID], [DeliveryDate], [PublishDate], AVG([Value]) Value, MAX([PeriodNumber]) PeriodNumber 
FROM mc.CURVEID_6657_1_LATEST data 
JOIN 
(SELECT CurveID ID, DeliveryDate dDate, MAX(PublishDate) pDate 
FROM mc.CURVEID_6657_1_LATEST 
WHERE CurveID = 90564 
    AND DeliveryDate >= '19-Jul-2017 00:00' AND DeliveryDate <= '30-Sep-2017 00:00' 
GROUP BY DeliveryDate, CurveID) Dates 
ON data.DeliveryDate = dates.dDate AND data.PublishDate = dates.pDate 
WHERE data.CurveID = 90564 
AND data.DeliveryDate >= '19-Jul-2017 00:00' AND data.DeliveryDate <= '30-Sep-2017 00:00' 
GROUP BY [CurveID], [PublishDate], [DeliveryDate]) 
SELECT 
G.[TimeValue] [Deliver 
yDate] , D.[PublishDate], D.[Value], D.[PeriodNumber] 
FROM 
G 
LEFT JOIN 
D 
ON 
G.[TimeValue] = D.[DeliveryDate] 
ORDER BY DeliveryDate ASC, PeriodNumber ASC, publishDate DESC 

From this statement, all I am interested in extracting is that the user queried CurveID 90564.

Statements may also look like either of the following:

SELECT * FROM anytable WHERE curveid = 123 AND deliverydate BETWEEN '2017-01-01' AND '2017-02-01' 

SELECT * FROM mc.anytable WHERE curveid IN (1,2,3,4,5,6,7) 

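Again, all I want to know is the curve IDs. I do not care about any of the other clauses.

I am parsing the SQL with the Microsoft.SqlServer.TransactSql.ScriptDom namespace, and have got to the point where I can find all the WHERE clauses using code like the following (cobbled together from other samples):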

string sql = @"WITH 
      G AS (SELECT [Timevalue] FROM [mc].[GranularityLookup] 
      WHERE [TimeValue] BETWEEN '19-Jul-2017 00:00' AND '30-Sep-2017 00:00' 
      AND [1 Hr] = 1), 
      D AS (SELECT [CurveID], [DeliveryDate], [PublishDate], AVG([Value]) Value, MAX([PeriodNumber]) PeriodNumber 
      FROM mc.CURVEID_6657_1_LATEST data 
      JOIN 
      (SELECT CurveID ID, DeliveryDate dDate, MAX(PublishDate) pDate 
      FROM mc.CURVEID_6657_1_LATEST 
      WHERE CurveID = 90564 
       AND DeliveryDate >= '19-Jul-2017 00:00' AND DeliveryDate <= '30-Sep-2017 00:00' 
      GROUP BY DeliveryDate, CurveID) Dates 
      ON data.DeliveryDate = dates.dDate AND data.PublishDate = dates.pDate 
      WHERE data.CurveID = 90564 
      AND data.DeliveryDate >= '19-Jul-2017 00:00' AND data.DeliveryDate <= '30-Sep-2017 00:00' 
      GROUP BY [CurveID], [PublishDate], [DeliveryDate]) 
      SELECT 
      G.[TimeValue] [Deliver 
      yDate] , D.[PublishDate], D.[Value], D.[PeriodNumber] 
      FROM 
      G 
      LEFT JOIN 
      D 
      ON 
      G.[TimeValue] = D.[DeliveryDate] 
      ORDER BY DeliveryDate ASC, PeriodNumber ASC, publishDate DESC"; 
      var parser = new TSql120Parser(false); 

      IList<ParseError> errors; 
      var fragment = parser.Parse(new StringReader(sql), out errors); 

      var whereVisitor = new WhereVisitor(); 
      fragment.Accept(whereVisitor); 

      // I now have all WHERE clauses in whereVisitor.WhereStatements 

class WhereVisitor : TSqlConcreteFragmentVisitor 
{ 
    public readonly List<WhereClause> WhereStatements = new List<WhereClause>(); 

    public override void Visit(WhereClause node) 
    { 
     WhereStatements.Add(node); 
    } 

} 

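Each clause in whereVisitor.WhereStatements (3 in this example) exposes a property called SearchCondition. Unfortunately, this is where I have run out of ideas. What I want to implement is logic along the following lines: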

foreach (var clause in whereVisitor.WhereStatements) 
{ 
    // IF any part of the clause filters based on curveid THEN 

    //  Capture curveIDs 

    // END IF 
} 
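
One way the loop above could be filled in, as a minimal sketch rather than a definitive implementation: pass each clause's SearchCondition to a second visitor, so that the parser's object model does the walking through AND, OR and parentheses. `CurveIdCollector` and `Demo` are hypothetical names introduced for illustration. The sketch assumes the visitor's default traversal reaches nested comparisons, and it only recognises `curveid = <integer literal>` (in either operand order).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.SqlServer.TransactSql.ScriptDom;

// Same idea as the WhereVisitor above, repeated so the sketch compiles on its own.
class WhereVisitor : TSqlConcreteFragmentVisitor
{
    public readonly List<WhereClause> WhereStatements = new List<WhereClause>();

    public override void Visit(WhereClause node)
    {
        WhereStatements.Add(node);
    }
}

// Hypothetical helper: collects the integer from "curveid = <n>" comparisons.
class CurveIdCollector : TSqlConcreteFragmentVisitor
{
    public readonly List<int> CurveIds = new List<int>();

    public override void Visit(BooleanComparisonExpression node)
    {
        // CurveID is a key, so only equality is treated as a curve filter.
        if (node.ComparisonType != BooleanComparisonType.Equals) return;

        // Accept both "curveid = 123" and "123 = curveid".
        TryAdd(node.FirstExpression, node.SecondExpression);
        TryAdd(node.SecondExpression, node.FirstExpression);
    }

    private void TryAdd(ScalarExpression columnSide, ScalarExpression valueSide)
    {
        var column = columnSide as ColumnReferenceExpression;
        var literal = valueSide as IntegerLiteral;
        if (column == null || literal == null || column.MultiPartIdentifier == null) return;

        // The last identifier is the column name ("alias.curveid" -> "curveid").
        string name = column.MultiPartIdentifier.Identifiers.Last().Value;
        int id;
        if (string.Equals(name, "curveid", StringComparison.OrdinalIgnoreCase)
            && int.TryParse(literal.Value, out id))
        {
            CurveIds.Add(id);
        }
    }
}

static class Demo
{
    static void Main()
    {
        const string sql =
            "SELECT * FROM mc.anytable t WHERE t.curveid = 123 AND deliverydate >= '2017-01-01'";

        IList<ParseError> errors;
        var fragment = new TSql120Parser(false).Parse(new StringReader(sql), out errors);

        var whereVisitor = new WhereVisitor();
        fragment.Accept(whereVisitor);

        var collector = new CurveIdCollector();
        foreach (var clause in whereVisitor.WhereStatements)
        {
            // SearchCondition is null for "WHERE CURRENT OF <cursor>".
            if (clause.SearchCondition != null)
            {
                clause.SearchCondition.Accept(collector);
            }
        }

        // Distinct() because a nested WHERE is reached both on its own
        // and through the clause that encloses it.
        Console.WriteLine(string.Join(", ", collector.CurveIds.Distinct()));
    }
}
```

Restricting the match to `BooleanComparisonType.Equals` reflects the point that range comparisons on a key column carry no meaning here; drop that check if other operators should count as usage.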

Other details:

  • Using C# (.NET 4.0)
  • SQL Server 2008
  • The DLL is Microsoft.SqlServer.TransactSql.ScriptDom (in my case located at 'C:\Program Files (x86)\Microsoft SQL Server\130\Tools\PowerShell\Modules\SQLPS\Microsoft.SqlServer.TransactSql.ScriptDom.dll')

Edit 1

Some additional information:

  • CurveID is a key into another table. In this context it makes no sense to perform operations on it (e.g. curveId + 1 or curveId <= 10).

Edit 2 (partial solution)

The following visitor helps with WHERE clauses of the form curveid = 123:

class CurveIdVisitor : TSqlConcreteFragmentVisitor 
{ 
    public readonly List<int> CurveIds = new List<int>(); 

    public override void Visit(BooleanComparisonExpression exp) 
    { 
     if (exp.FirstExpression is ColumnReferenceExpression && exp.SecondExpression is IntegerLiteral) 
     { 
      // there is a possibility that this is of the ilk 'curveid = 123' 
      // we will look for the 'identifier' 
      // we take the last if there are multiple. Example: 
      //  alias.curveid 
      // gives two identifiers: alias and curveid 
      if (
       ((ColumnReferenceExpression) exp.FirstExpression).MultiPartIdentifier.Identifiers.Last().Value.ToLower() == 
       "curveid") 
      { 
       // this is definitely a curveid filter 
       // Now to find the curve id 
       int curveid = int.Parse(((IntegerLiteral) exp.SecondExpression).Value); 
       CurveIds.Add(curveid); 
      } 
     } 
    } 
} 

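'SearchCondition' returns some kind of [BooleanExpression](https://msdn.microsoft.com/en-us/library/microsoft.sqlserver.transactsql.scriptdom.booleanexpression.aspx), which you can then walk step by step until you find a term involving 'curveID' (or an alias). But for anything beyond very simple terms it will be hard to work out which curves it touches. 'curveID + 1 = 12 - 6' might be doable, 'where curveID = 12 or (1 = 1)' will be harder to interpret, and 'where curveID > (select max(id) from table2)' won't tell you anything. Or try 'with g as (select id * 2 as curveid ...) ... where g.curveid = 2'. – Solarflare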

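@Solarflare I have edited the question above to give further clarification on the other cases you mention. Looking at SearchCondition, there seems to be a very complex object model, and I am hoping someone can give a clear answer on how to use it. Thanks – GinjaNinja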

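The object will only help you interpret the code. It will, for example, split a binary expression into three parts, "term1 operator term2". If you know your SQL text can only look like 'curveid = something' or 'curveid IN something', you could extract those parts by walking through the string. But the object gives you the same thing, and more easily. For example, to know where the second term ends in 'curveid = 1 + 2 AND somethingelse' you need some logic (which the boolean expression already applies). But maybe you are right, and someone has a different idea. – Solarflare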

Answer

Finally solved this; hopefully it benefits someone else in the future. Perhaps someone will read it in time and offer a better solution.

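using System.Collections.Generic; 
using System.IO; 
using System.Linq; 
using Microsoft.SqlServer.TransactSql.ScriptDom; 

public class SqlParser 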
{ 
    public List<int> GetQueriedCurveIds(string sql) 
    { 
     var parser = new TSql120Parser(false); 

     IList<ParseError> errors; 
     var fragment = parser.Parse(new StringReader(sql), out errors); 

     List<int> curveIds = new List<int>(); 
     CurveIdVisitor cidv = new CurveIdVisitor(); 
     InPredicateVisitor inpv = new InPredicateVisitor(); 
     fragment.AcceptChildren(cidv); 
     fragment.AcceptChildren(inpv); 

     curveIds.AddRange(cidv.CurveIds); 
     curveIds.AddRange(inpv.CurveIds); 
     return curveIds.Distinct().ToList(); 
    } 
} 



class CurveIdVisitor : TSqlConcreteFragmentVisitor 
{ 
    public readonly List<int> CurveIds = new List<int>(); 

    public override void Visit(BooleanComparisonExpression exp) 
    { 
     if (exp.FirstExpression is ColumnReferenceExpression && exp.SecondExpression is IntegerLiteral) 
     { 
      // there is a possibility that this is of the ilk 'curveid = 123' 
      // we will look for the 'identifier' 
      // we take the last if there are multiple. Example: 
      //  alias.curveid 
      // gives two identifiers: alias and curveid 
      if (
       ((ColumnReferenceExpression) exp.FirstExpression).MultiPartIdentifier.Identifiers.Last().Value.ToLower() == 
       "curveid") 
      { 
       // this is definitely a curveid filter 
       // Now to find the curve id 
       int curveid = int.Parse(((IntegerLiteral) exp.SecondExpression).Value); 
       CurveIds.Add(curveid); 
      } 
     } 
    } 
} 

class InPredicateVisitor : TSqlConcreteFragmentVisitor 
{ 
    public readonly List<int> CurveIds = new List<int>(); 

    public override void Visit(InPredicate exp) 
    { 
     if (exp.Expression is ColumnReferenceExpression) 
     { 
      if (
       ((ColumnReferenceExpression) exp.Expression).MultiPartIdentifier.Identifiers.Last().Value.ToLower() == 
       "curveid") 
      { 
       foreach (var value in exp.Values) 
       { 
        if (value is IntegerLiteral) 
        { 
         CurveIds.Add(int.Parse(((IntegerLiteral)value).Value)); 
        } 
       } 
      } 
     } 
    } 
} 

This is cut-down code to demonstrate the answer. In real life you would want to check the ParseError collection and add some error handling!
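
The ParseError check mentioned above might look something like the following sketch. `SqlParsing` and `ParseOrThrow` are hypothetical names, and throwing an `InvalidOperationException` is only an assumption about what "error handling" should mean. For a job that processes a whole statement log, recording the failure and skipping the statement may be the better choice.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.SqlServer.TransactSql.ScriptDom;

static class SqlParsing
{
    // Hypothetical helper: parses one statement and fails loudly on syntax errors,
    // instead of silently visiting a partial or empty fragment.
    public static TSqlFragment ParseOrThrow(string sql)
    {
        var parser = new TSql120Parser(false);
        IList<ParseError> errors;
        TSqlFragment fragment;

        using (var reader = new StringReader(sql))
        {
            fragment = parser.Parse(reader, out errors);
        }

        if (errors != null && errors.Count > 0)
        {
            // Each ParseError carries the position and message of one syntax error.
            var details = errors.Select(e =>
                string.Format("line {0}, column {1}: {2}", e.Line, e.Column, e.Message));
            throw new InvalidOperationException(
                "Could not parse SQL: " + string.Join("; ", details));
        }

        return fragment;
    }
}
```

With a helper like this, GetQueriedCurveIds could call `SqlParsing.ParseOrThrow(sql)` in place of `parser.Parse(...)` and let the caller decide whether to log and continue or to stop.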