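Parsing T-SQL to extract part of the WHERE clause
I have a large SQL database containing 'curves'. Each curve has an ID (the curve ID). I am trying to determine who the main users of each curve are, and whether each curve is used at all. To help with this, the DBA has provided a log of every statement executed against the database.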
These statements are quite complex. All I want to do is extract the curve(s) being queried.
An example statement:
WITH G AS (SELECT [Timevalue] FROM [mc].[GranularityLookup]
WHERE [TimeValue] BETWEEN '19-Jul-2017 00:00' AND '30-Sep-2017 00:00'
AND [1 Hr] = 1),
D AS (SELECT [CurveID], [DeliveryDate], [PublishDate], AVG([Value]) Value, MAX([PeriodNumber]) PeriodNumber
FROM mc.CURVEID_6657_1_LATEST data
JOIN
(SELECT CurveID ID, DeliveryDate dDate, MAX(PublishDate) pDate
FROM mc.CURVEID_6657_1_LATEST
WHERE CurveID = 90564
AND DeliveryDate >= '19-Jul-2017 00:00' AND DeliveryDate <= '30-Sep-2017 00:00'
GROUP BY DeliveryDate, CurveID) Dates
ON data.DeliveryDate = dates.dDate AND data.PublishDate = dates.pDate
WHERE data.CurveID = 90564
AND data.DeliveryDate >= '19-Jul-2017 00:00' AND data.DeliveryDate <= '30-Sep-2017 00:00'
GROUP BY [CurveID], [PublishDate], [DeliveryDate])
SELECT
G.[TimeValue] [DeliveryDate], D.[PublishDate], D.[Value], D.[PeriodNumber]
FROM
G
LEFT JOIN
D
ON
G.[TimeValue] = D.[DeliveryDate]
ORDER BY DeliveryDate ASC, PeriodNumber ASC, publishDate DESC
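From this statement, the only thing I want to extract is the curve ID the user queried: 90564.
The statement could also look like either of the following:
SELECT * FROM anytable WHERE curveid = 123 AND deliverydate BETWEEN '2017-01-01' AND '2017-02-01'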
or
SELECT * FROM mc.anytable WHERE curveid IN (1,2,3,4,5,6,7)
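Again, all I want to know is the curve ID. I don't care about any of the other clauses.
I am parsing the SQL with the Microsoft.SqlServer.TransactSql.ScriptDom namespace, and have got as far as finding all the WHERE clauses with code similar to the following (pieced together from other samples):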
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.SqlServer.TransactSql.ScriptDom;

class Program
{
    static void Main()
    {
        string sql = @"WITH
G AS (SELECT [Timevalue] FROM [mc].[GranularityLookup]
WHERE [TimeValue] BETWEEN '19-Jul-2017 00:00' AND '30-Sep-2017 00:00'
AND [1 Hr] = 1),
D AS (SELECT [CurveID], [DeliveryDate], [PublishDate], AVG([Value]) Value, MAX([PeriodNumber]) PeriodNumber
FROM mc.CURVEID_6657_1_LATEST data
JOIN
(SELECT CurveID ID, DeliveryDate dDate, MAX(PublishDate) pDate
FROM mc.CURVEID_6657_1_LATEST
WHERE CurveID = 90564
AND DeliveryDate >= '19-Jul-2017 00:00' AND DeliveryDate <= '30-Sep-2017 00:00'
GROUP BY DeliveryDate, CurveID) Dates
ON data.DeliveryDate = dates.dDate AND data.PublishDate = dates.pDate
WHERE data.CurveID = 90564
AND data.DeliveryDate >= '19-Jul-2017 00:00' AND data.DeliveryDate <= '30-Sep-2017 00:00'
GROUP BY [CurveID], [PublishDate], [DeliveryDate])
SELECT
G.[TimeValue] [DeliveryDate], D.[PublishDate], D.[Value], D.[PeriodNumber]
FROM
G
LEFT JOIN
D
ON
G.[TimeValue] = D.[DeliveryDate]
ORDER BY DeliveryDate ASC, PeriodNumber ASC, publishDate DESC";

        var parser = new TSql120Parser(false);
        IList<ParseError> errors;
        TSqlFragment fragment = parser.Parse(new StringReader(sql), out errors);
        if (errors.Count > 0)
            Console.WriteLine("Parse errors: " + errors.Count);

        // Walk the whole tree; Visit(WhereClause) fires once per WHERE,
        // including those inside the CTEs and the derived table.
        var whereVisitor = new WhereVisitor();
        fragment.Accept(whereVisitor);
        // I now have all WHERE clauses in whereVisitor.WhereStatements
    }
}

// Collects every WHERE clause found in the parsed script.
class WhereVisitor : TSqlConcreteFragmentVisitor
{
    public readonly List<WhereClause> WhereStatements = new List<WhereClause>();

    public override void Visit(WhereClause node)
    {
        WhereStatements.Add(node);
    }
}
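Each clause in whereVisitor.WhereStatements (3 in this example) exposes a property called SearchCondition. Unfortunately, this is where I run out of ideas. What I want to implement is logic along these lines: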
foreach (var clause in whereVisitor.WhereStatements)
{
// IF any part of the clause filters based on curveid THEN
// Capture curveIDs
// END IF
}
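One way to fill in the IF/THEN above is to descend each clause's SearchCondition by hand: AND/OR arrive as BooleanBinaryExpression, parentheses as BooleanParenthesisExpression, and `col = value` as BooleanComparisonExpression. The sketch below is only that, a sketch: `CurveIdExtractor` and `IsCurveIdColumn` are hypothetical names, it assumes the filter is written as `curveid = <integer literal>`, and it deliberately ignores IN, BETWEEN, NOT and subqueries.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.SqlServer.TransactSql.ScriptDom;

// Hypothetical helper (not part of ScriptDom): recursive descent over a SearchCondition.
static class CurveIdExtractor
{
    public static void Collect(BooleanExpression expr, List<int> ids)
    {
        // AND / OR: either side may contain a curveid filter.
        var binary = expr as BooleanBinaryExpression;
        if (binary != null)
        {
            Collect(binary.FirstExpression, ids);
            Collect(binary.SecondExpression, ids);
            return;
        }

        // ( ... ): unwrap and keep descending.
        var parens = expr as BooleanParenthesisExpression;
        if (parens != null)
        {
            Collect(parens.Expression, ids);
            return;
        }

        // Leaf: only "curveid = <integer literal>" is recognised.
        var comparison = expr as BooleanComparisonExpression;
        if (comparison == null || comparison.ComparisonType != BooleanComparisonType.Equals)
            return; // IN, BETWEEN, NOT, EXISTS, <, > ... are ignored in this sketch

        var column = comparison.FirstExpression as ColumnReferenceExpression;
        var literal = comparison.SecondExpression as IntegerLiteral;
        if (column != null && literal != null && IsCurveIdColumn(column))
            ids.Add(int.Parse(literal.Value));
    }

    // True for "curveid", "alias.curveid" or "[CurveID]" (Identifier.Value has no brackets).
    static bool IsCurveIdColumn(ColumnReferenceExpression column)
    {
        if (column.MultiPartIdentifier == null)
            return false;
        var parts = column.MultiPartIdentifier.Identifiers;
        return string.Equals(parts[parts.Count - 1].Value, "curveid", StringComparison.OrdinalIgnoreCase);
    }
}
```

Calling `CurveIdExtractor.Collect(clause.SearchCondition, ids)` inside the foreach loop above should collect 90564 twice for the first example statement (from the derived table's WHERE and from D's WHERE) and nothing from G. Because OR branches are visited too, something like `curveid = 12 OR 1 = 1` still yields 12 even though it does not really restrict the curve.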
Additional details:
- C# (.NET 4.0)
- SQL Server 2008
- The DLL is Microsoft.SqlServer.TransactSql.ScriptDom (in my case located at 'C:\Program Files (x86)\Microsoft SQL Server\130\Tools\PowerShell\Modules\SQLPS\Microsoft.SqlServer.TransactSql.ScriptDom.dll')
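Edit 1
Some additional information:
- CurveID is a key into another table. Operating on it (e.g. curveId + 1 or curveId <= 10) makes no sense in this context.
Edit 2 (partial solution)
The following visitor helps with WHERE clauses of the form curveid = 123: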
using System;
using System.Collections.Generic;
using Microsoft.SqlServer.TransactSql.ScriptDom;

class CurveIdVisitor : TSqlConcreteFragmentVisitor
{
    public readonly List<int> CurveIds = new List<int>();

    public override void Visit(BooleanComparisonExpression exp)
    {
        var column = exp.FirstExpression as ColumnReferenceExpression;
        var literal = exp.SecondExpression as IntegerLiteral;

        // there is a possibility that this is of the ilk 'curveid = 123';
        // only '=' names a specific curve, so skip <, >, <> etc. (see Edit 1)
        if (column == null || literal == null || column.MultiPartIdentifier == null
            || exp.ComparisonType != BooleanComparisonType.Equals)
            return;

        // we take the last identifier if there are multiple. Example:
        // alias.curveid
        // gives two identifiers: alias and curveid
        var identifiers = column.MultiPartIdentifier.Identifiers;
        if (string.Equals(identifiers[identifiers.Count - 1].Value, "curveid", StringComparison.OrdinalIgnoreCase))
        {
            // this is definitely a curveid filter; capture the curve id
            CurveIds.Add(int.Parse(literal.Value));
        }
    }
}
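The visitor above only sees BooleanComparisonExpression nodes, so the `curveid IN (1,2,3,4,5,6,7)` form from the question is missed. In ScriptDom that form is an InPredicate, with the column in `Expression` and the list in `Values`. A minimal sketch, assuming the list holds plain integer literals; `CurveIdInListVisitor` is a hypothetical name, and its override could equally be merged into CurveIdVisitor.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.SqlServer.TransactSql.ScriptDom;

// Hypothetical companion to CurveIdVisitor: handles "curveid IN (1, 2, 3)".
class CurveIdInListVisitor : TSqlConcreteFragmentVisitor
{
    public readonly List<int> CurveIds = new List<int>();

    public override void Visit(InPredicate node)
    {
        // "NOT IN (...)" and "IN (SELECT ...)" do not name the curves being read.
        if (node.NotDefined || node.Subquery != null)
            return;

        var column = node.Expression as ColumnReferenceExpression;
        if (column == null || column.MultiPartIdentifier == null)
            return;

        // Same rule as CurveIdVisitor: compare the last part of alias.curveid.
        var identifiers = column.MultiPartIdentifier.Identifiers;
        if (!string.Equals(identifiers[identifiers.Count - 1].Value, "curveid", StringComparison.OrdinalIgnoreCase))
            return;

        // Keep integer literals only; expressions such as 1 + 2 are skipped.
        foreach (ScalarExpression value in node.Values)
        {
            var literal = value as IntegerLiteral;
            if (literal != null)
                CurveIds.Add(int.Parse(literal.Value));
        }
    }
}
```

Like the WHERE visitor, this can be run against the whole parsed fragment with `fragment.Accept(...)`, since the visitor already descends into CTEs and derived tables.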
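'SearchCondition' returns some kind of [BooleanExpression](https://msdn.microsoft.com/en-us/library/microsoft.sqlserver.transactsql.scriptdom.booleanexpression.aspx), which you can then walk further until you find the term containing 'curveID' (or its alias). But for anything beyond very simple terms it is hard to tell which curve it hits. 'curveID + 1 = 12-6' may be doable, 'where curveID = 12 or (1 = 1)' is harder to interpret, and 'where curveID > (select max(id) from table2)' tells you nothing at all. Or try 'with g as (select id * 2 as curveid ...) ... where g.curveid = 2'. – Solarflare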
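@Solarflare I have edited the question above to clarify the other cases you mention. SearchCondition seems to have a very complex object model, and I was hoping someone could give a definitive answer on how to use it. Thanks – GinjaNinja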
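That object only helps you interpret the code. For example, it splits a binary expression into the three parts 'term1 operator term2'. If you know your SQL text can only look like 'curveid = something' or 'curveid IN something', you could extract those parts by walking through the string yourself. But the object makes that easier as well: to know where the second term in 'curveid = 1 + 2 AND somethingelse' ends, you need some logic (which the boolean expression already applies). But maybe you are right and someone has a different idea. – Solarflare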
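For comparison, the string-walking approach mentioned in that comment might look roughly like the sketch below. It uses plain .NET regular expressions instead of ScriptDom and assumes the filter is always written as `curveid = <integer>` or `curveid IN (<integers>)`; `NaiveCurveIdScanner` is a hypothetical name. It has exactly the weaknesses described above: it also matches inside comments and string literals, takes only `12` from `curveid = 12-6`, and knows nothing about OR or renamed columns in CTEs.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical, text-only alternative to walking the ScriptDom tree.
static class NaiveCurveIdScanner
{
    // "curveid = 123", "alias.curveid = 123", "[CurveID] = 123" (case-insensitive).
    static readonly Regex EqualsPattern = new Regex(
        @"\bcurveid\]?\s*=\s*(\d+)\b", RegexOptions.IgnoreCase);

    // "curveid IN (1, 2, 3)" when the list contains only integer literals.
    static readonly Regex InPattern = new Regex(
        @"\bcurveid\]?\s+IN\s*\(\s*(\d+(?:\s*,\s*\d+)*)\s*\)", RegexOptions.IgnoreCase);

    public static SortedSet<int> Scan(string sql)
    {
        var ids = new SortedSet<int>(); // de-duplicates repeated filters on the same curve
        foreach (Match m in EqualsPattern.Matches(sql))
            ids.Add(int.Parse(m.Groups[1].Value));
        foreach (Match m in InPattern.Matches(sql))
            foreach (var part in m.Groups[1].Value.Split(','))
                ids.Add(int.Parse(part.Trim()));
        return ids;
    }
}
```

Note that `\bcurveid` does not match the table name `CURVEID_6657_1_LATEST`, because the required `=` or `IN` never follows it; the `\b` also keeps columns such as `mycurveid` out.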