2014-12-02 29 views
2

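How to extract cross-database references using the ScriptDom API

Microsoft has exposed the ScriptDom API for parsing and generating T-SQL. I'm new to it and still experimenting. I'd like to know how to get the cross-database references from a query like this: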

UPDATE t3 
SET  description = 'abc' 
FROM database1.dbo.table1 t1 
     INNER JOIN database2.dbo.table2 t2 
      ON (t1.id = t2.t1_id) 
     LEFT OUTER JOIN database3.dbo.table3 t3 
      ON (t3.id = t2.t3_id) 
     INNER JOIN database2.dbo.table4 t4 
      ON (t4.id = t2.t4_id) 

What I want is a list of the references:

database1.dbo.table1.id = database2.dbo.table2.t1_id 
database3.dbo.table3.id = database2.dbo.table2.t3_id 
database2.dbo.table4.id = database2.dbo.table2.t4_id 

However, in the last entry, database2.dbo.table4.id = database2.dbo.table2.t4_id, the columns on both sides come from the same database, database2, which is not what I want. So the final result I need is:

database1.dbo.table1.id = database2.dbo.table2.t1_id 
database3.dbo.table3.id = database2.dbo.table2.t3_id 

Is this possible with ScriptDom?


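This is actually not easy to do *robustly*. What restrictions, if any, do you want to place on the form of the query? For example, how should we handle 'INNER JOIN database2.dbo.table2 t2 ON (t1.id = t2.t1_id) OR (t3.id = t2.t3_id)': does that yield two references? What about 'ON t1.id + 1 = t2.t1_id': is that a reference, and is it different from others involving 't1.id'? And what about 'ON t1.id - t2.t1_id = 0'? – 2014-12-04 16:43:58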


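Yes, it's hard. It gets even harder if the join condition contains a subquery. For now we can assume the join conditions are simple equijoins, with no logical operators or subqueries. – 2014-12-04 17:34:39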


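This is very hard, and 'TSqlParser' actually doesn't help much. The reason is that it makes no effort to resolve references: even in a simple 'SELECT a FROM b', the parser cannot tell you that 'a' must be a column of 'b' (and, to be fair, a case like 'SELECT a FROM b, c' really cannot be resolved without knowing the table structure). To solve this you would need to write your own column resolution logic. Fun, but rather involved. – 2014-12-04 17:58:47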

Answers

8

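A robust implementation is not easy. For the restricted problem posed in this question, the solution is relatively simple, with the emphasis on "relatively". I'm assuming:

  • The query has only one level: no unions, subqueries, WITH expressions or anything else that introduces a new scope for aliases (that gets complicated very quickly).
  • All identifiers in the query are fully qualified, so there is no doubt about which object is being referred to.

The solution strategy is as follows: we first visit the TSqlFragment to list all table aliases, then visit it again to collect all equijoins, expanding aliases along the way. Using that list, we pick out the equijoins whose two sides do not refer to the same database. In code: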

// Namespaces needed by all snippets in this answer.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.SqlServer.TransactSql.ScriptDom;

var sql = @"
    UPDATE t3
    SET  description = 'abc'
    FROM database1.dbo.table1 t1
         INNER JOIN database2.dbo.table2 t2
          ON (t1.id = t2.t1_id)
         LEFT OUTER JOIN database3.dbo.table3 t3
          ON (t3.id = t2.t3_id)
         INNER JOIN database2.dbo.table4 t4
          ON (t4.id = t2.t4_id)
";

var parser = new TSql120Parser(initialQuotedIdentifiers: false);
IList<ParseError> errors;
TSqlScript script;
using (var reader = new StringReader(sql)) {
    script = (TSqlScript) parser.Parse(reader, out errors);
}
// Stop on syntax errors; the visitors below assume a complete parse tree.
if (errors.Count > 0) {
    throw new InvalidOperationException(errors[0].Message);
}

// First resolve aliases.
var aliasResolutionVisitor = new AliasResolutionVisitor();
script.Accept(aliasResolutionVisitor);

// Then find all equijoins, expanding aliases along the way.
var findEqualityJoinVisitor = new FindEqualityJoinVisitor(
    aliasResolutionVisitor.Aliases
);
script.Accept(findEqualityJoinVisitor);

// Now list all equijoins where the left database is not the same
// as the right database.
foreach (
    var equiJoin in
    findEqualityJoinVisitor.EqualityJoins.Where(j => !j.JoinsSameDatabase())
) {
    Console.WriteLine(equiJoin.ToString());
}

Output:

database3.dbo.table3.id = database2.dbo.table2.t3_id 
database1.dbo.table1.id = database2.dbo.table2.t1_id 

AliasResolutionVisitor is a simple thing:

public class AliasResolutionVisitor : TSqlFragmentVisitor {
    readonly Dictionary<string, string> aliases = new Dictionary<string, string>();
    public Dictionary<string, string> Aliases { get { return aliases; } }

    public override void Visit(NamedTableReference namedTableReference) {
        Identifier alias = namedTableReference.Alias;
        string baseObjectName = namedTableReference.SchemaObject.AsObjectName();
        if (alias != null) {
            aliases.Add(alias.Value, baseObjectName);
        }
    }
}

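We simply go through all named table references in the query and, if they have an alias, add it to a dictionary. Note that this fails as soon as subqueries are introduced, because this visitor has no notion of scope (and adding scope to a visitor is harder than it sounds, because TSqlFragment offers no way to annotate the parse tree, or even to walk it starting from a node).

FindEqualityJoinVisitor is more interesting: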

public class FindEqualityJoinVisitor : TSqlFragmentVisitor {
    readonly Dictionary<string, string> aliases;
    public FindEqualityJoinVisitor(Dictionary<string, string> aliases) {
        this.aliases = aliases;
    }

    readonly List<EqualityJoin> equalityJoins = new List<EqualityJoin>();
    public List<EqualityJoin> EqualityJoins { get { return equalityJoins; } }

    public override void Visit(QualifiedJoin qualifiedJoin) {
        var findEqualityComparisonVisitor = new FindEqualityComparisonVisitor();
        qualifiedJoin.SearchCondition.Accept(findEqualityComparisonVisitor);
        foreach (
            var equalityComparison in findEqualityComparisonVisitor.Comparisons
        ) {
            var firstColumnReferenceExpression =
                equalityComparison.FirstExpression as ColumnReferenceExpression;
            var secondColumnReferenceExpression =
                equalityComparison.SecondExpression as ColumnReferenceExpression;
            if (
                firstColumnReferenceExpression != null &&
                secondColumnReferenceExpression != null
            ) {
                string firstColumnResolved = resolveMultipartIdentifier(
                    firstColumnReferenceExpression.MultiPartIdentifier
                );
                string secondColumnResolved = resolveMultipartIdentifier(
                    secondColumnReferenceExpression.MultiPartIdentifier
                );
                equalityJoins.Add(
                    new EqualityJoin(firstColumnResolved, secondColumnResolved)
                );
            }
        }
    }

    private string resolveMultipartIdentifier(MultiPartIdentifier identifier) {
        if (
            identifier.Identifiers.Count == 2 &&
            aliases.ContainsKey(identifier.Identifiers[0].Value)
        ) {
            return
                aliases[identifier.Identifiers[0].Value] + "." +
                identifier.Identifiers[1].Value;
        } else {
            return identifier.AsObjectName();
        }
    }
}

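This looks for QualifiedJoin instances and, for each one we find, examines its search condition to find all equality comparisons. Note that this does work with nested search conditions: in Bar JOIN Foo ON Bar.Quux = Foo.Quux AND Bar.Baz = Foo.Baz, we find both expressions.

How do we find them? Using another small visitor: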

public class FindEqualityComparisonVisitor : TSqlFragmentVisitor {
    List<BooleanComparisonExpression> comparisons =
        new List<BooleanComparisonExpression>();
    public List<BooleanComparisonExpression> Comparisons {
        get { return comparisons; }
    }

    public override void Visit(BooleanComparisonExpression e) {
        if (e.IsEqualityComparison()) comparisons.Add(e);
    }
}

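Nothing complicated here. It would not be hard to fold this code into the other visitor, but I think this is clearer.

That's it, apart from some helper code that I'll present without comment: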

public class EqualityJoin {
    readonly SchemaObjectName left;
    public SchemaObjectName Left { get { return left; } }

    readonly SchemaObjectName right;
    public SchemaObjectName Right { get { return right; } }

    public EqualityJoin(
        string qualifiedObjectNameLeft, string qualifiedObjectNameRight
    ) {
        var parser = new TSql120Parser(initialQuotedIdentifiers: false);
        IList<ParseError> errors;
        using (var reader = new StringReader(qualifiedObjectNameLeft)) {
            left = parser.ParseSchemaObjectName(reader, out errors);
        }
        using (var reader = new StringReader(qualifiedObjectNameRight)) {
            right = parser.ParseSchemaObjectName(reader, out errors);
        }
    }

    public bool JoinsSameDatabase() {
        // The names here have the form database.schema.table.column,
        // so the first identifier is the database.
        return left.Identifiers[0].Value == right.Identifiers[0].Value;
    }

    public override string ToString() {
        return String.Format("{0} = {1}", left.AsObjectName(), right.AsObjectName());
    }
}

public static class MultiPartIdentifierExtensions {
    public static string AsObjectName(this MultiPartIdentifier multiPartIdentifier) {
        return string.Join(".", multiPartIdentifier.Identifiers.Select(i => i.Value));
    }
}

public static class ExpressionExtensions {
    public static bool IsEqualityComparison(this BooleanExpression expression) {
        return
            expression is BooleanComparisonExpression &&
            ((BooleanComparisonExpression) expression).ComparisonType == BooleanComparisonType.Equals;
    }
}

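As I mentioned before, this code is brittle. It assumes queries have a particular form, and if they don't it can fail (quite badly, by giving misleading results). A major open challenge is extending it so that it correctly handles scopes and unqualified references, as well as the other oddities a T-SQL script can contain, but I think it is a useful starting point.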

2

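Another approach might be to ask SQL Server for the query plan. With SHOWPLAN_XML on, the statement is compiled but not executed:

SET SHOWPLAN_XML ON; -- must be the only statement in its batch
GO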
UPDATE t3 
SET  description = 'abc' 
FROM database1.dbo.table1 t1 
     INNER JOIN database2.dbo.table2 t2 
      ON (t1.id = t2.t1_id) 
     LEFT OUTER JOIN database3.dbo.table3 t3 
      ON (t3.id = t2.t3_id) 
     INNER JOIN database2.dbo.table4 t4 
      ON (t4.id = t2.t4_id) 

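This returns the query plan as XML. In the XML you can find the join conditions under the RelOp nodes. For example, for a hash join you will see something like this: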

<RelOp NodeId="7" PhysicalOp="Hash Match" LogicalOp="Inner Join" EstimateRows="1" EstimateIO="0" EstimateCPU="0.0177716" AvgRowSize="15" EstimatedTotalSubtreeCost="0.0243408" Parallel="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row"> 
.. some stuff cut from here 
    <Hash> 
.. 
<ProbeResidual> 
    <ScalarOperator ScalarString="[database2].[dbo].[table4].[Id] as [t4].[Id]=[database2].[dbo].[table2].[t4_Id] as [t2].[t4_Id]"> 
    <Compare CompareOp="EQ"> 
    <ScalarOperator> 
     <Identifier> 
     <ColumnReference Database="[database2]" Schema="[dbo]" Table="[table4]" Alias="[t4]" Column="Id" /> 
     </Identifier> 
    </ScalarOperator> 
    <ScalarOperator> 
     <Identifier> 
     <ColumnReference Database="[database2]" Schema="[dbo]" Table="[table2]" Alias="[t2]" Column="t4_Id" /> 
     </Identifier> 
    </ScalarOperator> 
    </Compare> 
</ScalarOperator> 

For a nested loops join, something along the lines of:

<NestedLoops Optimized="0"> 
<Predicate> 
    <ScalarOperator ScalarString="[database3].[dbo].[table3].[Id] as [t3].[Id]=[database2].[dbo].[table2].[t3_id] as [t2].[t3_id]"> 
    <Compare CompareOp="EQ"> 
     <ScalarOperator> 
     <Identifier> 
      <ColumnReference Database="[database3]" Schema="[dbo]" Table="[table3]" Alias="[t3]" Column="Id" /> 
     </Identifier> 
     </ScalarOperator> 
     <ScalarOperator> 
     <Identifier> 
      <ColumnReference Database="[database2]" Schema="[dbo]" Table="[table2]" Alias="[t2]" Column="t3_id" /> 
     </Identifier> 
     </ScalarOperator> 
    </Compare> 
    </ScalarOperator> 
</Predicate> 

Perhaps you could then process this in C# to extract all the joins and compare the databases held in the column references.
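A minimal sketch of that processing step, using LINQ to XML. The class and method names (ShowplanCrossDatabaseJoins, Find) are hypothetical. It assumes the plan XML is already available as a string, and it only looks at Compare elements with CompareOp="EQ" whose two operands are plain column references, as in the fragments above. Join keys that the plan expresses in some other way are not covered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ShowplanCrossDatabaseJoins {
    // Hypothetical helper: returns "left = right" for each EQ comparison whose
    // two operands are plain column references in different databases.
    public static List<string> Find(string showplanXml) {
        var results = new List<string>();
        XDocument plan = XDocument.Parse(showplanXml);
        // Match on local names so the showplan XML namespace does not matter.
        IEnumerable<XElement> comparisons = plan.Descendants().Where(
            e => e.Name.LocalName == "Compare" &&
                 (string) e.Attribute("CompareOp") == "EQ"
        );
        foreach (XElement comparison in comparisons) {
            List<XElement> operands = comparison.Elements()
                .Where(e => e.Name.LocalName == "ScalarOperator")
                .ToList();
            if (operands.Count != 2) continue;
            XElement left = ColumnOf(operands[0]);
            XElement right = ColumnOf(operands[1]);
            if (left == null || right == null) continue;
            string leftDatabase = (string) left.Attribute("Database");
            string rightDatabase = (string) right.Attribute("Database");
            // Skip references without a database, and same-database joins.
            if (leftDatabase == null || rightDatabase == null) continue;
            if (leftDatabase == rightDatabase) continue;
            // Skip duplicates in case the same predicate appears more than once.
            string line = Describe(left) + " = " + Describe(right);
            if (!results.Contains(line)) results.Add(line);
        }
        return results;
    }

    // Accepts only ScalarOperator/Identifier/ColumnReference; anything else
    // (arithmetic, constants, function calls) yields null.
    static XElement ColumnOf(XElement scalarOperator) {
        List<XElement> children = scalarOperator.Elements().ToList();
        if (children.Count != 1 || children[0].Name.LocalName != "Identifier") {
            return null;
        }
        return children[0].Elements()
            .FirstOrDefault(e => e.Name.LocalName == "ColumnReference");
    }

    // Joins whichever of the four name attributes are present with dots.
    static string Describe(XElement column) {
        return string.Join(".", new[] { "Database", "Schema", "Table", "Column" }
            .Select(name => (string) column.Attribute(name))
            .Where(value => value != null));
    }
}
```

Calling ShowplanCrossDatabaseJoins.Find(planXml) returns one line per cross-database equality. The database names are compared exactly as they appear in the plan, brackets included, so no name normalization is attempted.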

Apologies for the formatting.


Wow, that's a creative idea! I'll give it a try. Thanks for the answer. – 2014-12-09 02:55:45