2013-12-19 118 views
3

What is the best way to parse values out of this string? I have the following string:

string fullString = "group = '2843360' and (team in ('TEAM1', 'TEAM2','TEAM3'))" 

I want to parse this string into:

string group = ParseoutGroup(fullString); // Expect "2843360" 
string[] teams = ParseoutTeamNames(fullString); // Expect array with three items 

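In the fullString example, I may list one or more teams (not always three, as shown above).

I have this partially working, but my code feels hacky and not very future-proof, so I would like to know whether there is a better regular-expression solution or a more elegant way to parse these values out of the full string. Other things may be added to the string later, so I would like it to be as foolproof as possible.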

+4

It is hard to offer a better solution without seeing the one you already have. –

+2

Why not post your current solution, so we can see what could be improved? – tofutim

+0

"not always 3 as above" – tofutim

Answers

4

I managed to do this using regular expressions:

var str = "group = '2843360' and (team in ('TEAM1', 'TEAM2','TEAM3'))"; 

// Grabs the group ID 
var group = Regex.Match(str, @"group = '(?<ID>\d+)'", RegexOptions.IgnoreCase) 
    .Groups["ID"].Value; 

// Grabs everything inside teams parentheses 
var teams = Regex.Match(str, @"team in \((?<Teams>(\s*'[^']+'\s*,?)+)\)", RegexOptions.IgnoreCase) 
    .Groups["Teams"].Value; 

// Trim and remove single quotes 
var teamsArray = teams.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries) 
    .Select(s => 
     { 
      var trimmed = s.Trim(); 
      return trimmed.Substring(1, trimmed.Length - 2); 
     }).ToArray(); 

The result will be:

string[] { "TEAM1", "TEAM2", "TEAM3" } 
+1

Nicely done, Bruno. – tofutim

6

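In the simplest cases, a regular expression is probably the best answer. Unfortunately, in this case it looks like we need to parse a subset of the SQL language. Although this can be solved with regular expressions, they are not designed for parsing complex languages (nested brackets and escaped strings).

The requirements are also likely to change over time and call for parsing more complex structures.

If company policy allows it, I would choose to build an internal DSL to parse this string.

My favorite tool for building internal DSLs is called Sprache.

Below is an example parser that uses the internal DSL approach.

In the code I have defined primitives that handle the required SQL operators and composed the final parser from them.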

[Test] 
    public void Test() 
    { 
     string fullString = "group = '2843360' and (team in ('TEAM1', 'TEAM2','TEAM3'))"; 


     var resultParser = 
      from @group in OperatorEquals("group") 
      from @and in OperatorEnd() 
      from @team in Brackets(OperatorIn("team")) 
      select new {@group, @team}; 
     var result = resultParser.Parse(fullString); 
     Assert.That(result.group, Is.EqualTo("2843360")); 
     Assert.That(result.team, Is.EquivalentTo(new[] {"TEAM1", "TEAM2", "TEAM3"})); 
    } 

    private static readonly Parser<char> CellSeparator = 
     from space1 in Parse.WhiteSpace.Many() 
     from s in Parse.Char(',') 
     from space2 in Parse.WhiteSpace.Many() 
     select s; 

    private static readonly Parser<char> QuoteEscape = Parse.Char('\\'); 

    private static Parser<T> Escaped<T>(Parser<T> following) 
    { 
     return from escape in QuoteEscape 
       from f in following 
       select f; 
    } 

    private static readonly Parser<char> QuotedCellDelimiter = Parse.Char('\''); 

    private static readonly Parser<char> QuotedCellContent = 
     // Try the escaped quote first; otherwise the backslash is consumed as an ordinary character.
     Escaped(QuotedCellDelimiter).Or(Parse.AnyChar.Except(QuotedCellDelimiter)); 

    private static readonly Parser<string> QuotedCell = 
     from open in QuotedCellDelimiter 
     from content in QuotedCellContent.Many().Text() 
     from end in QuotedCellDelimiter 
     select content; 

    private static Parser<string> OperatorEquals(string column) 
    { 
     return 
      from c in Parse.String(column) 
      from space1 in Parse.WhiteSpace.Many() 
      from opEquals in Parse.Char('=') 
      from space2 in Parse.WhiteSpace.Many() 
      from content in QuotedCell 
      select content; 
    } 

    private static Parser<bool> OperatorEnd() 
    { 
     return 
      from space1 in Parse.WhiteSpace.Many() 
      from c in Parse.String("and") 
      from space2 in Parse.WhiteSpace.Many() 
      select true; 
    } 

    private static Parser<T> Brackets<T>(Parser<T> contentParser) 
    { 
     return from open in Parse.Char('(') 
       from space1 in Parse.WhiteSpace.Many() 
       from content in contentParser 
       from space2 in Parse.WhiteSpace.Many() 
       from close in Parse.Char(')') 
       select content; 
    } 

    private static Parser<IEnumerable<string>> ComaSeparated() 
    { 
     return from leading in QuotedCell 
       from rest in CellSeparator.Then(_ => QuotedCell).Many() 
       select Cons(leading, rest); 
    } 

    private static Parser<IEnumerable<string>> OperatorIn(string column) 
    { 
     return 
      from c in Parse.String(column) 
      from space1 in Parse.WhiteSpace 
      from opEquals in Parse.String("in") 
      from space2 in Parse.WhiteSpace.Many() 
      from content in Brackets(ComaSeparated()) 
      from space3 in Parse.WhiteSpace.Many() 
      select content; 
    } 

    private static IEnumerable<T> Cons<T>(T head, IEnumerable<T> rest) 
    { 
     yield return head; 
     foreach (T item in rest) 
      yield return item; 
    } 
0

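I think you need to look into a tokenization process to get the desired result, one that takes into account the order of evaluation established by the parentheses. You can use the shunting-yard algorithm to help with both tokenization and order of evaluation.

The advantage of shunting-yard is that it lets you define tokens, which you can later use to parse the string and perform the correct operations. Although it is usually applied to the mathematical order of operations, it can be adapted to your purpose.

Here is some information: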

http://en.wikipedia.org/wiki/Shunting-yard_algorithm
http://www.slideshare.net/grahamwell/shunting-yard
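
The tokenization step described in this answer might look roughly like the following. This is only a minimal sketch of the first stage, not a full shunting-yard implementation: the names `TokenKind`, `Token`, and `FilterTokenizer` are invented for illustration, and it assumes the input contains only identifiers, single-quoted strings without escaped quotes, `=`, commas, and parentheses.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public enum TokenKind { Identifier, String, Operator, OpenParen, CloseParen, Comma }

// Hypothetical token type: a kind plus the matched text.
public sealed class Token
{
    public TokenKind Kind { get; }
    public string Text { get; }
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
    public override string ToString() { return Kind + ":" + Text; }
}

public static class FilterTokenizer
{
    // One alternative per token kind; \G anchors the match at the current position.
    private static readonly Regex TokenPattern = new Regex(
        @"\G(?:(?<str>'[^']*')|(?<id>[A-Za-z_]\w*)|(?<op>=)|(?<open>\()|(?<close>\))|(?<comma>,))");

    public static List<Token> Tokenize(string input)
    {
        var tokens = new List<Token>();
        int position = 0;
        while (true)
        {
            // Skip whitespace between tokens.
            while (position < input.Length && char.IsWhiteSpace(input[position])) position++;
            if (position >= input.Length) break;

            Match m = TokenPattern.Match(input, position);
            if (!m.Success)
                throw new FormatException("Unexpected character at position " + position);

            if (m.Groups["str"].Success)
                // Strip the surrounding single quotes.
                tokens.Add(new Token(TokenKind.String, m.Value.Substring(1, m.Value.Length - 2)));
            else if (m.Groups["id"].Success)
                // Keywords are reported as operators; everything else is a column name.
                tokens.Add(new Token(IsKeyword(m.Value) ? TokenKind.Operator : TokenKind.Identifier, m.Value));
            else if (m.Groups["op"].Success)
                tokens.Add(new Token(TokenKind.Operator, m.Value));
            else if (m.Groups["open"].Success)
                tokens.Add(new Token(TokenKind.OpenParen, m.Value));
            else if (m.Groups["close"].Success)
                tokens.Add(new Token(TokenKind.CloseParen, m.Value));
            else
                tokens.Add(new Token(TokenKind.Comma, m.Value));

            position = m.Index + m.Length;
        }
        return tokens;
    }

    private static bool IsKeyword(string word)
    {
        return word.Equals("and", StringComparison.OrdinalIgnoreCase)
            || word.Equals("or", StringComparison.OrdinalIgnoreCase)
            || word.Equals("in", StringComparison.OrdinalIgnoreCase);
    }
}
```

Calling `FilterTokenizer.Tokenize(fullString)` yields a flat token list. A shunting-yard pass (or any other parser) could then use the operator and parenthesis tokens to work out the evaluation order.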

1

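There is probably a regular-expression solution, but if the format is strict I try efficient string methods first. The following works for your input.

I am using a custom class, TeamGroup, to encapsulate the complexity and to hold all the relevant properties in one object: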

public class TeamGroup 
{ 
    public string Group { get; set; } 
    public string[] Teams { get; set; } 

    public static TeamGroup ParseOut(string fullString) 
    { 
     TeamGroup tg = new TeamGroup{ Teams = new string[]{ } }; 
     int index = fullString.IndexOf("group = '"); 
     if (index >= 0) 
     { 
      index += "group = '".Length; 
      int endIndex = fullString.IndexOf("'", index); 
      if (endIndex >= 0) 
      { 
       tg.Group = fullString.Substring(index, endIndex - index).Trim(' ', '\''); 
       endIndex += 1; 
       index = fullString.IndexOf(" and (team in (", endIndex); 
       if (index >= 0) 
       { 
        index += " and (team in (".Length; 
        endIndex = fullString.IndexOf(")", index); 
        if (endIndex >= 0) 
        { 
         string allTeamsString = fullString.Substring(index, endIndex - index); 
         tg.Teams = allTeamsString.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries) 
          .Select(t => t.Trim(' ', '\'')) 
          .ToArray(); 
        } 
       } 
      } 
     } 
     return tg; 
    } 
} 

You would use it like this:

string fullString = "group = '2843360' and (team in ('TEAM1', 'TEAM2','TEAM3'))"; 
TeamGroup tg = TeamGroup.ParseOut(fullString); 
Console.Write("Group: {0} Teams: {1}", tg.Group, string.Join(", ", tg.Teams)); 

Output:

Group: 2843360 Teams: TEAM1, TEAM2, TEAM3 
0

If fullString is not machine-generated, you may need to add some error handling, but this works out of the box and gives you a test to work against.

public string ParseoutGroup(string fullString) 
    { 
     var matches = Regex.Matches(fullString, @"group\s?=\s?'([^']+)'", RegexOptions.IgnoreCase); 
     return matches[0].Groups[1].Captures[0].Value; 
    } 

    public string[] ParseoutTeamNames(string fullString) 
    { 
     var teams = new List<string>(); 
     var matches = Regex.Matches(fullString, @"team\s?in\s?\((\s*'([^']+)',?\s*)+\)", RegexOptions.IgnoreCase); 
     foreach (var capture in matches[0].Groups[2].Captures) 
     { 
      teams.Add(capture.ToString()); 
     } 
     return teams.ToArray(); 
    } 

    [Test] 
    public void parser() 
    { 
     string test = "group = '2843360' and (team in ('team1', 'team2', 'team3'))"; 
     var group = ParseoutGroup(test); 
     Assert.AreEqual("2843360",group); 

     var teams = ParseoutTeamNames(test); 
     Assert.AreEqual(3, teams.Count()); 
     Assert.AreEqual("team1", teams[0]); 
     Assert.AreEqual("team2", teams[1]); 
     Assert.AreEqual("team3", teams[2]); 
    } 
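
The error handling mentioned in this answer could be added along the following lines. This is a hedged sketch rather than the answerer's code: `SafeFilterParser`, `TryParseGroup`, and `ParseTeamNames` are illustrative names, and the patterns are slightly loosened variants of the ones above (`\s*` instead of `\s?`). Instead of indexing into `matches[0]`, which throws when nothing matches, it checks `Match.Success` first.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SafeFilterParser
{
    // Returns false (and a null group) instead of throwing when the group clause is missing.
    public static bool TryParseGroup(string fullString, out string group)
    {
        Match match = Regex.Match(fullString ?? string.Empty,
            @"group\s*=\s*'([^']+)'", RegexOptions.IgnoreCase);
        group = match.Success ? match.Groups[1].Value : null;
        return match.Success;
    }

    // Returns an empty array when no well-formed team clause is present.
    public static string[] ParseTeamNames(string fullString)
    {
        Match match = Regex.Match(fullString ?? string.Empty,
            @"team\s+in\s*\((?:\s*'([^']+)'\s*,?)+\)", RegexOptions.IgnoreCase);
        if (!match.Success) return new string[0];

        // Group 1 is captured once per quoted team name.
        return match.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }
}
```

With this shape a caller can tell "no group clause" apart from a crash, for example `if (!SafeFilterParser.TryParseGroup(input, out group)) { /* report bad input */ }`.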
0

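In addition to @BrunoLM's solution:

(This is worth the extra lines if you will have more variables to check later.)

You can split the string on the "and" keyword and have a function check each clause against the appropriate regular expression and return the desired value.

(Untested code, but it should convey the idea.)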

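// Split on " and " (with surrounding spaces) so that words containing "and" are left alone. 
string[] statements = statement.Split(new[] { " and " }, StringSplitOptions.RemoveEmptyEntries); 
// So now: 
// statements[0] = "group = '2843360'" 
// statements[1] = "(team in ('TEAM1', 'TEAM2','TEAM3'))" 
foreach (string s in statements) 
{ 
    // The two RegexFunctionToExtract_* helpers are placeholders for your own regex-based extractors. 
    if (s.Contains("group")) group = RegexFunctionToExtract_GroupValue(s); 
    if (s.Contains("team")) teams = RegexFunctionToExtract_TeamValue(s); 
} 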

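I think this approach gives cleaner, more maintainable code and is slightly more efficient.

Of course, this approach does not expect an "OR" clause. However, it can be adapted with a little tweaking.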
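
One way the clause-splitting idea could be made concrete, including the small adjustment for "or", is sketched below. The `ClauseParser` class and its dictionary result are assumptions made for illustration, not part of the answer. It also assumes that quoted values never contain the words " and " or " or " surrounded by whitespace, since the split is purely textual.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ClauseParser
{
    // Maps each column name (e.g. "group", "team") to the quoted values found in its clause.
    public static Dictionary<string, string[]> Parse(string statement)
    {
        var result = new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase);

        // Split on the whole words "and" / "or" surrounded by whitespace.
        string[] clauses = Regex.Split(statement, @"\s+(?:and|or)\s+", RegexOptions.IgnoreCase);

        foreach (string clause in clauses)
        {
            // The column name is the first word, ignoring leading spaces and parentheses.
            Match column = Regex.Match(clause, @"^[\s(]*(\w+)");
            if (!column.Success) continue;

            // Every single-quoted literal in the clause is a value for that column.
            string[] values = Regex.Matches(clause, @"'([^']*)'")
                .Cast<Match>()
                .Select(m => m.Groups[1].Value)
                .ToArray();

            result[column.Groups[1].Value] = values;
        }
        return result;
    }
}
```

With the example string, `ClauseParser.Parse(fullString)["team"]` holds the three team names and `["group"]` holds the single group ID. A new column added to the string later shows up as another dictionary key without further code changes.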
