2008-09-02 52 views
17

Does C# have built-in support for parsing page-number strings? By page numbers, I mean the format you might type into a print dialog: a mix of commas and dashes.

Something like this:

1,3,5-10,12 

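It would be great to have a solution that gives me back some kind of list of all the page numbers the string represents. For the example above, getting back a list like the following would be nice: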

1,3,5,6,7,8,9,10,12 

I just want to avoid rolling my own if there is an easy way to do this.

+2

To perform the reverse operation, see http://stackoverflow.com/questions/7688881/convert-list-to-number-range-string – Grhm 2011-11-24 14:33:40

Answers

19

Should be simple:

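using System.Collections.Generic;
using System.Linq;

// yield return is only valid inside an iterator method, so the loop is wrapped in one.
// ParsePageNumbers("1,3,5-10,12") yields 1, 3, 5, 6, 7, 8, 9, 10, 12.
static IEnumerable<int> ParsePageNumbers(string input)
{
    foreach (string s in input.Split(','))
    {
        // try and get the number
        int num;
        if (int.TryParse(s, out num))
        {
            yield return num;
            continue; // skip the rest
        }

        // otherwise we might have a range
        // split on the range delimiter
        string[] subs = s.Split('-');
        int start, end;

        // now see if we can parse a start and end
        if (subs.Length > 1 &&
            int.TryParse(subs[0], out start) &&
            int.TryParse(subs[1], out end) &&
            end >= start)
        {
            // create a range between the two values (inclusive)
            int rangeLength = end - start + 1;
            foreach (int i in Enumerable.Range(start, rangeLength))
            {
                yield return i;
            }
        }
        // anything else (e.g. "abc" or "10-5") is silently skipped
    }
}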

Edit: thanks for the fix ;-)

+0

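I suggested two changes: (1) adding `continue;` after the first yield return num, which saves you from needing an `else`, and (2) changing the comparison to `end >= start`, which lets you support single-item ranges such as 1-1. – 2011-06-06 16:19:20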

7

There is no built-in way to do this, but it would be trivial using String.Split.

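Simply split on ',' and you have a series of strings that represent either page numbers or ranges. Iterate over that series and do a String.Split on '-'. If the split produces nothing extra, it is a plain page number, so stick it in your list of pages. If it does, take the left and right sides of the '-' as the bounds and use a simple for loop to add each page number in that range to your final list.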

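It can't take more than 5 minutes to do, then maybe another 10 to add some sanity checks that throw errors when the user tries to enter invalid data (like "1-2-3" or something).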
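The steps described above could be sketched roughly as follows. This is only an illustration, not a tested library: the class name `PageListSketch`, the method name `Expand`, and the choice of `FormatException` for invalid input are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class PageListSketch
{
    // Hypothetical helper: expands a string such as "1,3,5-10,12"
    // into the individual page numbers it describes.
    public static List<int> Expand(string input)
    {
        var pages = new List<int>();
        foreach (string part in input.Split(','))
        {
            string[] bounds = part.Split('-');
            if (bounds.Length == 1)
            {
                // No '-' present: a plain page number.
                pages.Add(int.Parse(bounds[0]));
            }
            else if (bounds.Length == 2)
            {
                // Left and right of the '-' are the inclusive bounds.
                int first = int.Parse(bounds[0]);
                int last = int.Parse(bounds[1]);
                if (first > last)
                    throw new FormatException("Range '" + part + "' runs backwards.");
                for (int page = first; page <= last; page++)
                    pages.Add(page);
            }
            else
            {
                // Sanity check: rejects input such as "1-2-3".
                throw new FormatException("'" + part + "' is not a page number or range.");
            }
        }
        return pages;
    }
}
```

Note that `int.Parse` itself throws a `FormatException` for empty or non-numeric pieces, so input such as "1,,2" or "5-" is rejected as well.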

+0

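[@Daniel Jennings](http://stackoverflow.com/questions/40161/does-c-have-built-in-support-for-parsing-page-number-strings#40165) That seems like a reasonable approach. I just thought it was worth making sure Microsoft didn't already have a PageNumberStringParser somewhere that handles all the weird edge cases. – 2008-09-02 18:03:20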

5

Keith's approach seems nice. I put together a more naive approach using lists. It has error checking, so hopefully it should pick up most problems:

public List<int> parsePageNumbers(string input) {
    if (string.IsNullOrEmpty(input))
        throw new InvalidOperationException("Input string is empty.");

    var pageNos = input.Split(',');

    var ret = new List<int>();
    foreach (string pageString in pageNos) {
        if (pageString.Contains("-")) {
            parsePageRange(ret, pageString);
        } else {
            ret.Add(parsePageNumber(pageString));
        }
    }

    ret.Sort();
    return ret.Distinct().ToList();
}

private int parsePageNumber(string pageString) {
    int ret;

    if (!int.TryParse(pageString, out ret)) {
        throw new InvalidOperationException(
            string.Format("Page number '{0}' is not valid.", pageString));
    }

    return ret;
}

private void parsePageRange(List<int> pageNumbers, string pageNo) {
    var pageRange = pageNo.Split('-');

    if (pageRange.Length != 2)
        throw new InvalidOperationException(
            string.Format("Page range '{0}' is not valid.", pageNo));

    int startPage = parsePageNumber(pageRange[0]),
        endPage = parsePageNumber(pageRange[1]);

    if (startPage > endPage) {
        throw new InvalidOperationException(
            string.Format("Page number {0} is greater than page number {1}" +
                " in page range '{2}'", startPage, endPage, pageNo));
    }

    pageNumbers.AddRange(Enumerable.Range(startPage, endPage - startPage + 1));
}
2

Here's something I cooked up for something similar.

It handles the following range types:

1  single number 
1-5  range 
-5  range from (firstpage) up to 5 
5-  range from 5 up to (lastpage) 
..  can use .. instead of - 
;,  can use both semicolon, comma, and space, as separators 

It does not check for duplicate values, so the set 1,5,-10 will produce the sequence 1, 5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 (with a first page of 1).

public class RangeParser 
{ 
    public static IEnumerable<Int32> Parse(String s, Int32 firstPage, Int32 lastPage) 
    { 
     String[] parts = s.Split(' ', ';', ','); 
     Regex reRange = new Regex(@"^\s*((?<from>\d+)|(?<from>\d+)(?<sep>(-|\.\.))(?<to>\d+)|(?<sep>(-|\.\.))(?<to>\d+)|(?<from>\d+)(?<sep>(-|\.\.)))\s*$"); 
     foreach (String part in parts) 
     { 
      Match maRange = reRange.Match(part); 
      if (maRange.Success) 
      { 
       Group gFrom = maRange.Groups["from"]; 
       Group gTo = maRange.Groups["to"]; 
       Group gSep = maRange.Groups["sep"]; 

       if (gSep.Success) 
       { 
        Int32 from = firstPage; 
        Int32 to = lastPage; 
        if (gFrom.Success) 
         from = Int32.Parse(gFrom.Value); 
        if (gTo.Success) 
         to = Int32.Parse(gTo.Value); 
        for (Int32 page = from; page <= to; page++) 
         yield return page; 
       } 
       else 
        yield return Int32.Parse(gFrom.Value); 
      } 
     } 
    } 
} 
0

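This is a slightly modified version of lassevk's code that handles the string.Split operation inside the regex match. It is written as an extension method, and you can easily deal with the duplicates problem using LINQ's Distinct() extension.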

/// <summary> 
    /// Parses a string representing a range of values into a sequence of integers. 
    /// </summary> 
    /// <param name="s">String to parse</param> 
    /// <param name="minValue">Minimum value for open range specifier</param> 
    /// <param name="maxValue">Maximum value for open range specifier</param> 
    /// <returns>An enumerable sequence of integers</returns> 
    /// <remarks> 
    /// The range is specified as a string in the following forms or combination thereof: 
    /// 5   single value 
    /// 1,2,3,4,5 sequence of values 
    /// 1-5   closed range 
    /// -5   open range (converted to a sequence from minValue to 5) 
    /// 1-   open range (converted to a sequence from 1 to maxValue) 
    /// 
    /// The value delimiter can be either ',' or ';' and the range separator can be 
    /// either '-' or ':'. Whitespace is permitted at any point in the input. 
    /// 
    /// Any elements of the sequence that contain non-digit, non-whitespace, or non-separator 
    /// characters or that are empty are ignored and not returned in the output sequence. 
    /// </remarks> 
    public static IEnumerable<int> ParseRange2(this string s, int minValue, int maxValue) { 
     const string pattern = @"(?:^|(?<=[,;]))      # match must begin with start of string or delim, where delim is , or ; 
           \s*(        # leading whitespace 
           (?<from>\d*)\s*(?:-|:)\s*(?<to>\d+) # capture 'from <sep> to' or '<sep> to', where <sep> is - or : 
           |         # or 
           (?<from>\d+)\s*(?:-|:)\s*(?<to>\d*) # capture 'from <sep> to' or 'from <sep>', where <sep> is - or : 
           |         # or 
           (?<num>\d+)       # capture lone number 
           )\s*         # trailing whitespace 
           (?:(?=[,;\b])|$)      # match must end with end of string or delim, where delim is , or ;"; 

     Regex regx = new Regex(pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled); 

     foreach (Match m in regx.Matches(s)) { 
      Group gpNum = m.Groups["num"]; 
      if (gpNum.Success) { 
       yield return int.Parse(gpNum.Value); 

      } else { 
       Group gpFrom = m.Groups["from"]; 
       Group gpTo = m.Groups["to"]; 
       if (gpFrom.Success || gpTo.Success) { 
        int from = (gpFrom.Success && gpFrom.Value.Length > 0 ? int.Parse(gpFrom.Value) : minValue); 
        int to = (gpTo.Success && gpTo.Value.Length > 0 ? int.Parse(gpTo.Value) : maxValue); 

        for (int i = from; i <= to; i++) { 
         yield return i; 
        } 
       } 
      } 
     } 
    } 
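As a rough illustration of the Distinct() remark above, the parsed sequence can be passed through LINQ before use. The class name `DistinctSketch` and the method name `Normalize` are hypothetical, and the extra sort is an assumption about what a caller would probably want rather than something the answer asks for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctSketch
{
    // Hypothetical helper: drops repeated page numbers and sorts the rest ascending.
    public static List<int> Normalize(IEnumerable<int> pages)
    {
        return pages.Distinct().OrderBy(p => p).ToList();
    }
}
```

For instance, feeding it the sequence 1, 5, 1, 2, 3, 4, 5 would leave 1, 2, 3, 4, 5.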
3

Here is some code I just put together to do this. You can enter input in a format like 1-2,5abcd,6,7,20-15 ,,,,,,

It is easy to extend to other formats.

private int[] ParseRange(string ranges) 
    { 
     string[] groups = ranges.Split(','); 
     return groups.SelectMany(t => GetRangeNumbers(t)).ToArray(); 
    } 

    private int[] GetRangeNumbers(string range) 
    { 
     //string justNumbers = new String(text.Where(Char.IsDigit).ToArray()); 

     int[] RangeNums = range 
      .Split('-') 
      .Select(t => new String(t.Where(Char.IsDigit).ToArray())) // Digits Only 
      .Where(t => !string.IsNullOrWhiteSpace(t)) // Only if has a value 
      .Select(t => int.Parse(t)) // digit to int 
      .ToArray(); 
     return RangeNums.Length.Equals(2) ? Enumerable.Range(RangeNums.Min(), (RangeNums.Max() + 1) - RangeNums.Min()).ToArray() : RangeNums; 
    } 
0

I came up with this answer:

static IEnumerable<string> ParseRange(string str) 
{ 
    var numbers = str.Split(','); 

    foreach (var n in numbers) 
    { 
     if (!n.Contains("-")) 
      yield return n; 
     else 
     { 
      string startStr = String.Join("", n.TakeWhile(c => c != '-')); 
      int startInt = Int32.Parse(startStr); 

      string endStr = String.Join("", n.Reverse().TakeWhile(c => c != '-').Reverse()); 
      int endInt = Int32.Parse(endStr); 

      var range = Enumerable.Range(startInt, endInt - startInt + 1) 
           .Select(num => num.ToString()); 

      foreach (var s in range) 
       yield return s; 
     } 
    } 
} 
1

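You can't be sure until you have test cases. In my case I prefer whitespace-delimited input rather than comma-delimited, which makes parsing a little more complex.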

[Fact] 
    public void ShouldBeAbleToParseRanges() 
    { 
     RangeParser.Parse("1").Should().BeEquivalentTo(1); 
     RangeParser.Parse("-1..2").Should().BeEquivalentTo(-1,0,1,2); 

     RangeParser.Parse("-1..2 ").Should().BeEquivalentTo(-1,0,1,2); 
     RangeParser.Parse("-1..2 5").Should().BeEquivalentTo(-1,0,1,2,5); 
     RangeParser.Parse(" -1 .. 2 5").Should().BeEquivalentTo(-1,0,1,2,5); 
    } 

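Note that Keith's answer (or a small variation of it) will fail the last test, where there is whitespace between the range tokens. That case calls for a tokenizer and a proper parser that provides lookahead.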

namespace Utils 
{ 
    public class RangeParser 
    { 

     public class RangeToken 
     { 
      public string Name; 
      public string Value; 
     } 

     public static IEnumerable<RangeToken> Tokenize(string v) 
     { 
      var pattern = 
       // [0-9]+ rather than [1-9]+[0-9]* so that a lone 0 is tokenized as well
       @"(?<number>-?[0-9]+)|" + 
       @"(?<range>\.\.)"; 

      var regex = new Regex(pattern); 
      var matches = regex.Matches(v); 
      foreach (Match match in matches) 
      { 
       var numberGroup = match.Groups["number"]; 
       if (numberGroup.Success) 
       { 
        yield return new RangeToken {Name = "number", Value = numberGroup.Value}; 
        continue; 
       } 
       var rangeGroup = match.Groups["range"]; 
       if (rangeGroup.Success) 
       { 
        yield return new RangeToken {Name = "range", Value = rangeGroup.Value}; 
       } 

      } 
     } 

     public enum State { Start, Unknown, InRange} 

     public static IEnumerable<int> Parse(string v) 
     { 

      var tokens = Tokenize(v); 
      var state = State.Start; 
      var number = 0; 

      foreach (var token in tokens) 
      { 
       switch (token.Name) 
       { 
        case "number": 
         var nextNumber = int.Parse(token.Value); 
         switch (state) 
         { 
          case State.Start: 
           number = nextNumber; 
           state = State.Unknown; 
           break; 
          case State.Unknown: 
           yield return number; 
           number = nextNumber; 
           break; 
          case State.InRange: 
           int rangeLength = nextNumber - number+ 1; 
           foreach (int i in Enumerable.Range(number, rangeLength)) 
           { 
            yield return i; 
           } 
           state = State.Start; 
           break; 
          default: 
           throw new ArgumentOutOfRangeException(); 
         } 
         break; 
        case "range": 
         switch (state) 
         { 
          case State.Start: 
           throw new ArgumentOutOfRangeException(); 
           break; 
          case State.Unknown: 
           state = State.InRange; 
           break; 
          case State.InRange: 
           throw new ArgumentOutOfRangeException(); 
           break; 
          default: 
           throw new ArgumentOutOfRangeException(); 
         } 
         break; 
        default: 
         throw new ArgumentOutOfRangeException(nameof(token)); 
       } 
      } 
      switch (state) 
      { 
       case State.Start: 
        break; 
       case State.Unknown: 
        yield return number; 
        break; 
       case State.InRange: 
        break; 
       default: 
        throw new ArgumentOutOfRangeException(); 
      } 
     } 
    } 
} 
0

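Regex is less efficient than the code below. String methods are more efficient than Regex and should be used whenever possible. Here is a version using Split: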

using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Text; 
using System.Text.RegularExpressions; 

namespace ConsoleApplication1 
{ 
    class Program 
    { 
     static void Main(string[] args) 
     { 
      string[] inputs = { 
           "001-005/015", 
           "009/015" 
          }; 

      foreach (string input in inputs) 
      { 
       List<int> numbers = new List<int>(); 
       string[] strNums = input.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries); 
       foreach (string strNum in strNums) 
       { 
        if (strNum.Contains("-")) 
        { 
         int startNum = int.Parse(strNum.Substring(0, strNum.IndexOf("-"))); 
         int endNum = int.Parse(strNum.Substring(strNum.IndexOf("-") + 1)); 
         for (int i = startNum; i <= endNum; i++) 
         { 
          numbers.Add(i); 
         } 
        } 
        else 
         numbers.Add(int.Parse(strNum)); 
       } 
       Console.WriteLine(string.Join(",", numbers.Select(x => x.ToString()))); 
      } 
      Console.ReadLine(); 

     } 
    } 
} 
0

A one-line approach with LINQ:

string input = "1,3,5-10,12"; 
IEnumerable<int> result = input.Split(',').SelectMany(x => x.Contains('-') ? Enumerable.Range(int.Parse(x.Split('-')[0]), int.Parse(x.Split('-')[1]) - int.Parse(x.Split('-')[0]) + 1) : new int[] { int.Parse(x) });
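The one-liner above calls `x.Split('-')` up to three times per item. A possible variation, sketched here under the same assumption of well-formed input, splits each item once and treats a single number as a range of length one. The names `LinqRangeSketch` and `Expand` are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqRangeSketch
{
    // Hypothetical helper: like the one-liner, it throws on malformed input
    // because int.Parse is used without validation.
    public static IEnumerable<int> Expand(string input)
    {
        return input
            .Split(',')
            // Each item becomes its bounds: [n] for a single page, [start, end] for a range.
            .Select(item => item.Split('-').Select(s => int.Parse(s)).ToArray())
            // The last element is the end bound, which equals the start for a single page.
            .SelectMany(b => Enumerable.Range(b[0], b[b.Length - 1] - b[0] + 1));
    }
}
```

Calling `LinqRangeSketch.Expand("1,3,5-10,12")` should produce the same sequence as the original expression.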