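Finding the index of a blank line in a string

Suppose I have a string containing the contents of a text file, carriage returns, tabs and all. How do I find the index of the first blank line in that string (counting lines that contain only whitespace as blank)?

What I've tried:

I have a working function that uses a bunch of ugly code to find the index of the blank line. There must be a more elegant and readable way to do this.

For clarity, the function below returns the part of the string that runs from the supplied "title" to the first blank line after that title. I'm posting it in full because most of it is taken up by the search for that index, and to head off any "why in the world do you need the index of a blank line" questions. It should also help counter an XY problem, if that's what is happening here.

The code (apparently working; I haven't tested all the edge cases):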

// Get subsection indicated by supplied title from supplied section
private static string GetSubSectionText(string section, string subSectionTitle)
{
    int indexSubSectionBgn = section.IndexOf(subSectionTitle);
    if (indexSubSectionBgn == -1)
        return String.Empty;

    int indexSubSectionEnd = section.Length;

    // Find first blank line after found sub-section
    bool blankLineFound = false;
    int lineStartIndex = indexSubSectionBgn; // start at the title, not at the start of the section
    int lineEndIndex = 0;
    do
    {
        string temp;
        lineEndIndex = section.IndexOf(Environment.NewLine, lineStartIndex);

        if (lineEndIndex == -1)
            temp = section.Substring(lineStartIndex);
        else
            temp = section.Substring(lineStartIndex, (lineEndIndex - lineStartIndex));

        temp = temp.Trim();
        if (temp.Length == 0)
        {
            if (lineEndIndex == -1)
                indexSubSectionEnd = section.Length;
            else
                indexSubSectionEnd = lineEndIndex;

            blankLineFound = true;
        }
        else
        {
            // Skip the whole newline sequence ("\r\n" is two characters)
            lineStartIndex = lineEndIndex + Environment.NewLine.Length;
        }
    } while (!blankLineFound && (lineEndIndex != -1));

    if (blankLineFound)
        // Substring takes a length, not an end index
        return section.Substring(indexSubSectionBgn, indexSubSectionEnd - indexSubSectionBgn);
    else
        return null;
}

Follow-up edit:

The result (based mainly on Konstantin's answer):

// Get subsection indicated by supplied title from supplied section
private static string GetSubSectionText(string section, string subSectionTitle)
{
    string[] lines = section.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
    int subsectStart = 0;
    int subsectEnd = lines.Length;

    // Find subsection start
    for (int i = 0; i < lines.Length; i++)
    {
        if (lines[i].Trim() == subSectionTitle)
        {
            subsectStart = i;
            break;
        }
    }

    // Find subsection end (ie, first blank line)
    for (int i = subsectStart; i < lines.Length; i++)
    {
        if (lines[i].Trim().Length == 0)
        {
            subsectEnd = i;
            break;
        }
    }

    return string.Join(Environment.NewLine, lines, subsectStart, subsectEnd - subsectStart);
}

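The differences between my final result and Konstantin's answer are due to the framework version (I'm using .NET 2.0, which doesn't support string[].Take) and to using Environment.NewLine instead of a hard-coded '\n'. It is much prettier and more readable than my original pass. Thanks, everyone!

Source

2012-12-04 Christopher Berman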

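I suspect the answer is "Christopher, learn RegEx".

What is the function supposed to return? From the question, it sounds like you want the index of the first character of the blank line, but the method looks like it returns the blank line itself.

I should have been clearer; I've edited the question to include the purpose of the function. In short, it searches the section for the subsection title and returns a string containing all the text from the found subSectionTitle up to the first blank line after it.

Have you tried using the String.Split method: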

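string s = "safsadfd\r\ndfgfdg\r\n\r\ndfgfgg";
string[] lines = s.Split('\n');
int i;
for (i = 0; i < lines.Length; i++)
{
    // Splitting on '\n' leaves a trailing '\r' on each line, which
    // IsNullOrWhiteSpace treats as whitespace (the two alternatives
    // below only work if the lines carry no trailing '\r')
    if (string.IsNullOrWhiteSpace(lines[i]))
    //if (lines[i].Length == 0)   //or maybe this suits better..
    //if (lines[i].Equals(string.Empty)) //or this
    {
        Console.WriteLine(i);
        break;
    }
}
Console.WriteLine(string.Join("\n", lines.Take(i)));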

Edit: in response to the OP's edit.
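The loop above reports the number of the blank line, while the question asks for a position in the string. One way to bridge the two, offered only as a rough sketch, is to add up the lengths of the lines already passed. The class and method names are hypothetical, and the code assumes lines end in '\n', optionally preceded by '\r'. It uses Trim() rather than string.IsNullOrWhiteSpace so that it also compiles on .NET 2.0.

```csharp
using System;

static class BlankLineFinder
{
    // Returns the character index at which the first blank (empty or
    // whitespace-only) line starts, or -1 if there is none.
    // Assumption: lines are separated by '\n', optionally preceded by '\r'.
    public static int IndexOfFirstBlankLine(string text)
    {
        string[] lines = text.Split('\n');
        int offset = 0; // character index where the current line starts

        foreach (string line in lines)
        {
            // The empty piece after a trailing '\n' is not counted as a line.
            if (offset >= text.Length)
                break;

            // Trim() also removes the '\r' left behind by splitting on '\n'.
            if (line.Trim().Length == 0)
                return offset;

            offset += line.Length + 1; // +1 for the '\n' removed by Split
        }

        return -1;
    }
}
```

For the sample string used above, BlankLineFinder.IndexOfFirstBlankLine(s) returns 18, the position of the '\r' that makes up the blank third line.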

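Source

2012-12-04 01:06:31 horgh

+1: If you're working with lines of text, you should treat them as such. – millimoose

Exactly what I was looking for, and much clearer. Thanks!

(Missed the edit timer): This is a more sensible way of thinking about it. I'll edit the result into the question.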

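By "blank line", do you mean a line that contains only whitespace? Yes, you should use a regular expression; the pattern you're looking for is @"(?<=\r?\n)[ \t]*(\r?\n|$)".

(?<= ... ) denotes a lookbehind: something that must appear immediately before what you're looking for.
\r?\n denotes a newline, supporting both the Unix and Windows conventions.
(?<=\r?\n) is therefore a lookbehind for a preceding newline.
[ \t]* denotes zero or more spaces or tabs; these match the content of the blank line (if any).
(\r?\n|$) denotes a newline or the end of the file.

Example: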

string source = "Line 1\r\nLine 2\r\n \r\nLine 4\r\n"; 
Match firstBlankLineMatch = Regex.Match(source, @"(?<=\r?\n)[ \t]*(\r?\n|$)"); 
int firstBlankLineIndex = 
    firstBlankLineMatch.Success ? firstBlankLineMatch.Index : -1;
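Because the lookbehind requires a preceding newline, the pattern above cannot match a blank line at the very start of the string. A possible variant, sketched here as an assumption and not as the answerer's method, uses RegexOptions.Multiline so that ^ and $ anchor to line boundaries. The class and method names are hypothetical, and the pattern is commented piece by piece for whoever has to maintain it later.

```csharp
using System.Text.RegularExpressions;

static class BlankLineRegex
{
    // ^        start of a line (because of RegexOptions.Multiline)
    // [ \t]*   zero or more spaces or tabs
    // \r?      optional carriage return of a Windows line ending
    // $        end of a line (just before '\n') or end of the string
    private static readonly Regex BlankLine =
        new Regex(@"^[ \t]*\r?$", RegexOptions.Multiline);

    // Returns the index where the first blank line starts, or -1 if none.
    public static int IndexOfFirstBlankLine(string text)
    {
        Match match = BlankLine.Match(text);
        return match.Success ? match.Index : -1;
    }
}
```

For the source string in the example above this returns 16, and for a string that begins with a blank line, such as "\r\nLine", it returns 0. Note that the empty position after a final newline also counts as a blank line here.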

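Source

2012-12-04 01:06:38 Douglas

The regex should be followed by a nice long comment to help the poor guy who has to come along and maintain it six months from now.

Yes, lines containing only whitespace: tabs, spaces and carriage returns. This answer looks far more elegant than what I wrote! My only concern with regex is that it isn't very readable (though still better than mine).

@ChristopherBerman: Fair point; many developers aren't familiar with regular expressions. – Douglas

Just for fun: it seems you're fine with allocating a new string once per line. In that case, write an iterator that lazily evaluates the string and returns each line. For example: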

IEnumerable<string> BreakIntoLines(string theWholeThing)
{
    int startIndex = 0;
    for (;;)
    {
        int newlineIndex = theWholeThing.IndexOf(Environment.NewLine, startIndex);
        if (newlineIndex == -1) //Didn't find a newline
        {
            //Return the end part of the string and finish
            yield return theWholeThing.Substring(startIndex);
            yield break;
        }
        else //Found a newline
        {
            //Remember to pick up the newline character(s) too!
            int endIndex = newlineIndex + Environment.NewLine.Length;
            //Return where we're at up to and including the newline
            yield return theWholeThing.Substring(startIndex, endIndex - startIndex);
            startIndex = endIndex;
        }
    }
}

You can then wrap that in another iterator that returns only the lines you care about and discards the rest.

IEnumerable<string> GetSubsectionLines(string theWholeThing, string subsectionTitle)
{
    bool foundSubsectionTitle = false;
    foreach (var line in BreakIntoLines(theWholeThing))
    {
        if (line.Contains(subsectionTitle))
        {
            foundSubsectionTitle = true; //Start capturing
        }

        if (foundSubsectionTitle)
        {
            yield return line;

            //Only look for the blank line once the title has been found;
            //otherwise a blank line before the subsection would end the search
            if (String.IsNullOrWhiteSpace(line))
            {
                //This will stop iterating after returning the empty line, if there is one
                yield break;
            }
        } //Implicit "else" - Just discard the line if we haven't found the subsection title yet
    }
}

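Now, this approach (like some of the others posted) doesn't exactly replicate your original code. For example, if the text of the subsection title happens to span a line break, it won't be found. Let's assume the spec is written in a way that doesn't allow that. This code also makes a copy of every line it returns, but your original code copied each line too, so that's probably fine.

The only benefit of doing it this way versus string.Split is that once you've finished returning the subsection, the rest of the string is never evaluated. For reasonably sized strings you probably don't care, and any "performance gain" is probably nonexistent. If you really cared about performance, you wouldn't be copying each line in the first place!

The other thing you get (which may actually be valuable) is code reuse. If you're writing a program that parses documents, being able to operate on individual lines may well be helpful.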
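As a rough illustration of that reuse, the same lazy, line-by-line idea can be built on System.IO.StringReader, whose ReadLine method already recognises "\r\n", "\n" and "\r" line endings. This is only a sketch under assumptions: the class and method names are hypothetical, and unlike the iterators above it drops the line terminators and does not return the blank line itself.

```csharp
using System.Collections.Generic;
using System.IO;

static class LineReader
{
    // Lazily yields each line of the text without its line terminator.
    public static IEnumerable<string> ReadLines(string text)
    {
        using (StringReader reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    // Yields lines from the first one containing the title up to, but not
    // including, the first blank line after it.
    public static IEnumerable<string> SubsectionLines(string text, string title)
    {
        bool found = false;
        foreach (string line in ReadLines(text))
        {
            if (!found && line.Contains(title))
                found = true;

            if (!found)
                continue; // still before the subsection

            if (line.Trim().Length == 0)
                yield break; // first blank line ends the subsection

            yield return line;
        }
    }
}
```

Any other line-oriented step in a parser can consume ReadLines in the same way, and nothing after the point where the caller stops iterating is read.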

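Source

2012-12-04 01:47:09

Awesome. Although performance doesn't matter to me in this particular application (it's completely dwarfed by file I/O), this is useful for us. I'm reading up on lazy evaluation and iterators now. Thanks, Pete!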
