2014-04-22 55 views
0

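I can currently parse and extract data from a large tab-delimited file. I read the file line by line, split each line, and add the split items to my DataTable (in chunks limited to 3 rows at a time). I need to skip the even lines: read the first tab-delimited line that has the maximum number of columns, skip the second, and go straight to the third. How can I read tab-delimited lines while skipping alternate lines?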

My tab-delimited source file format:

001Mean     26.975     1.1403     910.45     
001Stdev     26.975     1.1403     910.45     
002Mean     26.975     1.1403     910.45     
002Stdev     26.975     1.1403     910.45     

I need to skip (avoid reading) the Stdev tab-delimited lines.

C# code:

Getting the maximum number of items in a tab-delimited line by splitting the lines:

using (var reader = new StreamReader(sourceFileFullName)) 
     { 
      string line = null; 
      line = reader.ReadToEnd(); 

      if (!string.IsNullOrEmpty(line)) 
      { 
       var list_with_max_cols = line.Split('\n').OrderByDescending(y => y.Split('\t').Count()).Take(1); 
       foreach (var value in list_with_max_cols) 
       { 
        var values = value.ToString().Split(new[] { '\t', '\n' }).ToArray(); 
        MAX_NO_OF_COLUMNS = values.Length; 
       } 
      } 
     } 
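
A minimal sketch of the same column count computed without loading the whole file into memory, assuming it is acceptable to read the file twice (once to count, once to parse). `ColumnCounter`, `MaxColumns` and `MaxColumnsInFile` are hypothetical names that are not part of the original code. `File.ReadLines` enumerates lines lazily and strips the line terminators, so no stray `\r` ends up in the last field.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ColumnCounter
{
    // Hypothetical helper: returns the largest number of tab-separated
    // fields found on any line, or 0 when there are no lines at all.
    public static int MaxColumns(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split('\t').Length)
            .DefaultIfEmpty(0)
            .Max();
    }

    // Streams the file one line at a time instead of calling ReadToEnd.
    public static int MaxColumnsInFile(string path)
    {
        return MaxColumns(File.ReadLines(path));
    }
}
```

With this, `MAX_NO_OF_COLUMNS = ColumnCounter.MaxColumnsInFile(sourceFileFullName);` would replace the block above. Note that a line ending in a trailing tab counts one extra (empty) field, which matches what `Split` does in the original code.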

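Reading the file line by line until a tab-delimited line with the maximum number of items is found; that line is treated as the first line (the column list) for parsing and extraction: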

using (var reader = new StreamReader(sourceFileFullName)) 
     { 
      string new_read_line = null; 
      //Read and display lines from the file until the end of the file is reached.     
      while ((new_read_line = reader.ReadLine()) != null) 
      { 
          var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray(); 
          if (items.Length != MAX_NO_OF_COLUMNS)       
          continue; 
       //when reach first line it is column list need to create datatable based on that. 
       if (firstLineOfFile) 
       { 

        columnData = new_read_line; 
        firstLineOfFile = false; 
        continue; 
       } 
       if (firstLineOfChunk) 
       { 
        firstLineOfChunk = false; 
        chunkDataTable = CreateEmptyDataTable(columnData); 
       } 
        AddRow(chunkDataTable, new_read_line); 
       chunkRowCount++; 

       if (chunkRowCount == _chunkRowLimit) 
       { 
        firstLineOfChunk = true; 
        chunkRowCount = 0; 
        yield return chunkDataTable; 
        chunkDataTable = null; 
       } 
      } 
     } 

Creating the DataTable:

private DataTable CreateEmptyDataTable(string firstLine) 
    { 

     IList<string> columnList = Split(firstLine); 
     var dataTable = new DataTable("TableName"); 
     for (int columnIndex = 0; columnIndex < columnList.Count; columnIndex++) 
     { 
      string c_string = columnList[columnIndex]; 
      if (Regex.Match(c_string, "\\s").Success) 
      { 
       string tmp = Regex.Replace(c_string, "\\s", ""); 
       string finaltmp = Regex.Replace(tmp, @" ?\[.*?\]", ""); // To strip strings inside [] and inclusive [] alone 
       columnList[columnIndex] = finaltmp; 

      } 
     } 
     dataTable.Columns.AddRange(columnList.Select(v => new DataColumn(v)).ToArray()); 
     dataTable.Columns.Add("ID"); 
     return dataTable; 

    } 

How can I skip alternate lines while reading, then split the remaining lines and add them to my DataTable?

AddRow function: I managed to meet my requirement by adding the following change!

private void AddRow(DataTable dataTable, string line) 
    { 

     if (line.Contains("Stdev")) 
     { 
      return; 
     } 
     else 
     { 
      //Rest of Code 
     } 

    } 
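
Since `Contains("Stdev")` matches that text anywhere in the line, a slightly stricter variant is sketched below. It assumes the row label is always the first tab-separated field and ends with `Stdev`, as in the sample data; `RowFilter.IsStdevRow` is a hypothetical helper name, not part of the original code.

```csharp
using System;

public static class RowFilter
{
    // Hypothetical helper: true when the first tab-separated field
    // (the row label, e.g. "001Stdev") ends with "Stdev".
    public static bool IsStdevRow(string line)
    {
        if (string.IsNullOrEmpty(line))
            return false;

        // Take everything before the first tab; use the whole line if there is none.
        int tab = line.IndexOf('\t');
        string label = (tab >= 0 ? line.Substring(0, tab) : line).Trim();

        return label.EndsWith("Stdev", StringComparison.OrdinalIgnoreCase);
    }
}
```

In `AddRow`, the check would then read `if (RowFilter.IsStdevRow(line)) return;`, so a data value that happens to contain "Stdev" in a later column does not cause the row to be dropped.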

Answers

0

Change

using (var reader = new StreamReader(sourceFileFullName)) 
    { 
     string new_read_line = null; 
     //Read and display lines from the file until the end of the file is reached.     
     while ((new_read_line = reader.ReadLine()) != null) 
     { 
         var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray(); 
         if (items.Length != MAX_NO_OF_COLUMNS)       
         continue; 

to

using (var reader = new StreamReader(sourceFileFullName)) 
    { 

     int cnt = 0; 
     string new_read_line = null; 
     //Read and display lines from the file until the end of the file is reached.     
     while ((new_read_line = reader.ReadLine()) != null) 
     { 
         cnt++; 

         if(cnt % 2 == 0) 
          continue; 
         var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray(); 
         if (items.Length != MAX_NO_OF_COLUMNS)       
         continue; 

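@Gusman Thanks for your input! I have added the change to my code. I realized later that using cnt % 2 == 0 might not meet my requirement, because Stdev rows can appear at both odd and even line indexes in my tab-delimited source file. – Shrivatsan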

2

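Considering that you have tab-separated values on each line, here is how to read the odd lines and split each one into an array. This is just a sample; you can extend it.

Test data (file.txt)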

luck is when opportunity meets preparation 
this line needs to be skipped 
microsoft visual studio 
another line to be skipped 
let us all code 

Code

var oddLines = File.ReadLines(@"C:\projects\file.txt").Where((item, index) => index%2 == 0); 
foreach (var line in oddLines) 
{ 
    var words = line.Split('\t'); 
} 
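
To make the indexing explicit: the `Where` overload used above passes a zero-based index, so `index % 2 == 0` keeps the 1st, 3rd, 5th, ... lines. A small sketch of the same filter over in-memory lines rather than a file (`OddLineDemo` and `OddLines` are hypothetical names used only for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OddLineDemo
{
    // Keeps the lines at zero-based even indexes,
    // i.e. the 1st, 3rd, 5th, ... lines of the input.
    public static List<string> OddLines(IEnumerable<string> lines)
    {
        return lines.Where((item, index) => index % 2 == 0).ToList();
    }
}
```

Passing the five lines of the test data to `OddLines` keeps the first, third and fifth lines and drops the two marked as skipped.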

Edit

To get the lines that do not contain "Stdev":

var filteredLines = System.IO.File.ReadLines(@"C:\projects\file.txt").Where(item => !item.Contains("Stdev")); 

+1 for File.ReadLines – bitxwise

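@Prashanth Thanks for your input! I have added the change to my code. I realized later that using index % 2 == 0 might not meet my requirement, because Stdev rows can appear at both odd and even line indexes in my tab-delimited source file. – Shrivatsan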

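@Shrivatsan My answer was based on your original requirement. Anyway, glad to know you found your fix. You can still modify my query to filter only the data you want. See my edited query. –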
