2013-05-26 33 views
1

Extracting specific results from a text file in C#

Input file

a 00002098 0 0.75 unable#1 (usually followed by `to') not having the necessary means or skill or know-how; "unable to get to town without a car"; "unable to obtain funds" 
a 00002312 0.23 0.43 dorsal#2 abaxial#1 facing away from the axis of an organ or organism; "the abaxial surface of a leaf is the underside or side facing away from the stem" 
a 00023655 0 0.5 outside#10 away#3 able#2 (of a baseball pitch) on the far side of home plate from the batter; "the pitch was away (or wide)"; "an outside pitch"  

From this file I want the following results:

Output

a,00002098,0,0.75,unable#1
a,00002312,0.23,0.43,dorsal#2
a,00002312,0.23,0.43,abaxial#1
a,00023655,0,0.5,outside#10
a,00023655,0,0.5,away#3
a,00023655,0,0.5,able#2

I wrote the following code to extract the results above:

TextWriter tw = new StreamWriter("D:\\output.txt");

private void button1_Click(object sender, EventArgs e)
{
    if (textBox1.Text != null)
    {
        StreamReader reader = new StreamReader(@"C:\Users\Zia\Desktop\input.txt");
        string line;
        String lines = "";
        while ((line = reader.ReadLine()) != null)
        {
            String[] str = line.Split('\t');
            String[] words = str[3].Split(' ');
            for (int k = 0; k < words.Length; k++)
            {
                for (int i = 0; i < str.Length; i++)
                {
                    if (i + 1 != str.Length)
                    {
                        lines = lines + str[i] + ",";
                    }
                    else
                    {
                        lines = lines + words[k] + "\r\n";
                    }
                }
            }
        }
        tw.Write(lines);
        tw.Close();
        reader.Close();
    }
}

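When I change the index, this code throws the following error instead of producing the desired result.
Error:
Index was outside the bounds of the array.
Thanks in advance.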

+0

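When you step through the code in the debugger and the exception occurs, have you checked the index you are using and the size of the array? – Oded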

+0

@Oded: I tried every index from 2 to 5, but it gives the same result. –

+0

'if (i + 1 != str.Length)' is suspicious. What happens if 'i == str.Length'? – Oded

Answers

2

Why not try this algorithm, run for each line of the text:

var elements = line.Split('\t'); 
var words = elements[4].Split(' '); 
foreach(var word in words) 
{ 
    Console.WriteLine(string.Concat(elements[0], ",", elements[1], ",", elements[2], ",", elements[3], ",", word)); 
} 

This seems to output exactly what you need. Just change Console.WriteLine to write to your file instead.
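A minimal sketch of that file-writing variant follows. It assumes the terms sit alone in the fifth tab-separated column (as elements[4] above implies) and that lines with fewer than five columns should simply be skipped. The class name SynsetExpander and its two methods are hypothetical, made up for illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not from the original answer.
static class SynsetExpander
{
    // Expands one tab-separated line into one comma-separated row per term
    // found in the fifth column. Lines with fewer than five columns
    // (for example blank lines) yield no rows instead of throwing.
    public static IEnumerable<string> Expand(string line)
    {
        var elements = line.Split('\t');
        if (elements.Length < 5)
            yield break;

        // Terms are separated by spaces; ignore empty tokens from stray spaces.
        var terms = elements[4].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var term in terms)
            yield return string.Join(",", elements[0], elements[1], elements[2], elements[3], term);
    }

    // Reads inputPath line by line and writes the expanded rows to outputPath.
    public static void Convert(string inputPath, string outputPath)
    {
        using (var writer = new StreamWriter(outputPath))
        {
            foreach (var line in File.ReadLines(inputPath))
                foreach (var row in Expand(line))
                    writer.WriteLine(row);
        }
    }
}
```

Calling SynsetExpander.Convert with the input and output paths from the question would then write one row per term. The length check is there because indexing elements[4] on a blank or short line is one way to get an "index outside the bounds of the array" error.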

+0

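Sir, you are right, but after the fourth tab there can be two or more terms, and the row should be repeated for each of them... Please read my question carefully, where I describe what I want. –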

+0

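I don't see the repeated tabs you are talking about. Given your input, this outputs what you need (I tested it in a console application). If you have more complex input that this algorithm cannot handle, please post it and I will revise my answer if needed. –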

+0

Does this answer your question? –

1

As I understand it, you want every word in the last column that contains # to become a new result row, so it should look like this:

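// Requires: using System; using System.Collections.Generic; using System.Linq;
// str holds the entire contents of the input file.
List<string> result = new List<string>();

// Split on both '\r' and '\n' so Windows line endings leave no stray '\r'.
var lines = str.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
foreach (var line in lines)
{
    var words = line.Split('\t');
    if (words.Length < 5) continue; // skip lines that have no terms column
    string res = String.Join(",", words[0], words[1], words[2], words[3]);

    // Keep only the tokens that contain '#', i.e. the terms.
    var terms = words[4].Split(' ').Where(word => word.Contains("#"));
    foreach (var s in terms)
    {
        result.Add(res + "," + s);
    }
}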
+0

Does this answer your question? – Mzf

0
// Requires: using System; using System.Collections.Generic; using System.IO; using System.Linq;
private void Extract()
{
    char[] delimiters = new char[] { '\r', '\n' };
    using (StreamReader reader = new StreamReader(@"C:\Users\Zia\Desktop\input.txt"))
    {
        string text = reader.ReadToEnd();
        // Drop the empty entries produced by "\r\n" pairs and blank lines.
        string[] lines = text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            foreach (var row in findItems(line))
            {
                Console.WriteLine(row);
            }
        }
    }
}

// Returns one comma-separated row per term (a token containing '#') in the line.
private static List<string> findItems(string item)
{
    List<string> items = new List<string>();

    // Split on spaces and tabs so a term is never glued to a neighbouring column.
    List<string> names = item.Split(' ', '\t').Where(x => x.Contains("#")).ToList();
    if (names.Count == 0)
    {
        return items;
    }

    // Everything before the first term is the shared prefix; its tabs become commas.
    // Taking the whole first term avoids cutting multi-digit suffixes such as "#10".
    int prefixLength = item.IndexOf(names[0], StringComparison.Ordinal);
    string prefix = item.Substring(0, prefixLength).Replace("\t", ",");

    foreach (var name in names)
    {
        items.Add(prefix + name);
    }

    return items;
}
