Extracting data from a large string

First, I read the data from a PDF file using the function below.

public string ReadPdfFile(string fileName)
{
    StringBuilder text = new StringBuilder();

    if (File.Exists(fileName))
    {
        PdfReader pdfReader = new PdfReader(fileName);

        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
            string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);

            currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
            text.Append(currentText);
        }

        // Close the reader once, after all pages have been read
        pdfReader.Close();
    }
    return text.ToString();
}

As you can see, all the data ends up in a single string. The string looks like this:

label1: data1; 
label2: data2; 
label3: data3; 
............. 
labeln: datan;

My question: how can I get the data out of the string based on the labels? I tried this, but I'm stuck:

if (string.Contains("label1"))
{
    extracted_data1 = string.Substring(string.IndexOf(':') , string.IndexOf(';') - string.IndexOf(':') - 1);
}
if (string.Contains("label2"))
{
    extracted_data2 = string.Substring(string.IndexOf("label2") + string.IndexOf(':') , string.IndexOf(';') - string.IndexOf(':') - 1);
}

Source

2012-03-15 Emil Dumbazu

Take a look at the String.Split() function; it tokenises a string based on a supplied array of characters.

For example:

string[] lines = text.Split(new[] {';'}, StringSplitOptions.RemoveEmptyEntries);

Now loop through the array and split each entry again:

foreach (string line in lines)
{
    string[] pair = line.Split(new[] {':'});
    string key = pair[0].Trim();
    string val = pair[1].Trim();
    // ....
}

Obviously, check for empty lines, and use .Trim() where needed...

[Edit] Or alternatively, as a nice LINQ statement...

var result = from line in text.Split(new[] {';'}, StringSplitOptions.RemoveEmptyEntries)
             let tokens = line.Split(new[] {':'})
             where tokens.Length > 1   // skip entries without a ':' (e.g. trailing whitespace)
             select tokens;

Dictionary<string, string> dict =
    result.ToDictionary(key => key[0].Trim(), value => value[1].Trim());
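
For completeness, here is a minimal sketch of the Split-based approach with the checks mentioned above filled in. The class name `LabelParser` and the method name `Parse` are made up for illustration, and the sketch assumes that a label never contains ':' and that no value contains ';'.

```csharp
using System;
using System.Collections.Generic;

public static class LabelParser
{
    // Parses "label: data;" entries into a dictionary keyed by label.
    public static Dictionary<string, string> Parse(string text)
    {
        var result = new Dictionary<string, string>();

        foreach (string entry in text.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split on the first ':' only, so data that itself contains ':' stays intact.
            string[] pair = entry.Split(new[] { ':' }, 2);
            if (pair.Length < 2)
            {
                continue; // blank or malformed entry, e.g. whitespace after the last ';'
            }

            string key = pair[0].Trim();
            if (key.Length == 0)
            {
                continue; // no label in front of the ':'
            }

            // The indexer overwrites a repeated label; Add() would throw instead.
            result[key] = pair[1].Trim();
        }

        return result;
    }
}
```

Calling `LabelParser.Parse(text)["label1"]` would then return the data stored under label1. Whether a repeated label should overwrite, throw, or be collected into a list depends on the PDF content, so treat the overwrite behaviour as an assumption.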

Source

2012-03-15 10:03:38

I think you can use a regex to solve this. Just split the string on the line separators and use a regular expression to get the right numbers.
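
A rough sketch of what that could look like, assuming each line has the shape `label: data;`. The names `LineRegexParser`, `EntryPattern` and `Parse` are hypothetical, and the pattern is only one possible choice:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineRegexParser
{
    // Assumed line shape: "<label>: <data>;" with optional surrounding whitespace.
    private static readonly Regex EntryPattern =
        new Regex(@"^\s*(?<label>[^:;]+?)\s*:\s*(?<data>[^;]*?)\s*;\s*$");

    public static Dictionary<string, string> Parse(string text)
    {
        var result = new Dictionary<string, string>();

        // Split the big string into lines first, then match each line on its own.
        string[] lines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            Match match = EntryPattern.Match(line);
            if (match.Success)
            {
                // Lines that do not fit the pattern are silently ignored.
                result[match.Groups["label"].Value] = match.Groups["data"].Value;
            }
        }

        return result;
    }
}
```

This variant relies on every entry sitting on its own line; if the PDF text extraction wraps long values across lines, matching against the whole string would be the safer route.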

Source

2012-03-15 10:01:25 Frederiek

This is rather hard-coded, but you could use something like this (with a little tweaking for your needs):

string input = "label1: data1;"; // Example of your input
string data = input.Split(':')[1].Replace(";", "").Trim();

Source

2012-03-15 10:03:37

You can do this with a regular expression:

Regex rx = new Regex("label([0-9]+): ([^;]*);"); 
var matches = rx.Matches("label1: a string; label2: another string; label100: a third string;"); 

foreach (Match match in matches) { 
    var id = match.Groups[1].ToString(); 
    var data = match.Groups[2].ToString(); 
    var idAsNumber = int.Parse(id); 

    // Here you use an array or a dictionary to save id/data 
}

Source

2012-03-15 10:04:03 xanatos

You can use a Dictionary<string,string> for this:

// Requires: using System.Collections.Generic; using System.Linq;
Dictionary<string, string> dicLabelData = new Dictionary<string, string>();

// strBig is the big string which you want to parse
List<string> listStrSplit = strBig.Split(';').ToList();

foreach (string strSplit in listStrSplit)
{
    // Split each entry once into label and data
    List<string> listLabel = strSplit.Split(':').ToList();

    if (listLabel.Count > 1)
    {
        // Key = label, Value = data (trimmed so that lookups by label work)
        dicLabelData.Add(listLabel[0].Trim(), listLabel[1].Trim());
    }
}

dicLabelData then contains the data for all the labels.
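
Reading a value back out is then a dictionary lookup. The helper below is only an illustrative sketch (`LabelLookup` and `GetOrDefault` are made-up names), and it assumes the keys were stored without surrounding whitespace; otherwise a lookup such as "label2" will not match a key like "\nlabel2".

```csharp
using System;
using System.Collections.Generic;

public static class LabelLookup
{
    // Returns the data stored under the given label, or the fallback when the label is absent.
    public static string GetOrDefault(Dictionary<string, string> data, string label, string fallback)
    {
        string value;
        // TryGetValue avoids the KeyNotFoundException that data[label] throws for a missing key.
        return data.TryGetValue(label, out value) ? value : fallback;
    }
}
```

For example, `LabelLookup.GetOrDefault(dicLabelData, "label1", "")` yields the data for label1, or an empty string if the PDF did not contain that label.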

Source

2012-03-15 10:11:50
