首先,我使用下面的函數從pdf文件中讀取數據。從大字符串中提取數據
public string ReadPdfFile(string fileName)
{
StringBuilder text = new StringBuilder();
if (File.Exists(fileName))
{
PdfReader pdfReader = new PdfReader(fileName);
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
text.Append(currentText);
pdfReader.Close();
}
}
return text.ToString();
}
正如你所看到的,所有的數據都保存在一個字符串中。該字符串看起來像這樣:
label1: data1;
label2: data2;
label3: data3;
.............
labeln: datan;
我的問題:如何從基於標籤的字符串獲取數據? 我想這一點,但我卡住:
if (string.Contains("label1"))
{
extracted_data1 = string.Substring(string.IndexOf(':') , string.IndexOf(';') - string.IndexOf(':') - 1);
}
if (string.Contains("label2"))
{
extracted_data2 = string.Substring(string.IndexOf("label2") + string.IndexOf(':') , string.IndexOf(';') - string.IndexOf(':') - 1);
}