Parsing text

-2

I have about 27,000 entries with the following markup in a plain text file:

<li class="active-result group-option" data-option-array-index="4">Microsoft Power BI</li>

The only thing I need from the above is (in this case):

Microsoft Power BI

Using C#, I tried the string split options (reading from a file named select.txt), but I have not managed to get this done. Any ideas?

Source

2016-07-06 Jony

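Are the entries on separate lines? – Adam

Do they all have the same class name? – prospector

@prospector Yes, they all have the same class. The only things that differ are the text itself and the 'data-option-array-index' value. – Jony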

-1

For a small task like this, a regular expression is the most practical approach.

Add this at the top of your file:

using System.Text.RegularExpressions;

Then use this regular expression to capture everything you want:

string input = ReadSomethingFromFile(); // input is the raw data you are trying to read
// The parentheses form capture group 1: the text between <li ...> and </li>.
MatchCollection matches = Regex.Matches(input, "<li class=\"active-result group-option\"[^>]*>([^<]+)</li>");

// Loop through all matched elements
foreach (Match m in matches) {
    // Groups[1] is the captured inner text; m.Value would be the whole <li>...</li> element.
    string capturedString = m.Groups[1].Value;
    // Do something with capturedString
}

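If you plan to add more functionality to your program later, you should use a proper HTML parsing library. But if this is all you need to do, a regular expression is the simplest option.

Source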
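
For a file such as the asker's select.txt, the same regex idea can be applied line by line, so the whole file never has to sit in a single string. What follows is only a minimal sketch: it assumes that no `<li>` entry is split across a line break (the thread never confirms this), and the names `LiTextExtractor`, `ExtractFromLines` and `ExtractFromFile` are made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class LiTextExtractor
{
    // Group 1 captures the text between the opening and closing <li> tags.
    private static readonly Regex LiPattern = new Regex(
        "<li class=\"active-result group-option\"[^>]*>([^<]+)</li>",
        RegexOptions.Compiled);

    // Returns the inner text of every matching <li> found in the given lines.
    public static List<string> ExtractFromLines(IEnumerable<string> lines)
    {
        var results = new List<string>();
        foreach (string line in lines)
        {
            foreach (Match m in LiPattern.Matches(line))
            {
                // Decode entities such as &amp; and trim surrounding whitespace.
                results.Add(WebUtility.HtmlDecode(m.Groups[1].Value).Trim());
            }
        }
        return results;
    }

    // Convenience wrapper: File.ReadLines reads the file lazily, line by line.
    public static List<string> ExtractFromFile(string path)
    {
        return ExtractFromLines(File.ReadLines(path));
    }
}
```

Calling `LiTextExtractor.ExtractFromFile("select.txt")` would then return one string per entry. Lines that do not match the pattern are skipped silently, which may or may not be what you want for a one-off conversion.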

2016-07-06 18:20:09 Kilves

I really appreciate your answer! However, the output is 'Microsoft Power BI'. – Jony

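For 27,000 entries, I don't think a regular expression is the right solution. – jdweng

@jdweng Not necessarily, if this is a one-off analysis. 27,000 entries is still feasible. – Kilves

I know someone will downvote me for using XML to read HTML. But in this case it works well.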

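using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string text = "<li class=\"active-result group-option\" data-option-array-index=\"4\">Microsoft Power BI</li>";
            // Use StringReader only when reading from a string.
            StringReader reader = new StringReader(text);

            // Fragment mode lets the reader accept many top-level <li> elements
            // (the default mode requires a single root element).
            XmlReaderSettings settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };

            List<string> data = new List<string>();
            // For reading from a file, use XmlReader.Create(filename, settings).
            XmlReader xReader = XmlReader.Create(reader, settings);
            while (!xReader.EOF)
            {
                // Advance to the next <li> unless the reader is already on one.
                if (xReader.Name != "li")
                {
                    xReader.ReadToFollowing("li");
                }
                if (!xReader.EOF)
                {
                    // ReadInnerXml returns the element content and moves past </li>.
                    data.Add(xReader.ReadInnerXml());
                }
            }
        }
    }
}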
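
If every entry sits on its own line and is well-formed XML, LINQ to XML (`System.Xml.Linq`) gives a shorter variant of the same idea. This is a sketch under those assumptions only; `LiXmlExtractor` and `ExtractText` are hypothetical names, and real-world HTML with an unescaped `&` or an unclosed tag would make `XElement.Parse` throw.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the original answer.
public static class LiXmlExtractor
{
    // Parses each non-empty line as a standalone XML element and
    // returns the text content of the <li> elements.
    public static List<string> ExtractText(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => XElement.Parse(line))
            .Where(element => element.Name.LocalName == "li")
            .Select(element => element.Value) // Value is the entity-decoded text
            .ToList();
    }
}
```

For the asker's file this could be called as `LiXmlExtractor.ExtractText(File.ReadLines("select.txt"))`, with `using System.IO;` added. Unlike the regex version, it fails loudly on the first malformed line instead of skipping it.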

Source

2016-07-06 17:57:37 jdweng

I would give you an upvote, but my post was downvoted, so I can't. Sorry! – Jony

Then I'll give you a vote (from -3 to -2). – jdweng
