2017-02-27

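Tokenize or split a string into text and HTML tag items

I'm looking for the most efficient way to take a string and tokenize it into an array, separating out all of the HTML tags.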

Example Input (String): 
    "I can format my text so that <strong>This is bold</strong> and this is not." 

Desired Output (String[] array): 
    "I can format my text so that", 
    "<strong>", 
    "This is bold", 
    "</strong>", 
    "and this is not." 

Alternate Output, Just as Good (String[] array):
    "I", 
    "can", 
    "format", 
    "my", 
    "text", 
    "so", 
    "that", 
    "<strong>", 
    "This", 
    "is", 
    "bold", 
    "</strong>", 
    "and", 
    "this", 
    "is", 
    "not." 

I'm not sure what the best way to approach this is. Any help would be appreciated.

`Regex.Split(inputString, "(?<=>)|(?=<)");`

Use `Regex.Split(s, @"(<[^<]*?>)")`

Answer

You can use Regex.Split() with a pair of zero-width assertions to split at every position right after a > or right before a <:

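using System.Text.RegularExpressions;
string input = "I can format my text so that <strong>This is bold</strong> and this is not.";
// Split just before every '<' and just after every '>' (zero-width, so no characters are consumed)
string[] output = Regex.Split(input, "(?=<)|(?<=>)");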

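(?=pattern) is called a lookahead assertion; it ensures that pattern follows the current position.
(?<=pattern) is a lookbehind assertion; same concept, but it looks at the characters before the current position.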
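The raw split keeps the surrounding spaces (for example "I can format my text so that " with a trailing space) and returns an empty first or last element when the input starts or ends with a tag. A minimal sketch that builds on the same "(?=<)|(?<=>)" split to produce both outputs from the question, assuming trimming whitespace and dropping empty pieces is acceptable. The names HtmlTokenizer, SplitPhrases and SplitWords are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HtmlTokenizer // hypothetical helper class
{
    // Zero-width split points: just before every '<' and just after every '>'.
    private static readonly Regex TagBoundary = new Regex("(?=<)|(?<=>)");

    // Phrase-level tokens: text runs and tags, trimmed, with empty pieces
    // (e.g. when the input starts or ends with a tag) removed.
    public static string[] SplitPhrases(string input) =>
        TagBoundary.Split(input)
            .Select(piece => piece.Trim())
            .Where(piece => piece.Length > 0)
            .ToArray();

    // Word-level tokens: tags stay whole, text runs are split on whitespace
    // (a null separator array means "any whitespace character").
    public static string[] SplitWords(string input) =>
        SplitPhrases(input)
            .SelectMany(piece => piece[0] == '<'
                ? new[] { piece }
                : piece.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            .ToArray();

    public static void Main()
    {
        string input = "I can format my text so that <strong>This is bold</strong> and this is not.";

        // prints: I can format my text so that | <strong> | This is bold | </strong> | and this is not.
        Console.WriteLine(string.Join(" | ", SplitPhrases(input)));

        // prints: I | can | format | my | text | so | that | <strong> | This | is | bold | </strong> | and | this | is | not.
        Console.WriteLine(string.Join(" | ", SplitWords(input)));
    }
}
```

Because the pattern only looks at individual '<' and '>' characters, a literal '<' in the text (such as "a < b") or a '>' inside an attribute value would also produce a split point. That is fine for simple markup like the example, but input with such characters would need a real HTML parser instead.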