2017-08-17 27 views
-1

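I'm new to regular expressions. I have the text below in my HTML, and I want to extract its ID and then replace the whole `Example HTML` tag with something else.

Example HTML: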

{{Object id='foo'}} 

Extract the ID into a variable like this:

string strId = "foo"; 

So far I have the following regex code, which captures the example HTML:

string strStart = "Object"; 
string strFind = "{{(" + strStart + ".*?)}}"; 
Regex regExp = new Regex(strFind, RegexOptions.IgnoreCase); 

Match matchRegExp = regExp.Match(html); 

while (matchRegExp.Success) 
{ 

    //At this point, I have this variable: 
    //{{Object id='foo'}} 

    //I can find the id='foo' (see below) 
    //but not sure how to extract 'foo' and use it 

    string strFindInner = "id='(.*?)'";
    Regex regExpInner = new Regex(strFindInner, RegexOptions.IgnoreCase);
    Match matchRegExpInner = regExpInner.Match(matchRegExp.Value);

    //Do something with 'foo' 

    matchRegExp = matchRegExp.NextMatch(); 
} 

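I understand this probably has a simple solution. I'd like to learn more about regular expressions, but more importantly, I'd like advice on how to handle this in a cleaner, more efficient way.

Thanks

Edit:

Here is an example of what I might use: c# regex replace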
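One way to read the id out of a match is a capture group accessed through `Match.Groups`. The following is a minimal sketch rather than a definitive implementation: the class and method names (`ShortcodeIds`, `ExtractIds`) are hypothetical, and it assumes the id is single-quoted and contains no quote characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ShortcodeIds
{
    // Hypothetical helper: returns the id of every {{<name> id='...'}} tag in html.
    public static List<string> ExtractIds(string html, string name)
    {
        // Braces are escaped so they are read literally; Regex.Escape guards the tag name.
        string pattern = @"\{\{" + Regex.Escape(name) + @"\s+id='([^']*)'\s*\}\}";
        var ids = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            // Groups[0] is the whole tag; Groups[1] is the first parenthesised group.
            ids.Add(m.Groups[1].Value);
        }
        return ids;
    }

    public static void Main()
    {
        string html = "<p>Start</p>{{Object id='foo'}}<p>End</p>";
        foreach (string strId in ExtractIds(html, "Object"))
        {
            Console.WriteLine(strId); // prints: foo
        }
    }
}
```

A single pattern with one group avoids the second, inner `Regex` entirely, because the outer match already carries the captured id.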

+0

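Stop! Look and listen! Every day someone wakes up with the great idea of parsing HTML with regular expressions. Nothing parses HTML better than an XML parser. The way you asked your question may hide how hard this is, though. Using '{{' instead of '<>' hides the fact that parsing a comment like "> _ <<3 I luv you => _o /" can turn your regex into a nightmare. In your head a regex is a simple "look for this", but it isn't! A regex that parses HTML has to be recursive and start over each time. Use a parser and your code will be as simple as it is in JS. –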

+0

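Thanks, I value your input. RegEx seemed like the easy way, but apparently it isn't. I'm moving to `Substring` and `IndexOf`, because I'm trying to do something similar to what WordPress's doShortCode() does, and I was able to find documentation on how that currently works. I'm looking to get a proof of concept and move on from there. – Derek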

+0

Use an HTML parser such as [Html Agility Pack (HAP)](http://html-agility-pack.net/?z=codeplex). One simple NuGet install and bam, you can select anything you want in the HTML. There isn't much to learn, and it isn't hard. –

Answer

0

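Although I didn't solve my original regex problem, I did move to a simpler solution for the time being using Substring, IndexOf and string.Split. I know my code needs cleaning up, but I thought I'd post the answer I have so far.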

using System;
using System.Collections.Generic;

string html = "<p>Start of Example</p>{{Object id='foo'}}<p>End of example</p>";
string strObject = "Object"; //Example; in practice a type name such as "Slider"

//When found, this will contain "{{Object id='foo'}}" 
string strCode = ""; 

//ie: "id='foo'" 
string strCodeInner = ""; 

//Tags will be a list, but in this example, only "id='foo'" 
string[] tags = { }; 

//Looking for the following "{{Object " 
string strFindStart = "{{" + strObject + " "; 
int intFindStart = html.IndexOf(strFindStart); 

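//Then ending in the following, searched from the start tag onward
string strFindEnd = "}}";
int intFindEnd = intFindStart == -1 ? -1 : html.IndexOf(strFindEnd, intFindStart);

//Must find both Start and End conditions
if (intFindStart != -1 && intFindEnd != -1)
{
    //Include the closing "}}" in the captured code
    intFindEnd += strFindEnd.Length;
    strCode = html.Substring(intFindStart, intFindEnd - intFindStart);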

    //Remove Start and End 
    strCodeInner = strCode.Replace(strFindStart, "").Replace(strFindEnd, ""); 

    //Split by spaces, this needs to be improved if more than IDs are to be used 
    //but for proof of concept this is perfect 
    tags = strCodeInner.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
} 

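Dictionary<string, string> dictTags = new Dictionary<string, string>();
foreach (string tag in tags)
{
    //Split on the first '=' only, so values may themselves contain '='
    string[] tagSplit = tag.Split(new char[] { '=' }, 2);
    if (tagSplit.Length != 2) continue; //Skip tokens that are not key=value
    dictTags[tagSplit[0]] = tagSplit[1].Replace("'", "").Replace("\"", "");
}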

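//At this point, I can replace "{{Object id='foo'}}" with anything I'd like
//What I don't show is that I go into the website's database,
//get the object (ie: Slider) and return the html for slider with the ID of foo
//Placeholder for that database-backed view:
string strView = "<p id=\"foo\">This is the replacement text</p>";
if (strCode.Length > 0) //string.Replace throws on an empty search string
{
    html = html.Replace(strCode, strView);
}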

/* 
    "html" variable may contain: 

    <p>Start of Example</p> 
    <p id="foo">This is the replacement text</p> 
    <p>End of example</p> 

*/
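For comparison, the same find, parse and replace steps can be sketched with `Regex.Replace` and a match evaluator, which handles every tag in the document in one pass. This is a hedged sketch under stated assumptions, not the approach used above: `ShortcodeReplacer`, `ReplaceAll` and the `render` callback are hypothetical names, the callback stands in for the database lookup, and attribute values are assumed to be quoted and to contain no `}` characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ShortcodeReplacer
{
    // Matches {{Name attr='value' ...}}; "name" is the tag name, "attrs" the remainder.
    private static readonly Regex TagPattern =
        new Regex(@"\{\{(?<name>\w+)(?<attrs>[^}]*)\}\}");

    // Matches key='value' or key="value"; quoted values may contain spaces.
    private static readonly Regex AttrPattern =
        new Regex(@"(?<key>\w+)\s*=\s*(?:'(?<val>[^']*)'|""(?<val>[^""]*)"")");

    // render is a hypothetical callback that returns the replacement html for one tag.
    public static string ReplaceAll(
        string html, Func<string, Dictionary<string, string>, string> render)
    {
        return TagPattern.Replace(html, m =>
        {
            // Collect the tag's attributes into a case-insensitive dictionary.
            var attrs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (Match a in AttrPattern.Matches(m.Groups["attrs"].Value))
            {
                attrs[a.Groups["key"].Value] = a.Groups["val"].Value;
            }
            return render(m.Groups["name"].Value, attrs);
        });
    }

    public static void Main()
    {
        string html = "<p>Start of Example</p>{{Object id='foo'}}<p>End of example</p>";
        string result = ReplaceAll(html, (name, attrs) =>
            "<p id=\"" + attrs["id"] + "\">This is the replacement text</p>");
        Console.WriteLine(result);
    }
}
```

Because the attribute pattern matches whole quoted values, it does not break on values that contain spaces, which is the limitation noted for the space-based `Split` above.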