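Regex to extract all images and HTML
I'm trying to work out the following regex and can't seem to get it right. Can anyone give me a pointer?
In short, I have an htmlString that is: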
htmlString = "<HTML><HEAD></HEAD><BODY>Here are some images.</br>1) <IMG style='MARGIN-BOTTOM: 20px; MARGIN-LEFT: 20px' align=right src='images/sample001.jpg'>2) <IMG style='MARGIN-BOTTOM: 25px; MARGIN-LEFT: 25px' align=right src='images/sample002.png'></br> And some docs as well.</br>1) href='javascript:parent.POPUP({url:'testDoc001.htm',type:'shared',width:600,height:645})'></br>2) href='javascript:parent.POPUP({url:'testDoc002.html',type:'shared',width:700,height:712})'></br></BODY></HTML>";
I run it through the following routine in C#, WPF:
private static List<string> ExtractData(string htmlString)
{
    List<string> data = new List<string>();

    //*** Get The Images ***
    string pattern = @"<img .* src='(.+\.(jpg|bmp|png))'";
    Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
    MatchCollection matches = rgx.Matches(htmlString);
    for (int i = 0, l = matches.Count; i < l; i++)
    {
        data.Add(matches[i].Value);
    }

    //*** Get Html Pages ***
    pattern = @"url:'([^']*)'";
    rgx = new Regex(pattern, RegexOptions.IgnoreCase);
    matches = rgx.Matches(htmlString);
    for (int i = 0, l = matches.Count; i < l; i++)
    {
        data.Add(matches[i].Value);
    }

    return data;
}
The result I get is:
[0] = "<IMG style='MARGIN-BOTTOM: 20px; MARGIN-LEFT: 20px' align=right src='images/sample001.jpg'>2) <IMG style='MARGIN-BOTTOM: 25px; MARGIN-LEFT: 25px' align=right src='images/sample002.png'"
[1] = "url:'testDoc001.htm'"
[2] = "url:'testDoc002.html'"
What I really want is:
[0] = "images/sample001.jpg"
[1] = "images/sample002.png"
[2] = "testDoc001.htm"
[3] = "testDoc002.html"
Can anyone tell me what I'm doing wrong in the regex?
Thanks
See the first answer here: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Cfreak 2012-04-16 14:00:42
What you want can probably be done with a regex, but it won't be as neat and tidy as you expect. You really should use a parser for this. http://stackoverflow.com/a/1732454/355724 – VeeArr 2012-04-16 14:00:58
Possible duplicate of [Regular expression to get the SRC of images in C#](http://stackoverflow.com/questions/4257359/regular-expression-to-get-the-src-of-images-in-c-sharp) – 2012-04-16 14:25:58
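For the sample htmlString in the question, both symptoms seem to have simple causes. First, `.*` is greedy, so the image pattern runs from the first `<IMG` to the last ` src='` and the two tags collapse into one match. Second, `matches[i].Value` returns the whole match, while the file name sits in capture group 1. The following is only a minimal sketch of one possible fix, assuming the attributes are always single-quoted as they are in the sample. The class name `HtmlDataExtractor` and the field names are made up for illustration. As the comments above say, a real HTML parser is the more robust choice for arbitrary markup.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is an assumption, not from the question.
public static class HtmlDataExtractor
{
    // [^>]*? cannot run past the closing '>' of a tag, so each <img> tag
    // is matched on its own. Group 1 captures only the file name.
    private static readonly Regex ImageSrc = new Regex(
        @"<img\b[^>]*?\bsrc='([^']+\.(?:jpg|bmp|png))'",
        RegexOptions.IgnoreCase);

    // Same pattern as in the question; only the way the result is read changes.
    private static readonly Regex PopupUrl = new Regex(
        @"url:'([^']*)'",
        RegexOptions.IgnoreCase);

    public static List<string> ExtractData(string htmlString)
    {
        var data = new List<string>();

        // Groups[1] is the first capture group; Value alone is the whole match.
        foreach (Match m in ImageSrc.Matches(htmlString))
        {
            data.Add(m.Groups[1].Value);
        }

        foreach (Match m in PopupUrl.Matches(htmlString))
        {
            data.Add(m.Groups[1].Value);
        }

        return data;
    }
}
```

Calling `HtmlDataExtractor.ExtractData(htmlString)` on the string from the question should give the four entries listed under "What I really want", images first and then the documents. Markup that uses double-quoted or unquoted attributes would need a different pattern.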