Matching across multiple lines with a regular expression

I have an HTML file from a website, and I use regular expressions to search it for words and write those words to a document. I have text like this:

<div class="scrollable " style="height: 200px;"> 
     <div> 
      <p>CO-Schrank: nicht ben&ouml;tigtes ausbauen</p> 
<p><strong>________________________________________________________________________</strong></p> 

<p><strong>==&gt;&nbsp; wird nicht mehr ben&ouml;tigt!<br /></strong>z-B.: IUC</p> 

<p>CO-Management in Gen. 2 implementieren</p> 

<ol> 
<li>Ausbau der PCI-Karten aus ZKA-PC in CO-PC- PC-Sys 02 TP 55, 56, 61 sind noch Profibus im ZKA-PC ==&gt; in CO-PC- PC-Sys 02 greift dann auf CO-PC f&uuml;r Datenaufzeichnung =&gt; Betrieb wieder aufnehmen</li> 

<li>Ausbau der IUC</li> 

<li>Testaufbau am CO-PC f&uuml;r den CO-Algorithmus und Datenspeicherung</li> 

<li>Gen. 2 in CO-Management implementieren- pro Pr&uuml;fling 3 Min. (3 Min. x 48 HG x 10 Messungen)&nbsp;= 1440 Min. = 24 h- Messzeit 1-2 Min.</li> 

</ol> 


</div></div>

Now I want all the text between <div> ... </div>. I wrote this code, but it doesn't work:

Match description = Regex.Match(line, "^<div class=\"scrollable \"^(.*?)$div>", 
    RegexOptions.Multiline);//multiple line 

if (description.Success) 
{ 
    //Console.WriteLine(status_id.Groups[1].Value); 
    System.IO.StreamWriter file = new System.IO.StreamWriter(@"C:\\Webasto\\csv-"+zahl+".txt"); 
    file.WriteLine(id.Groups[1].Value + ";4;4;" + subject.Groups[1].Value + ";" + due_date.Groups[1].Value+";NULL;"+status_id.Groups[1].Value+";"//+assigned.Groups[1].Value 
     +";" 
     +priority.Groups[1].Value+";NULL;"+autor.Groups[1].Value+";0;"+created_on.Groups[1].Value+";"+start_date.Groups[1].Value+";"+done_ratio.Groups[1].Value+";"+hours.Groups[1].Value 
     +";NULL;"+id.Groups[1].Value+";1;2;0;"+closed.Groups[1].Value+";"); 
    file.Close(); 
}


2015-12-29 Hans Sroeb

Never use regex to parse XML/HTML. –

Please use an HTML parser. – timgeb

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags –

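It is well known that you should use an XHTML parser rather than regular expressions.

That said, you can use a regex if you know which set of characters is used in your HTML. If you still want to use a regex, you can use one with the single-line flag, like this: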

(?s)<div>.*?<\/div>


Or use this regex trick:

<div>[\s\S]*?<\/div>
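
In C#, the two patterns above might be applied roughly as follows. This is only a sketch under a few assumptions: the whole HTML document is already in one string (for example read with File.ReadAllText rather than matched line by line), a capture group is added around the inner text, and the names DivExtractor, WithSingleline and WithCharClass as well as the shortened sample string are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DivExtractor
{
    // Variant 1: RegexOptions.Singleline (equivalent to the inline (?s) flag)
    // lets '.' match '\n' as well. Returns "" when nothing matches.
    public static string WithSingleline(string html)
    {
        Match m = Regex.Match(html, @"<div>(.*?)</div>", RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    // Variant 2: [\s\S] matches any character, including '\n', with no option set.
    public static string WithCharClass(string html)
    {
        Match m = Regex.Match(html, @"<div>([\s\S]*?)</div>");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        // Shortened, hypothetical sample in the shape of the question's HTML.
        string html = "<div class=\"scrollable \">\n<div>\n<p>CO-Schrank</p>\n</div></div>";
        Console.WriteLine(WithSingleline(html).Trim()); // <p>CO-Schrank</p>
        Console.WriteLine(WithCharClass(html).Trim());  // <p>CO-Schrank</p>
    }
}
```

Note that the literal <div> only matches a tag without attributes, and the lazy quantifier stops at the first </div>, so nested <div> elements are not handled correctly. That is one of the reasons the comments recommend an HTML parser.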


2015-12-29 14:40:23

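Thanks! It works!! –

You have misunderstood what MultiLine means (I don't blame you; I have to think twice every time I use regex myself). MultiLine means that each line (ending with \n) is treated on its own: ^ and $ match at the start and end of every line, not just of the whole string.

You need SingleLine, which treats the whole string as a single line, so that . also matches \n.

Note: Using regex to parse HTML is a bad idea. Use a decent HTML parser instead.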
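
The difference between the two options can be checked with a small C# program. This is a sketch only; the class RegexModeDemo, the helper DivMatches and the sample text are invented for illustration and are not part of the question's code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexModeDemo
{
    // Tries to match an opening and closing tag with arbitrary text in between.
    public static bool DivMatches(string text, RegexOptions options)
    {
        return Regex.IsMatch(text, "<div>.*</div>", options);
    }

    public static void Main()
    {
        string text = "<div>\nfirst\nsecond\n</div>";

        // No options: '.' stops at '\n', so tags on different lines never pair up.
        Console.WriteLine(DivMatches(text, RegexOptions.None));       // False

        // Multiline only changes '^' and '$'; '.' still stops at '\n'.
        Console.WriteLine(DivMatches(text, RegexOptions.Multiline));  // False

        // Singleline only changes '.': it now matches '\n' too.
        Console.WriteLine(DivMatches(text, RegexOptions.Singleline)); // True

        // Both options can be active at once: '^'/'$' anchor to individual
        // lines while '.' still runs across line breaks.
        Match m = Regex.Match(text, "^<div>$(.*)^</div>$",
            RegexOptions.Multiline | RegexOptions.Singleline);
        Console.WriteLine(m.Success);                                 // True
    }
}
```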


2015-12-29 13:57:50

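Yes. I have always thought Singleline is a terrible name for what really means "dot matches everything". Especially since you can activate single-line and multi-line mode at the same time. – timgeb

Yes, a terrible naming convention. –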
