2012-04-07 27 views
0

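I am trying to extract the part of an HTML file that lies between two comments. My VB.NET regular expression for extracting the HTML between the two comments is not working.

Here is the test code: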

Imports System.Text.RegularExpressions

Module Module1

    Sub Main()

        Dim base_dir As String = "D:\"
        Dim test_file As String = base_dir & "72.htm"

        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"

        ' Match everything from the start comment through the end comment.
        Dim regex_pattern As String = start_comment & ".*" & end_comment
        ' In-memory test input that contains no line breaks.
        Dim input_text As String = start_comment & "some more html text" & end_comment

        Dim match As Match = Regex.Match(input_text, regex_pattern)

        If match.Success Then
            Console.WriteLine("found {0}", match.Value)
        Else
            Console.WriteLine("not found")
        End If

        Console.ReadLine()

    End Sub

End Module

The above works.

When I try to load the actual data from disk, the code below fails.

Imports System.Text.RegularExpressions

Module Module1

    Sub Main()

        Dim base_dir As String = "D:\"
        Dim test_file As String = base_dir & "72.htm"

        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"

        ' Match everything from the start comment through the end comment.
        Dim regex_pattern As String = start_comment & ".*" & end_comment
        ' Read the whole file and remove CR+LF line breaks.
        Dim input_text As String = System.IO.File.ReadAllText(test_file).Replace(vbCrLf, "")

        Dim match As Match = Regex.Match(input_text, regex_pattern)

        If match.Success Then
            Console.WriteLine("found {0}", match.Value)
        Else
            Console.WriteLine("not found")
        End If

        Console.ReadLine()

    End Sub

End Module

The HTML file contains the start and end comments, with a lot of HTML in between. Some of the content in the HTML file is in Arabic.

Thanks and regards.

+0

http://stackoverflow.com/a/1732454/284240 – 2012-04-07 00:45:07

Answers

2

Try passing RegexOptions.Singleline to Regex.Match(...) like this:

Dim match As Match = Regex.Match(input_text, regex_pattern, RegexOptions.Singleline) 

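This makes the dot (.) match newline characters as well. By default, . matches any character except \n, and Replace(vbCrLf, "") only removes CR+LF pairs, so any bare LF characters remaining in the file would still block the match.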
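
To see the difference in isolation, here is a minimal sketch. It assumes the file uses bare LF line endings, which is a guess about 72.htm and not something the question confirms. The module name, the Matches helper, and the sample string are made up for illustration, and Regex.Escape is added only as a precaution for markers that might contain regex metacharacters.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SinglelineDemo

    ' Hypothetical helper: True when the pattern matches the input under the given options.
    Function Matches(input As String, pattern As String, options As RegexOptions) As Boolean
        Return Regex.IsMatch(input, pattern, options)
    End Function

    Sub Main()
        Dim startComment As String = "<!-- start of content -->"
        Dim endComment As String = "<!-- end of content -->"
        Dim pattern As String = Regex.Escape(startComment) & ".*" & Regex.Escape(endComment)

        ' Assumed sample: lines separated by a bare LF instead of CR+LF.
        Dim sample As String = startComment & vbLf & "<p>some html</p>" & vbLf & endComment

        ' Removing CR+LF pairs leaves the bare LF characters in place.
        Dim stripped As String = sample.Replace(vbCrLf, "")

        ' Without Singleline, "." stops at LF, so this prints False.
        Console.WriteLine(Matches(stripped, pattern, RegexOptions.None))

        ' With Singleline, "." also matches LF, so this prints True.
        Console.WriteLine(Matches(stripped, pattern, RegexOptions.Singleline))
    End Sub

End Module
```

If the real file turns out to use CR+LF everywhere, the Replace call would already remove every line break and the cause would have to be something else, so it is worth checking the line endings first.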

+0

Thanks, this worked for me. – MoizNgp 2012-04-07 01:03:14

0

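I don't know VB.NET, but does . match newlines there, or is there an option you have to set for that? Consider using [\s\S] instead of . so that newlines are included.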
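
A rough sketch of the [\s\S] idea in VB.NET follows. The module name, the ExtractBetween helper, and the sample HTML are hypothetical. The lazy quantifier (*?) and the capture group are my additions, not part of the suggestion above: the lazy form stops at the first end comment, and the group returns only the text between the markers.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module CharClassDemo

    ' Hypothetical helper: returns the text between the two markers,
    ' or Nothing when the markers are not found.
    Function ExtractBetween(input As String, startMarker As String, endMarker As String) As String
        ' [\s\S] matches any character, including line breaks, with no RegexOptions needed.
        Dim pattern As String = Regex.Escape(startMarker) & "([\s\S]*?)" & Regex.Escape(endMarker)
        Dim m As Match = Regex.Match(input, pattern)
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        ' Assumed sample document with bare LF line breaks.
        Dim html As String = "<body>" & vbLf &
                             "<!-- start of content -->" & vbLf &
                             "<p>some html</p>" & vbLf &
                             "<!-- end of content -->" & vbLf &
                             "</body>"

        Dim content As String = ExtractBetween(html, "<!-- start of content -->", "<!-- end of content -->")

        ' Prints: <p>some html</p>
        Console.WriteLine(content.Trim())
    End Sub

End Module
```

In .NET this should behave the same as .*? with RegexOptions.Singleline. The character class form is mainly handy when the pattern has to be portable to regex engines that lack such an option.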