Regex to extract the HTML between two comments is not working in VB.NET

I am trying to extract the part of an HTML document that lies between two comments.

Here is the test code:

Imports System.Text.RegularExpressions

Sub Main()

    Dim base_dir As String = "D:\"
    Dim test_file As String = base_dir & "72.htm"

    Dim start_comment As String = "<!-- start of content -->"
    Dim end_comment As String = "<!-- end of content -->"

    ' Match everything from the start comment to the end comment.
    Dim regex_pattern As String = start_comment & ".*" & end_comment
    ' In-memory test input with no line breaks.
    Dim input_text As String = start_comment & "some more html text" & end_comment

    Dim match As Match = Regex.Match(input_text, regex_pattern)

    If match.Success Then
        Console.WriteLine("found {0}", match.Value)
    Else
        Console.WriteLine("not found")
    End If

    Console.ReadLine()

End Sub

The above works.

When I try to load the actual data from disk, the code below fails.

Imports System.Text.RegularExpressions

Sub Main()

    Dim base_dir As String = "D:\"
    Dim test_file As String = base_dir & "72.htm"

    Dim start_comment As String = "<!-- start of content -->"
    Dim end_comment As String = "<!-- end of content -->"

    ' Match everything from the start comment to the end comment.
    Dim regex_pattern As String = start_comment & ".*" & end_comment
    ' Read the whole file and strip CR+LF pairs (this does not remove a lone LF or CR).
    Dim input_text As String = System.IO.File.ReadAllText(test_file).Replace(vbCrLf, "")

    Dim match As Match = Regex.Match(input_text, regex_pattern)

    If match.Success Then
        Console.WriteLine("found {0}", match.Value)
    Else
        Console.WriteLine("not found")
    End If

    Console.ReadLine()

End Sub

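The HTML file contains both the start and end comments, with a large amount of HTML in between. Some of the content in the HTML file is in Arabic.

Thanks and regards.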


2012-04-07 MoizNgp

http://stackoverflow.com/a/1732454/284240 – 2012-04-07 00:45:07

Try passing RegexOptions.Singleline to Regex.Match(...), like this:

Dim match As Match = Regex.Match(input_text, regex_pattern, RegexOptions.Singleline)

This makes the dot (.) match newline characters as well.
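A minimal sketch of the difference this option makes. The sample input is hypothetical: it assumes the file on disk contains bare line feeds, which Replace(vbCrLf, "") would leave in place. The actual contents of 72.htm are not shown, so this is only one plausible cause of the failure.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module SinglelineDemo
    Sub Main()
        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"
        Dim regex_pattern As String = start_comment & ".*" & end_comment

        ' Hypothetical input: a bare line feed (vbLf) sits between the two comments,
        ' as it might in a file saved with Unix-style line endings.
        Dim input_text As String = start_comment & "line one" & vbLf & "line two" & end_comment

        ' Default options: "." does not match vbLf, so the pattern cannot span the lines.
        Console.WriteLine(Regex.IsMatch(input_text, regex_pattern))
        ' prints False

        ' Singleline: "." matches every character, including vbLf.
        Console.WriteLine(Regex.IsMatch(input_text, regex_pattern, RegexOptions.Singleline))
        ' prints True
    End Sub
End Module
```

With RegexOptions.Singleline there is no need to strip line breaks from the input before matching.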


2012-04-07 00:51:30 Robbie

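Thanks, this worked for me. – MoizNgp 2012-04-07 01:03:14

I don't know VB.NET, but does . match newlines there, or is there an option you have to set for that? Consider using [\s\S] instead of . so that newlines are included.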
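As a rough illustration of that suggestion, the sketch below swaps ".*" for "[\s\S]*" in the pattern from the question. The sample input is hypothetical and assumes the text between the comments contains line breaks.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module CharacterClassDemo
    Sub Main()
        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"

        ' [\s\S] means "whitespace or non-whitespace", i.e. any character at all,
        ' so no RegexOptions value is needed for the match to cross line breaks.
        Dim regex_pattern As String = start_comment & "[\s\S]*" & end_comment

        ' Hypothetical input containing both a CR+LF pair and a bare LF.
        Dim input_text As String = start_comment & "line one" & vbCrLf & "line two" & vbLf & end_comment

        Dim match As Match = Regex.Match(input_text, regex_pattern)

        If match.Success Then
            Console.WriteLine("found {0}", match.Value)
        Else
            Console.WriteLine("not found")
        End If
    End Sub
End Module
```

VB.NET string literals have no backslash escapes, so "[\s\S]*" reaches the regex engine exactly as written.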


2012-04-07 00:34:57
