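Regex code to extract HTML between 2 comments in VB.NET not working
I am trying to extract the part of an HTML document that lies between two comments.
Here is the test code: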
    Sub Main()
        Dim base_dir As String = "D:\"
        Dim test_file As String = base_dir & "72.htm"
        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"
        Dim regex_pattern As String = start_comment & ".*" & end_comment
        Dim input_text As String = start_comment & "some more html text" & end_comment
        Dim match As Match = Regex.Match(input_text, regex_pattern)
        If match.Success Then
            Console.WriteLine("found {0}", match.Value)
        Else
            Console.WriteLine("not found")
        End If
        Console.ReadLine()
    End Sub
The above works.
When I try to load the actual data from disk, the code below fails.
    Sub Main()
        Dim base_dir As String = "D:\"
        Dim test_file As String = base_dir & "72.htm"
        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"
        Dim regex_pattern As String = start_comment & ".*" & end_comment
        Dim input_text As String = System.IO.File.ReadAllText(test_file).Replace(vbCrLf, "")
        Dim match As Match = Regex.Match(input_text, regex_pattern)
        If match.Success Then
            Console.WriteLine("found {0}", match.Value)
        Else
            Console.WriteLine("not found")
        End If
        Console.ReadLine()
    End Sub
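The HTML file contains both the start and end comments, with a large amount of HTML between them. Some of the content in the HTML file is in Arabic.
Thanks and regards.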
http://stackoverflow.com/a/1732454/284240 – 2012-04-07 00:45:07
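One thing that may be worth checking, sketched here under an assumption that the post does not confirm: if 72.htm contains bare line feeds (LF) rather than CR+LF pairs, then Replace(vbCrLf, "") leaves them in place, and in .NET regular expressions "." does not match a line feed unless RegexOptions.Singleline is set. The sketch below uses an in-memory string with LF-only line breaks as a stand-in for the real file; the sample HTML and the module name are hypothetical. It also wraps the markers in Regex.Escape and uses a lazy ".*?" so the match stops at the first end comment.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ExtractBetweenComments
    Sub Main()
        ' Same markers as in the question.
        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"

        ' Regex.Escape treats the markers as literal text;
        ' "(.*?)" is a lazy capture that stops at the first end marker.
        Dim regex_pattern As String =
            Regex.Escape(start_comment) & "(.*?)" & Regex.Escape(end_comment)

        ' Hypothetical stand-in for the file contents, with LF-only line breaks.
        ' This is an assumption, not the real 72.htm.
        Dim input_text As String =
            "<html>" & vbLf &
            start_comment & vbLf &
            "<p>line one</p>" & vbLf &
            "<p>line two</p>" & vbLf &
            end_comment & vbLf &
            "</html>"

        ' RegexOptions.Singleline makes "." match line feeds as well,
        ' so the line breaks do not have to be stripped first.
        Dim m As Match = Regex.Match(input_text, regex_pattern, RegexOptions.Singleline)
        If m.Success Then
            ' Groups(1) holds only the text between the two comments.
            Console.WriteLine("found {0}", m.Groups(1).Value)
        Else
            Console.WriteLine("not found")
        End If
    End Sub
End Module
```

If the match still fails on the real file with Singleline set, the line-ending assumption is probably not the cause. In that case it may help to confirm that the comment text in the file is byte-for-byte identical to the markers, and that the file is read with the right encoding, for example by passing an explicit System.Text.Encoding to File.ReadAllText.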