2015-03-31

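String.Split returns the wrong array

I am trying to correct a malformed HTML table. I have no control over the source; my application simply loads the contents of a downloaded file as a regular text file. The file content is a simple HTML table that is missing its closing </tr> elements. I am trying to split the content on <tr> to get an array, so that I can tack a </tr> onto the end of the elements that need it. When I try to split the string using fleContents.Split("<tr>").ToList, I get more elements in the resulting List(Of String) than there should be.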

Here is a short piece of test code that shows the same behavior:

Dim testSource As String = "<table><tr><td>8172745</td><tr><td>8172745</td></table>"
Dim testArr As String() = testSource.Split("<tr>")

'Maybe try splitting on a variable, in case a string literal containing "<>" can't be used in the Split method
Dim seper As String = "<tr>"
testArr = testSource.Split(seper)

'Feed it a new string directly
testArr = testSource.Split(New String("<tr>"))

I expected testArr to contain 3 elements, as follows:

  1. "<table>"
  2. "<td>8172745</td>"
  3. "<td>8172745</td></table>"

However, I get the following array:

  1. ""
  2. "table>"
  3. "tr>"
  4. "td>8172745"
  5. "/td>"
  6. "tr>"
  7. "td>8172745"
  8. "/td>"
  9. "/table>"

Can someone please explain why the string is being split up like this, and how I can get the result I am expecting?

Answers


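Your code is calling a different overload of the Split method than the one you want. The array you posted is consistent with the string being split at every "<" character rather than at the whole "<tr>" string. You want the overload that accepts a String array and a StringSplitOptions parameter: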

Dim testSource As String = "<table><tr><td>8172745</td><tr><td>8172745</td></table>" 
Dim delimeter As String() = { "<tr>" } 
Dim testArr As String() = _ 
    testSource.Split(delimeter, StringSplitOptions.RemoveEmptyEntries) 

You can see it working on IDEOne:

http://ideone.com/pcw6aq
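
To make the overload difference concrete, here is a minimal sketch that puts the two behaviors side by side. It assumes the usual explanation for this symptom: with Option Strict Off on the .NET Framework, the one-argument call binds to the Char-based overload and the string is narrowed to its first character, "<". The module name SplitSketch and the helper SplitOnTag are made up for illustration and are not part of the original answer.

```vbnet
Imports System

Module SplitSketch
    ' Hypothetical helper (not from the thread): split on the whole tag string.
    Public Function SplitOnTag(source As String, tag As String) As String()
        Return source.Split(New String() {tag}, StringSplitOptions.None)
    End Function

    Sub Main()
        Dim testSource As String = "<table><tr><td>8172745</td><tr><td>8172745</td></table>"

        ' Splitting on the single character "<" gives the nine pieces from the question.
        Dim byChar As String() = testSource.Split("<"c)
        Console.WriteLine(byChar.Length)   ' prints 9

        ' Splitting on the whole "<tr>" string gives the three expected pieces.
        Dim byTag As String() = SplitOnTag(testSource, "<tr>")
        Console.WriteLine(byTag.Length)    ' prints 3
    End Sub
End Module
```

The sketch passes StringSplitOptions.None, so empty pieces are kept; the answer above uses RemoveEmptyEntries, which gives the same three elements for this input. Turning Option Strict On should, as far as I know, make the original one-argument call a compile-time error instead of a silent conversion.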


Try using a regular expression, like this:

Imports System.Text.RegularExpressions 

Public Class Form1 


    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim testSource As String = "<table><tr><td>8172745</td><tr><td>8172745</td></table>"
        Dim testArr As String() = Regex.Split(testSource, "<tr>")

        'Show the array in TextBox1
        TextBox1.Lines = testArr

    End Sub
End Class 
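
Building on the Regex.Split call above, the following is a rough sketch of the repair the question is after: closing each row that lacks a </tr>. It is only one possible approach under stated assumptions: tags are lowercase and written exactly as "<tr>", "</tr>" and "</table>", and the module name RowRepairSketch and the function CloseRows are hypothetical names introduced here. Regex.Escape is used because Regex.Split treats its second argument as a pattern; "<tr>" happens to contain no special characters, but other delimiters might.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module RowRepairSketch
    ' Hypothetical helper: adds a closing </tr> to every row that lacks one.
    Public Function CloseRows(html As String) As String
        ' parts(0) is everything before the first <tr>; the rest are row bodies.
        Dim parts As String() = Regex.Split(html, Regex.Escape("<tr>"))
        Dim result As New StringBuilder(parts(0))

        For i As Integer = 1 To parts.Length - 1
            Dim row As String = parts(i)
            If Not row.Contains("</tr>") Then
                ' In the last row the closing tag has to go before </table>.
                Dim tableEnd As Integer = row.IndexOf("</table>", StringComparison.Ordinal)
                If tableEnd >= 0 Then
                    row = row.Insert(tableEnd, "</tr>")
                Else
                    row &= "</tr>"
                End If
            End If
            ' Split removed the delimiter, so put the opening tag back.
            result.Append("<tr>").Append(row)
        Next

        Return result.ToString()
    End Function

    Sub Main()
        Dim testSource As String = "<table><tr><td>8172745</td><tr><td>8172745</td></table>"
        ' prints <table><tr><td>8172745</td></tr><tr><td>8172745</td></tr></table>
        Console.WriteLine(CloseRows(testSource))
    End Sub
End Module
```

Input that contains no <tr> at all comes back unchanged, since Regex.Split then returns a single element. For anything beyond a simple, predictable table like this one, a real HTML parser is likely a safer choice than string splitting.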

All the best.