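I need to scrape some product data from an existing website to put into a database. The data is all in HTML tables. Model numbers are unique, but each product can have any number of different attributes (so the tables I need to parse all have different columns and headers). What is the best way to parse an HTML table into CSV?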
<table>
<tr>
<td>Model No.</td>
<td>Weight</td>
<td>Colour</td>
<td>Etc..</td>
</tr>
<tr>
<td>8572</td>
<td>12 Kg</td>
<td>Red</td>
<td>Blah..</td>
</tr>
<tr>
<td>7463</td>
<td>7 Kg</td>
<td>Blue</td>
<td>Blah..</td>
</tr>
<tr>
<td>8332</td>
<td>42 Kg</td>
<td>Yellow</td>
<td>Blah..</td>
</tr>
</table>
This is the CSV output I'm looking for:
Model-No,Attribute-Name,Attribute-Value
8572,"Weight","12 Kg"
8572,"Colour","Red"
8572,"Etc","Blah.."
7463,"Weight","7 Kg"
7463,"Colour","Blue"
7463,"Etc","Blah.."
8332,"Weight","42 Kg"
8332,"Colour","Yellow"
8332,"Etc","Blah.."
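Since the tables all appear to be valid XHTML, I could load each one into an XmlDocument, but does anyone have a suggestion for a better way to achieve this? Thanks.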
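For reference, the "load it as XML and unpivot" idea can be sketched in a few lines. This is a minimal sketch in Python rather than .NET, using the standard library's `xml.etree.ElementTree` where the question uses XmlDocument. The function name `table_to_long_csv` is hypothetical, and it assumes the table is well-formed XML with no namespace and no HTML-only entities such as `&nbsp;`, that the first row holds the headers, and that the first column holds the model number.

```python
import csv
import io
import xml.etree.ElementTree as ET


def table_to_long_csv(table_xhtml: str) -> str:
    """Unpivot a well-formed XHTML table into one CSV line per attribute.

    Assumptions (not from the original post): first row = headers,
    first column = model number, no XML namespace on the elements.
    """
    table = ET.fromstring(table_xhtml)

    # Collect the text of every cell, row by row. iter() also finds rows
    # that are nested inside <thead> or <tbody>.
    rows = [
        ["".join(cell.itertext()).strip() for cell in tr if cell.tag in ("td", "th")]
        for tr in table.iter("tr")
    ]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Model-No", "Attribute-Name", "Attribute-Value"])
    if not rows:
        return out.getvalue()  # empty table: header line only

    header, data = rows[0], rows[1:]
    for row in data:
        model_no = row[0]
        # zip() pairs each remaining header with the cell below it, so
        # tables with different columns need no special handling.
        for name, value in zip(header[1:], row[1:]):
            writer.writerow([model_no, name, value])
    return out.getvalue()
```

Two differences from the output shown above: `csv.writer` only quotes fields that need it by default, so `12 Kg` comes out unquoted, and header text is copied verbatim, so `Etc..` keeps its dots. Both would need a small tweak if the exact format matters.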
WinForm或WebApp? – 2011-06-15 10:43:31
@Ash - WebApp - though I'm only after the code that converts an HTML table string into a CSV string – Nick 2011-06-15 10:48:54
Take a look here: http://www.codeproject.com/Tips/142467/Convert-HTMLTable-to-Comma-Separated-Values – 2015-05-28 01:06:42