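Problems reading an HTML-based XLS using C#

I am trying to read an HTML-based .xls file provided by a vendor and convert it to CSV so it can be imported into a different process. I have found plenty of solutions that can read and convert such a file, the most popular being to read it with OLEDB.

I had this working in VS2010 last week, but I have since installed VS2012/.NET 4.5 and suddenly the source file is no longer recognized. Nothing I have done gets it working again. I even tried installing VS2010 on a different machine and it would not behave there either (so I have no idea how it ever ran on the original machine).

If I run the code as is, cnn.Open() throws an exception stating "External table is not in the expected format." If I change the connection string to the commented-out line, the file is read, but not correctly (not everything is read and the data is not populated properly).

So, in summary: what is the best way (preferably without third-party libraries/applications) to read the file at the bottom of this post using C#?

My current code is below: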
// Requires: using System; using System.Data; using System.Data.OleDb; using System.IO;
string excelFilePath = @"C:\Users\Dan\test.xls";
string csvOutputFile = @"C:\Users\Dan\output.csv";
int worksheetNumber = 1;

// connection string
var cnnStr = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0;IMEX=1;HDR=NO\"", excelFilePath);
//var cnnStr = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"HTML Import;IMEX=1;HDR=NO\"", excelFilePath);

// get schema, then data
var dt = new DataTable();
using (var cnn = new OleDbConnection(cnnStr))
{
    try
    {
        cnn.Open(); // this is where "External table is not in the expected format." is thrown
        var schemaTable = cnn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
        if (schemaTable.Rows.Count < worksheetNumber)
            throw new ArgumentException("The worksheet number provided cannot be found in the spreadsheet");

        string worksheet = schemaTable.Rows[worksheetNumber - 1]["table_name"].ToString().Replace("'", "");
        string sql = String.Format("select * from [{0}]", worksheet);
        using (var da = new OleDbDataAdapter(sql, cnn))
        {
            da.Fill(dt);
        }
    }
    catch (Exception e)
    {
        // report the failure instead of silently swallowing it
        Console.Error.WriteLine(e.Message);
    }
} // disposing the connection also closes it

// write out CSV data
using (var wtr = new StreamWriter(csvOutputFile))
{
    foreach (DataRow row in dt.Rows)
    {
        bool firstColumn = true;
        foreach (DataColumn col in dt.Columns)
        {
            if (!firstColumn) { wtr.Write(","); } else { firstColumn = false; }
            // quote every field and double any embedded quotes
            var data = row[col.ColumnName].ToString().Replace("\"", "\"\"");
            wtr.Write(String.Format("\"{0}\"", data));
        }
        wtr.WriteLine();
    }
}
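As a rough sanity check on the "not in the expected format" error, the sketch below looks at the first bytes of the file to see what it really contains, regardless of its extension. A genuine binary .xls is an OLE2 compound file and an .xlsx is a ZIP archive, whereas my file starts with markup. The names FileSniffer, Classify and ClassifyFile are made up for illustration, and this only diagnoses the mismatch; it does not fix the OLEDB behaviour.

```csharp
using System;
using System.IO;
using System.Text;

static class FileSniffer
{
    // Signature at the start of an OLE2 compound file (the container of binary .xls workbooks).
    static readonly byte[] Ole2Magic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Classifies content from its leading bytes; "head" is the start of the file.
    public static string Classify(byte[] head)
    {
        if (StartsWith(head, Ole2Magic)) return "binary-xls";

        // ZIP archives (including .xlsx) begin with the ASCII characters "PK".
        if (head.Length >= 2 && head[0] == (byte)'P' && head[1] == (byte)'K') return "zip-xlsx";

        // Assumption: a text file is UTF-8 or ASCII. Skip a BOM and leading whitespace.
        string text = Encoding.UTF8.GetString(head).TrimStart('\uFEFF', ' ', '\t', '\r', '\n');
        if (text.StartsWith("<", StringComparison.Ordinal)) return "html-or-xml";

        return "unknown";
    }

    // Reads up to the first 512 bytes of a file and classifies them.
    public static string ClassifyFile(string path)
    {
        var buffer = new byte[512];
        using (var fs = File.OpenRead(path))
        {
            int read = fs.Read(buffer, 0, buffer.Length);
            Array.Resize(ref buffer, read);
        }
        return Classify(buffer);
    }

    static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Calling FileSniffer.ClassifyFile(excelFilePath) on the vendor file should tell me whether the "Excel 12.0" connection string ever had a real workbook to open.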
Below is the file I am reading from, which was sent to us with an .xls extension.
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
<meta name="ProgId" content="Excel.Sheet"/>
<meta name="Generator" content="Microsoft Excel 10"/>
<!--[if !mso]>
<style>
v\\:* {behavior:url(#default#VML);}");
o\\:* {behavior:url(#default#VML);}");
x\\:* {behavior:url(#default#VML);}");
.shape {behavior:url(#default#VML);}");
</style>");
<![endif]-->
<!--[if gte mso 9]><xml>
<x:ExcelWorkbook>
<x:ExcelWorksheets>
<x:ExcelWorksheet>
<x:Name>report</w:Name>
<x:WorksheetOptions>
<x:ProtectContents>False</w:ProtectContents>
<x:ProtectObjects>False</w:ProtectObjects>
<x:ProtectScenarios>False</w:ProtectScenarios>
</w:WorksheetOptions>
</w:ExcelWorksheet>
</w:ExcelWorksheets>
<x:ProtectStructure>False</w:ProtectStructure>
<x:ProtectWindows>False</w:ProtectWindows>
</w:ExcelWorkbook>");
</xml><![endif]-->
<head>
<style>
br {mso-data-placement:same-cell;}
</style>
</head>
<body>
<style>
table {
mso-displayed-decimal-separator:"\.";
mso-displayed-thousand-separator:"\,";
}
</style>
<table width="100%">
<tr>
<td align=center colspan=6 valign=top>
<span class="pageHead">
<nobr><h1>Status</h1></nobr></span>
</td>
</tr>
<tr>
<td align=center colspan=6 valign=top>
<span class="pageHead"><nobr>
Generated by User
</nobr></span>
</td></tr>
<tr>
<td> </td>
</tr>
<tr>
<td> </td>
</tr>
</table>
<table border="1" cellspacing="0" cellpadding="0" width="100%">
<tr>
<th>Owner</th>
<th>Project Id</th>
<th>Event Id</th>
<th>Event Title</th>
<th>Event Status</th>
<th>EventSummary</th>
</tr>
<tr>
<td>User</td>
<td>1</td>
<td>test1</td>
<td>event1</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>2</td>
<td>test2</td>
<td>event2</td>
<td>Pending Selection</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>3</td>
<td>test3</td>
<td>event3</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>4</td>
<td>test4</td>
<td>event4</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>5</td>
<td>test5</td>
<td>event5</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>6</td>
<td>test6</td>
<td>event6</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>7</td>
<td>test7</td>
<td>event7</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>8</td>
<td>test8</td>
<td>event8</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>9</td>
<td>test9</td>
<td>event9</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>10</td>
<td>test10</td>
<td>event10</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>11</td>
<td>test11</td>
<td>event11</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>12</td>
<td>test12</td>
<td>event12</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>13</td>
<td>test13</td>
<td>event13</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>14</td>
<td>test14</td>
<td>event14</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>15</td>
<td>test15</td>
<td>event15</td>
<td>Completed</td>
<td>1</td>
</tr>
</table>
</body></html>
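Since the file above is really just HTML, one direction that avoids third-party libraries might be to skip OLEDB and pull the cells out of the markup directly. The following is only a minimal sketch under some assumptions: the data table is the first table containing th header cells, tables are not nested, and every tr, td and th has a closing tag (as in the sample). The names HtmlTableReader, ReadDataTable and ToCsvLine are hypothetical, and regex-based parsing like this is fragile for HTML in general.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

static class HtmlTableReader
{
    // Singleline lets "." span line breaks, since tags and cells run over several lines.
    const RegexOptions Opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // Returns the rows (header row included) of the first <table> that has <th> cells.
    public static List<string[]> ReadDataTable(string html)
    {
        var rows = new List<string[]>();
        foreach (Match table in Regex.Matches(html, @"<table\b[^>]*>(.*?)</table>", Opts))
        {
            string body = table.Groups[1].Value;
            if (!Regex.IsMatch(body, @"<th\b", Opts)) continue; // skip layout-only tables

            foreach (Match tr in Regex.Matches(body, @"<tr\b[^>]*>(.*?)</tr>", Opts))
            {
                string[] cells = Regex.Matches(tr.Groups[1].Value, @"<t[dh]\b[^>]*>(.*?)</t[dh]>", Opts)
                    .Cast<Match>()
                    .Select(c => CleanCell(c.Groups[1].Value))
                    .ToArray();
                if (cells.Length > 0) rows.Add(cells);
            }
            break; // only the first matching table is read
        }
        return rows;
    }

    // Drops inline tags, decodes entities such as &amp;, and collapses whitespace.
    static string CleanCell(string inner)
    {
        string noTags = Regex.Replace(inner, @"<[^>]+>", " ");
        string decoded = WebUtility.HtmlDecode(noTags);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }

    // Quotes every field and doubles embedded quotes, like the CSV loop in my code above.
    public static string ToCsvLine(string[] cells)
    {
        return string.Join(",", cells.Select(c => "\"" + c.Replace("\"", "\"\"") + "\""));
    }
}
```

A possible usage would be to read the file with File.ReadAllText(excelFilePath), pass the text to HtmlTableReader.ReadDataTable, and write HtmlTableReader.ToCsvLine(row) for each returned row with the same StreamWriter loop as before.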