2015-12-14 65 views
3

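How to read a single column from an Excel spreadsheet?

I'm trying to read a single column from an Excel document. I want to read the entire column, but only store the cells that actually contain data. I also want to handle the case where a cell in the column is empty but there is more content further down, so that the later cell values are still read. For example: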

| Column1  |
|----------|
| bob      |
| tom      |
| randy    |
| travis   |
| joe      |
|          |
| jennifer |
| sam      |
| debby    |

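If I have a column like this, I don't mind getting a "" value for the row after joe, but I do want it to keep reading the values after that blank cell. However, assuming debby is the last value in the column, I don't want it to keep going for another 35,000 rows past it.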

It is also safe to assume this will always be the first column.

So far, I have this:

Excel.Application myApplication = new Excel.Application(); 
myApplication.Visible = true; 
Excel.Workbook myWorkbook = myApplication.Workbooks.Open("C:\\aFileISelect.xlsx"); 
Excel.Worksheet myWorksheet = myWorkbook.Sheets["aSheet"] as Excel.Worksheet; 
Excel.Range myRange = myWorksheet.get_Range("A:A", Type.Missing); 

foreach (Excel.Range r in myRange) 
{ 
    MessageBox.Show(r.Text); 
} 

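I've found plenty of examples that do similar things with older versions of .NET, but nothing exactly like this, and I want to make sure I'm doing it the modern way (assuming the recommended approach has changed somewhat).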

My current code reads the entire column, but it also includes the blank cells after the last value.


EDIT1

I like Isedlacek's answer below, but I do have a problem with it, and I'm not sure the problem is specific to his code. If I use it like this:

Excel.Application myApplication = new Excel.Application(); 
myApplication.Visible = true; 
Excel.Workbook myWorkbook = myApplication.Workbooks.Open("C:\\aFileISelect.xlsx"); 
Excel.Worksheet myWorksheet = myWorkbook.Sheets["aSheet"] as Excel.Worksheet; 
Excel.Range myRange = myWorksheet.get_Range("A:A", Type.Missing); 

var nonEmptyRanges = myRange.Cast<Excel.Range>() 
.Where(r => !string.IsNullOrEmpty(r.Text)); 

foreach (var r in nonEmptyRanges) 
{ 
    MessageBox.Show(r.Text); 
} 

MessageBox.Show("Finished!"); 

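The "Finished!" MessageBox never shows. I'm not sure why this happens, but it seems the search never actually finishes. I tried adding a counter inside the loop to see whether it was just searching the column endlessly, but that doesn't seem to be the case... it just seems to stop.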

Where the "Finished!" MessageBox is, I tried closing the workbook and spreadsheet, but that code never runs (as expected, since the MessageBox never runs either).

If I close the Excel spreadsheet manually, I get a COMException:

COMException was unhandled by user code
Additional information: Exception from HRESULT: 0x803A09A2

Any ideas?


Finally! A question today that makes sense! – pnuts


Haha, that was the goal, thanks! – trueCamelType


Did my answer help you? I made several updates to address the efficiency problem you ran into. –

Answers

3

The answer depends on whether you want the bounding range of the used cells, or just the non-null values in a column.

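Here's how to get the non-null values from a column efficiently. Note that reading the entire tempRange.Value2 property at once is much faster than reading cell by cell, but the tradeoff is that the resulting array can consume a lot of memory.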

private static IEnumerable<object> GetNonNullValuesInColumn(_Application application, _Worksheet worksheet, string columnName) 
{ 
    // get the intersection of the column and the used range on the sheet (this is a superset of the non-null cells) 
    var tempRange = application.Intersect(worksheet.UsedRange, (Range) worksheet.Columns[columnName]); 

    // if there is no intersection, there are no values in the column 
    if (tempRange == null) 
     yield break; 

    // get complete set of values from the temp range (potentially memory-intensive) 
    var value = tempRange.Value2; 

    // if value is NULL, it's a single cell with no value 
    if (value == null) 
     yield break; 

    // if value is not an array, the temp range was a single cell with a value 
    if (!(value is Array)) 
    { 
     yield return value; 
     yield break; 
    } 

    // otherwise, the value is a 2-D array 
    var value2 = (object[,]) value; 
    var rowCount = value2.GetLength(0); 
    for (var row = 1; row <= rowCount; ++row) 
    { 
     var v = value2[row, 1]; 
     if (v != null) 
      yield return v; 
    } 
} 

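Here's an efficient way to get the smallest range in a column that contains all of its non-empty cells. Note that I still read the entire tempRange value at once, then use the resulting array (if it is a multi-cell range) to determine which cells hold the first and last values. Once I know which rows have data, I build the bounding range from them.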

private static Range GetNonEmptyRangeInColumn(_Application application, _Worksheet worksheet, string columnName) 
{ 
    // get the intersection of the column and the used range on the sheet (this is a superset of the non-null cells) 
    var tempRange = application.Intersect(worksheet.UsedRange, (Range) worksheet.Columns[columnName]); 

    // if there is no intersection, there are no values in the column 
    if (tempRange == null) 
     return null; 

    // get complete set of values from the temp range (potentially memory-intensive) 
    var value = tempRange.Value2; 

    // if value is NULL, it's a single cell with no value 
    if (value == null) 
     return null; 

    // if value is not an array, the temp range was a single cell with a value 
    if (!(value is Array)) 
     return tempRange; 

    // otherwise, the temp range is a 2D array which may have leading or trailing empty cells 
    var value2 = (object[,]) value; 

    // get the first and last rows that contain values 
    var rowCount = value2.GetLength(0); 
    int firstRowIndex; 
    for (firstRowIndex = 1; firstRowIndex <= rowCount; ++firstRowIndex) 
    { 
     if (value2[firstRowIndex, 1] != null) 
      break; 
    } 
    int lastRowIndex; 
    for (lastRowIndex = rowCount; lastRowIndex >= firstRowIndex; --lastRowIndex) 
    { 
     if (value2[lastRowIndex, 1] != null) 
      break; 
    } 

    // if there are no first and last used row, there is no used range in the column 
    if (firstRowIndex > lastRowIndex) 
     return null; 

    // return the range 
    return worksheet.Range[tempRange[firstRowIndex, 1], tempRange[lastRowIndex, 1]]; 
} 
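The two functions above either drop every empty cell or return the bounding range. The question also asked to keep an interior blank (the row after joe) as "" while stopping at the last value (debby). Below is a minimal sketch of that post-processing step, applied to the `object[,]` that `Value2` returns for a multi-cell range. The class and method names (`ColumnValues`, `KeepInteriorBlanks`) are hypothetical, and the sketch assumes, as the answer's code does, that an empty cell shows up as `null` in the array; a cell whose formula returns "" is treated as a value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the answer above): post-processes the
// object[,] that Range.Value2 returns for a multi-cell, single-column range.
public static class ColumnValues
{
    // Returns one string per row up to the last non-empty cell.
    // Empty (null) cells before that row become "", trailing empty cells are dropped.
    public static List<string> KeepInteriorBlanks(object[,] values)
    {
        var result = new List<string>();
        int first = values.GetLowerBound(0);   // 1 for arrays produced by Excel
        int last = values.GetUpperBound(0);
        int col = values.GetLowerBound(1);     // single-column range: only one column

        // Walk up from the bottom to find the last row that holds a value.
        while (last >= first && values[last, col] == null)
            last--;

        // Copy every row up to that point, mapping empty cells to "".
        for (int row = first; row <= last; row++)
        {
            object v = values[row, col];
            result.Add(v == null ? "" : Convert.ToString(v));
        }
        return result;
    }
}
```

It could stand in for the null-skipping loop at the end of `GetNonNullValuesInColumn`, after the single-cell checks, e.g. `ColumnValues.KeepInteriorBlanks((object[,]) value)`. Scanning from the bottom means only the trailing blanks are examined before copying, so the cost stays proportional to the used range rather than the full column.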

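Thanks. If I end up having time, I'll convert this to C# and add it as an edit to my question. I think something like this would help other people who find this question. – trueCamelType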


I just made some edits. I believe I'm done editing now. Final answer! –

1

If you don't mind losing the empty rows entirely:

// requires: using System.Linq;
var nonEmptyRanges = myRange.Cast<Excel.Range>()
    .Where(r => !string.IsNullOrEmpty(r.Text));
foreach (var r in nonEmptyRanges)
{
    // handle r (here: just display the cell's text)
    MessageBox.Show(r.Text);
}

This answers the question completely, thanks. Keeping the empty rows isn't required. – trueCamelType


I have a problem with this code. When I use it, it never seems to finish. I'll add an edit to my question explaining what I mean. – trueCamelType


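The reason this doesn't finish is that it evaluates every cell in the worksheet (or every cell in the column, depending on 'myRange'). In Excel 2007+, each column has 1,048,576 cells, and every cell access through Excel interop is very slow. That's why you need the technique from my answer to limit the number of cells you evaluate. –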

0
/// <summary> 
    /// Generic method which reads a column from the <paramref name="workSheetToReadFrom"/> sheet provided.<para /> 
    /// The <paramref name="dumpVariable"/> is the variable upon which the column to be read is going to be dumped.<para /> 
    /// The <paramref name="workSheetToReadFrom"/> is the sheet from which the column is going to be read.<para /> 
    /// The <paramref name="initialCellRowIndex"/>, <paramref name="finalCellRowIndex"/> and <paramref name="columnIndex"/> specify the length of the list to be read and the concrete column of the file from which to perform the reading. <para /> 
    /// Note that the type of data which is going to be read needs to be specified as a generic type argument. The method constrains the generic type arguments which can be passed to it to the types which implement the IConvertible interface provided by the framework (e.g. int, double, string, etc.). 
    /// </summary> 
    /// <typeparam name="T"></typeparam> 
    /// <param name="dumpVariable"></param> 
    /// <param name="workSheetToReadFrom"></param> 
    /// <param name="initialCellRowIndex"></param> 
    /// <param name="finalCellRowIndex"></param> 
    /// <param name="columnIndex"></param> 
    static void ReadExcelColumn<T>(ref List<T> dumpVariable, Excel._Worksheet workSheetToReadFrom, int initialCellRowIndex, int finalCellRowIndex, int columnIndex) where T: IConvertible 
    { 
     dumpVariable = ((object[,])workSheetToReadFrom.Range[workSheetToReadFrom.Cells[initialCellRowIndex, columnIndex], workSheetToReadFrom.Cells[finalCellRowIndex, columnIndex]].Value2).Cast<object>().ToList().ConvertAll(e => (T)Convert.ChangeType(e, typeof(T))); 
    } 
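The one-liner above assumes the range has more than one cell and that every cell holds a value: `Value2` of a single-cell range is not an array (as the first answer's comments note), and `Convert.ChangeType` throws when it is handed `null` for a value type such as `int`. Below is a hedged sketch of a more tolerant conversion step. `ColumnConverter` and `ToTypedList` are hypothetical names, empty cells are skipped rather than converted, and using the invariant culture for conversions is an assumption on my part.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of the answer above: converts whatever
// Range.Value2 returned (null, a single value, or a 2-D object[,]) into a List<T>,
// skipping empty cells instead of passing null to Convert.ChangeType.
public static class ColumnConverter
{
    public static List<T> ToTypedList<T>(object value2) where T : IConvertible
    {
        var result = new List<T>();

        // A single empty cell comes back as null.
        if (value2 == null)
            return result;

        // A multi-cell range comes back as a 2-D array (1-based when produced by Excel).
        if (value2 is object[,] cells)
        {
            int col = cells.GetLowerBound(1);
            for (int row = cells.GetLowerBound(0); row <= cells.GetUpperBound(0); row++)
            {
                object v = cells[row, col];
                if (v != null)   // skip empty cells
                    result.Add((T)Convert.ChangeType(v, typeof(T), CultureInfo.InvariantCulture));
            }
            return result;
        }

        // Not an array: the range was a single non-empty cell.
        result.Add((T)Convert.ChangeType(value2, typeof(T), CultureInfo.InvariantCulture));
        return result;
    }
}
```

For example, `var numbers = ColumnConverter.ToTypedList<double>(range.Value2);`, where `range` is a single-column `Excel.Range` such as the one the answer builds from `Cells[initialCellRowIndex, columnIndex]` and `Cells[finalCellRowIndex, columnIndex]`. Returning a new list also avoids the `ref` parameter.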