openpyxl reads floating-point values as datetime

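I have an Excel spreadsheet with a field containing small %.2f values (1.2, 1.07, 2.3, and so on), and for some reason openpyxl is reading these cells as 1900 dates. I have seen this issue raised many times, but usually those users expect a date and get a bogus one. I expect a number, usually x < 10.0, and about 30-40% of the data comes back 'bad' (read as a datetime), while the rest is read as a number.

I am using the iterator, so I make a simple ws.iter_rows() call to pull the data one row at a time. I tried 'casting' the value to a previously created variable that holds a number, but that did not do much good.

Does anyone have suggestions for overcoming this sporadic problem? If this is a known bug, are there any known workarounds?

I found that if I save the file as CSV, reopen it as CSV, and then save it again as xlsx, I end up with a file that reads correctly. While this helps with debugging the code, I need a solution my customers can use without jumping through these hoops.

I would have thought that if the column were not formatted correctly, the problem would apply to every element, so it is confusing that this happens only intermittently.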

import openpyxl
from openpyxl import load_workbook

# Source workbook - wb
wb = load_workbook(filename=r'C:\data\TEST.xlsx', use_iterators=True)
ws = wb.get_sheet_by_name(name='QuoteFile ')

for row in ws.iter_rows():
    print(row[0].internal_value, row[3].internal_value,
          row[4].internal_value, row[5].internal_value)

print('Done')

Here is my input from the Excel sheet:

20015 2.13 1.2 08/01/11 
20015 5.03 1.2 08/01/11 
20015 5.03 1.2 08/01/11 
20015 5.51 1.2 08/01/11 
20015 8.13 1.2 08/01/11 
20015 5.60 1.2 08/01/11 
20015 5.03 1.2 08/01/11 
20015 1.50 1.2 08/01/11 
20015 1.50 1.2 08/01/11 
20015 1.50 1.2 08/01/11 
20015 1.50 1.2 08/01/11 
20015 1.50 1.2 08/01/11 
20015 1.50 1.2 08/01/11

Here is my output. As you can see, the first seven rows show the second field as a date in 1900, while rows 8-13 correctly show the field as a number:

20015.0 1900-01-02 03:07:12 1.2 2011-08-01 00:00:00 
20015.0 1900-01-05 00:43:12 1.2 2011-08-01 00:00:00 
20015.0 1900-01-05 00:43:12 1.2 2011-08-01 00:00:00 
20015.0 1900-01-05 12:14:24 1.2 2011-08-01 00:00:00 
20015.0 1900-01-08 03:07:12 1.2 2011-08-01 00:00:00 
20015.0 1900-01-05 14:24:00 1.2 2011-08-01 00:00:00 
20015.0 1900-01-05 00:43:12 1.2 2011-08-01 00:00:00 
20015.0 1.5 1.2 2011-08-01 00:00:00 
20015.0 1.5 1.2 2011-08-01 00:00:00 
20015.0 1.5 1.2 2011-08-01 00:00:00 
20015.0 1.5 1.2 2011-08-01 00:00:00 
20015.0 1.5 1.2 2011-08-01 00:00:00 
20015.0 1.5 1.2 2011-08-01 00:00:00

I am using Python 3.3 and openpyxl 1.6.2.

Asked by Joel, 2013-06-06

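Disclaimer: I don't know how openpyxl works. However, I am quite good with the datetime module.

If you know which cells should be numeric, I have a one-liner lambda function that converts an Excel date to a float and leaves the value alone if it is already a number.

We can use this code like so: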

import datetime
import openpyxl
from openpyxl import load_workbook

# Source workbook - wb
wb = load_workbook(filename=r'C:\data\TEST.xlsx', use_iterators=True)
ws = wb.get_sheet_by_name(name='QuoteFile ')

# Quick explanation:
# If it's a number, return it. Otherwise, take the difference between the datetime
# and 1899-12-31 00:00:00. An Excel date is stored internally as a float counting
# days from about the start of 1900, so we take the number of seconds in the delta
# (obtained by subtraction) and divide it by 86400 (the number of seconds in a day).
# Note: the 1899-12-31 epoch matches the small values in the question
# (2.13 -> 1900-01-02 03:07:12); larger serial numbers may need a different epoch.
forcefloat = lambda val: val if isinstance(val, (int, float)) else (
    (val - datetime.datetime(1899, 12, 31, 0, 0, 0)).total_seconds() / 86400)

for row in ws.iter_rows():
    print(row[0].internal_value, forcefloat(row[3].internal_value),
          row[4].internal_value, row[5].internal_value)

print('Done')

Not exactly the most elegant solution, but it works.
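For anyone who wants to check the conversion without a workbook at hand, here is a minimal sketch of the same idea written as a regular function, using only the standard library. The names force_float and ASSUMED_EPOCH are made up for this illustration, and the epoch is an assumption inferred from the output in the question (2.13 came back as 1900-01-02 03:07:12), not something taken from the openpyxl documentation.

```python
import datetime

# Assumption: this epoch reproduces the small values shown in the question.
# It is inferred from the posted output, not from openpyxl itself.
ASSUMED_EPOCH = datetime.datetime(1899, 12, 31)
SECONDS_PER_DAY = 86400


def force_float(value):
    """Return numbers unchanged; convert a datetime to days since ASSUMED_EPOCH."""
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, datetime.datetime):
        return (value - ASSUMED_EPOCH).total_seconds() / SECONDS_PER_DAY
    raise TypeError("expected a number or datetime, got %r" % type(value))


# Two of the bogus dates from the question's output, plus a value read correctly.
samples = [
    datetime.datetime(1900, 1, 2, 3, 7, 12),
    datetime.datetime(1900, 1, 5, 12, 14, 24),
    1.5,
]

# Round to two decimals, since the sheet holds %.2f-style values.
recovered = [round(force_float(v), 2) for v in samples]
print(recovered)
```

The rounding step is a design choice for this particular sheet; drop it if the column can hold more than two decimal places. Empty cells would raise a TypeError here, so a real script would probably want to handle None separately.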

Answered by Kupiakos, 2013-06-06 04:38:34

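I think we are making progress. Thanks for the tip about type(val); it helped me understand what the row iteration returns. The problem is that it returns type 'datetime.datetime'. strptime(val, ...) does not seem to accept the datetime.datetime passed to it; it expects a string. I went through a couple of conversions (datetime.datetime -> string -> strptime), and that seems to work fine. Now I just need to wrap both steps into your lambda. Thanks for the datetime tips. – Joel

Oh, in that case I will edit my post to match. The strptime part was really there to convert a string to a datetime.datetime; if we already have a datetime.datetime, we can skip the string part. – Kupiakos

Works like a charm! – Joel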
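The strptime point discussed in these comments can be checked with the standard library alone. The sketch below assumes a format string matching how the dates were printed in the question; the variable names are illustrative only.

```python
import datetime

# Assumption: format matching the way the dates are printed in the question.
FORMAT = "%Y-%m-%d %H:%M:%S"

# One of the bogus dates from the question's output.
value = datetime.datetime(1900, 1, 5, 0, 43, 12)

# strptime parses strings, so handing it a datetime object raises TypeError.
try:
    datetime.datetime.strptime(value, FORMAT)
    strptime_accepts_datetime = True
except TypeError:
    strptime_accepts_datetime = False

# Going through str() works, but only gives back the object we started with.
round_tripped = datetime.datetime.strptime(str(value), FORMAT)

print(strptime_accepts_datetime)
print(round_tripped == value)
```

Since the round trip returns an equal datetime, the string step adds nothing when the cell value is already a datetime.datetime, which is why the subtraction can be applied to it directly.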
