Python error when getting a timestamp from a list

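I am writing a script that splits a large CSV file into smaller chunk files. It cross-references a log file that holds the last timestamp already chunked, so that only rows with a timestamp later than the logged one are written out.

The first column of the CSV file holds a timestamp in the format %Y/%m/%d %H:%M:%S. The CSV file also has four lines of header information that I don't want or need; the `for rows in ts_pre` clause in my script is meant to remove them.

The log_lookup() function simply pulls from the log the last timestamp for the CSV file of the particular station I'm looking at. I'm working with six different stations whose data columns all differ, but every file follows the structure described in the second paragraph.

The partial script is: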

import csv, sys, datetime

def log_lookup():
    global STN_num
    global STN_date
    with open('/home/log.txt', 'rb') as open_log:
        log_file = csv.reader(open_log)
        for row in log_file:
            for item in row:
                STN_date.append(item)
        if find == 'STN_1':
            return STN_date[1]
        if find == 'STN_2':
            return STN_date[2]
        if find == 'STN_3':
            return STN_date[3]
        if find == 'STN_4':
            return STN_date[4]
        if find == 'STN_5':
            return STN_date[5]
        if find == 'STN_6':
            return STN_date[6]

def get_ts(line):
    print line[0:19]
    return datetime.datetime.strptime(line, "%Y/%m/%d %H:%M:%S")

def main():
    log = str(log_lookup()) #useful for knowing when to start chunking
    log_datetime = datetime.datetime.strptime(log, "%Y/%m/%d %H:%M:%S")

    with open(sys.argv[1], 'rb') as open_file:
        ts_from_file = csv.reader(open_file)
        for genrows in ts_from_file:
            ts_pre.append(genrows)
            for rows in ts_pre:
                if rownum < 4:
                    ts_pre.pop()
                    rownum += 1
                else:
                    for line in rows:
                        if get_ts(line) > log_datetime:
                            timeseries.append(line)

The log file is simple:

0 
2011/10/06 18:40:00 
2012/06/27 13:25:00 
1900/01/01 00:00:00 
2011/08/03 14:55:00 
2012/06/27 20:05:00 
2011/10/03 19:25:00

... with the 0 as a placeholder. (Is it obvious that I'm not a programmer?)
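
For readers following along, the lookup that log_lookup() performs on this log can be sketched more compactly. This is a minimal sketch in Python 3, not the asker's code: the function name `lookup_last_timestamp` and the `SAMPLE_LOG` string are made up for illustration, and it assumes one entry per line with the placeholder at index 0, so that station `STN_n` sits at index n.

```python
# Hypothetical stand-in for the contents of the log file shown above.
SAMPLE_LOG = """0
2011/10/06 18:40:00
2012/06/27 13:25:00
1900/01/01 00:00:00
2011/08/03 14:55:00
2012/06/27 20:05:00
2011/10/03 19:25:00
"""

def lookup_last_timestamp(log_lines, station):
    """Return the logged timestamp string for a station name such as 'STN_3'."""
    # Drop blank lines and surrounding whitespace; index 0 is the placeholder.
    entries = [line.strip() for line in log_lines if line.strip()]
    # Assumption: the number after the underscore is the line index in the log.
    index = int(station.split("_")[1])
    return entries[index]

print(lookup_last_timestamp(SAMPLE_LOG.splitlines(), "STN_3"))
```

Deriving the index from the station name replaces the chain of six `if` statements. With a real file, the open file object could be passed in place of `SAMPLE_LOG.splitlines()`.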

An example CSV file looks like:

"2011/10/03 16:40:00",0,0 
"2011/10/03 16:45:00",1,0 
"2011/10/03 16:50:00",2,0 
"2011/10/03 16:55:00",3,0

The error appears when get_ts(line) is called. The print of line[0:19] shows:

2011/10/03 16:40:00 
0

so the function ends up with 0, and Python throws this error:

ValueError: time data '0' does not match format '%Y/%m/%d %H:%M:%S'

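I have verified that the 0 is the second item in the CSV row, but I am confused about why Python returns it at all, given my slice. Can someone explain why that value comes back, and what I need to do to get the timestamp so I can compare it with the log timestamp?

For extra credit, any advice on coding or style is always appreciated, as are suggestions for a better way to do what I'm doing. The CSV files I'm working with are fairly large (~8 MB), so the more efficient the better.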


2012-08-15 qmoog

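It looks like you are calling get_ts on every column of every row in the CSV. Each row from csv.reader is a list of column strings, so the first call receives the timestamp and the second receives '0', which cannot be parsed.

Instead of

for line in rows:
    if get_ts(line) > log_datetime:
        timeseries.append(line)

try: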

if get_ts(rows[0]) > log_datetime:  # rows[0] is the timestamp column
    timeseries.append(rows)
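
Put together, the whole filtering step might look like the sketch below. It is written for Python 3 and is not the asker's script: `rows_after`, `SAMPLE_CSV`, and the four `header` lines are invented for illustration, and it assumes the timestamp is always the first column in the format from the question. It streams the file rather than building an intermediate list, which also answers the efficiency request.

```python
import csv
import io
from datetime import datetime
from itertools import islice

TS_FORMAT = "%Y/%m/%d %H:%M:%S"
HEADER_LINES = 4  # the question says the real files start with four header lines

# Hypothetical sample: four made-up header lines plus the rows from the question.
SAMPLE_CSV = (
    "header 1\nheader 2\nheader 3\nheader 4\n"
    '"2011/10/03 16:40:00",0,0\n'
    '"2011/10/03 16:45:00",1,0\n'
    '"2011/10/03 16:50:00",2,0\n'
    '"2011/10/03 16:55:00",3,0\n'
)

def rows_after(csv_file, cutoff, header_lines=HEADER_LINES):
    """Yield each data row whose first-column timestamp is later than cutoff."""
    reader = csv.reader(csv_file)
    # islice skips the header lines without storing them.
    for row in islice(reader, header_lines, None):
        if not row:
            continue  # ignore blank lines
        # row is a list of column strings; only row[0] holds the timestamp.
        if datetime.strptime(row[0], TS_FORMAT) > cutoff:
            yield row

cutoff = datetime(2011, 10, 3, 16, 45, 0)
timeseries = list(rows_after(io.StringIO(SAMPLE_CSV), cutoff))
print(timeseries)
```

With a real file in Python 3, the csv documentation recommends `open(path, newline="")` in place of the `io.StringIO` wrapper.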


2012-08-16 00:01:51 cmh

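Thanks, that pointed me in the right direction. I ended up using 'if get_ts(rows[0:19]) > log_datetime: timeseries.append(rows)' and it reads just fine. Now I'm having trouble reading midnight timestamps, which show only the date '(2011/04/06)' rather than '(2011/04/06 00:00:00)', and that throws an error when I try to compare. – qmoog 2012-08-16 16:57:40

Ah, regarding my midnight timestamps: it looks like they get truncated when the file goes from Excel on Windows to gedit or gnumeric on GNU/Linux. I edited a few back in and it works as it should. It should be fine, since I will ultimately be running this on a Windows machine. – qmoog 2012-08-16 17:10:48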
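
If truncated midnight timestamps cannot be avoided at the source, one possible workaround is to try the full format first and fall back to a date-only format. This is a hedged Python 3 sketch rather than anything from the thread: `parse_ts` and `FORMATS` are hypothetical names, and it assumes a bare date always means midnight of that day.

```python
from datetime import datetime

# Full timestamp first, then the date-only form left behind by the truncation.
FORMATS = ("%Y/%m/%d %H:%M:%S", "%Y/%m/%d")

def parse_ts(text):
    """Parse a timestamp string, treating a bare date as midnight of that day."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next format
    raise ValueError("unrecognised timestamp: %r" % text)

print(parse_ts("2011/04/06"))
print(parse_ts("2011/04/06 00:00:00"))
```

Both calls return the same datetime, so the comparison against the log timestamp behaves identically for truncated and untruncated rows.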
