2015-11-06

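Finding datetime instances within multiple datetime bins in Python

I have two CSV files containing string and timestamp data. The first, CSV_1, holds time series data that has already been resampled with pandas into 15-minute bins and looks like this: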

time   ave_speed 
1/13/15 4:30 34.12318398 
1/13/15 4:45 0.83396195 
1/13/15 5:00 1.466816057 

CSV_2 holds the fixed times of individual GPS points, for example:

id  time   lat   lng 
513620 1/13/15 4:31 -8.15949 118.26005 
513667 1/13/15 4:36 -8.15215 118.25847 
513668 1/13/15 5:01 -8.15211 118.25847 

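I am trying to iterate over both files to find every instance where a time in CSV_2 falls within one of the 15-minute bins in CSV_1, and then do something with it. In this case, I want to append ave_speed to each entry for which the condition is true.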

Desired result, using the example above:

id  time   lat   lng   ave_speed 
513620 1/13/15 4:31 -8.15949 118.26005  0.83396195 
513667 1/13/15 4:36 -8.15215 118.25847  0.83396195 
513668 1/13/15 5:01 -8.15211 118.25847  something else 
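
Reading the desired result above, 4:31 and 4:36 both take the value of the 4:45 row, which suggests each bin is labelled by its end time. Under that assumption (and assuming a time exactly on a boundary belongs to the bin ending there), the bin of a GPS time can be computed directly instead of searched for. A minimal sketch; `bin_label` and the `speeds` dict are hypothetical names, and the `%m/%d/%y %H:%M` format is inferred from the listing above:

```python
import datetime

def bin_label(ts, minutes=15):
    """Round ts up to the end of its bin (hypothetical helper, right-labelled bins)."""
    width = datetime.timedelta(minutes=minutes)
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = ts - midnight
    # Ceiling division: number of whole bins needed to cover the elapsed time.
    n_bins = -(-elapsed // width)
    return midnight + n_bins * width

# Sample values from the question, keyed by the assumed bin end time.
speeds = {
    datetime.datetime(2015, 1, 13, 4, 30): 34.12318398,
    datetime.datetime(2015, 1, 13, 4, 45): 0.83396195,
    datetime.datetime(2015, 1, 13, 5, 0): 1.466816057,
}

gps_time = datetime.datetime.strptime("1/13/15 4:31", "%m/%d/%y %H:%M")
label = bin_label(gps_time)
speed = speeds.get(label)  # None when no bin with that label exists
```

This only works when the bin width divides the day evenly, so that bins line up with midnight, which is the case for 15 minutes.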

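So far I have only tried to do this with pandas DataFrames, but I ran into some trouble, so I thought working on the CSV files directly might be a workaround to get what I'm after.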

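Here is the code I have written so far. I feel it is close, but I can't seem to get the logic right so that my for loop returns the entries that fall within each 15-minute bin.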

import csv
import datetime

with open('path/CSV_2.csv', mode="rU") as infile:
    with open('path/CSV_1.csv', mode="rU") as newinfile:
        reader = csv.reader(infile)
        nreader = csv.reader(newinfile)
        next(nreader, None)  # skip the headers
        next(reader, None)  # skip the headers

        for row in nreader:
            for dfrow in reader:
                # keep GPS times that fall in the 15 minutes before the bin time
                if (datetime.datetime.strptime(dfrow[2], '%Y-%m-%d %H:%M:%S') < datetime.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S') and
                        datetime.datetime.strptime(dfrow[2], '%Y-%m-%d %H:%M:%S') > datetime.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S') - datetime.timedelta(minutes=15)):
                    print(dfrow[2])

Link to the pandas question I posted about the same problem: Pandas, check if timestamp value exists in resampled 30 min time bin of datetimeindex

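Edit: If I create two lists of times, i.e. listOne with all the times from CSV_1 and listTwo with all the times from CSV_2, I am able to find the instances within the time bins. So something odd is going on when I use the CSV values directly. Any help would be appreciated.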
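
One possible explanation for the difference noted in the edit, offered as an assumption rather than a confirmed diagnosis: a `csv.reader` is a one-shot iterator, so an inner loop over it only yields rows during the first pass of the outer loop, while a list can be looped over again and again. The sketch below uses small in-memory stand-ins (`io.StringIO`) for the two files, with values taken from the example above:

```python
import csv
import io

# Tiny in-memory stand-ins for the two files (values taken from the question).
bins = io.StringIO("time,ave_speed\n1/13/15 4:45,0.83396195\n1/13/15 5:00,1.466816057\n")
points = io.StringIO("id,time\n513620,1/13/15 4:31\n513668,1/13/15 5:01\n")

bin_reader = csv.reader(bins)
point_reader = csv.reader(points)
next(bin_reader, None)    # skip the headers
next(point_reader, None)  # skip the headers

# Count how many GPS rows the inner loop sees on each pass of the outer loop.
seen_per_pass = []
for bin_row in bin_reader:
    seen_per_pass.append(sum(1 for _ in point_reader))

# Reading the rows into a list first makes them reusable on every pass.
points.seek(0)
point_rows = list(csv.reader(points))[1:]  # drop the header row
```

Here `seen_per_pass` records two rows on the first pass and none on the second, because the reader is already exhausted; `point_rows` can be iterated as often as needed.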

Answer

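I think this is very close to what I wanted, in case anyone is curious about how to do the same thing. It is not efficient: because of the double loop, the current script takes about a day to go through all the rows.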

If anyone has ideas on how to make this simpler or faster, I would be very interested.

import csv

# OPEN THE CSV FILES
# (mode "rU" is Python 2 style; Python 3.11+ rejects it, use newline='' there instead)
with open('/GPS_Timepoints.csv', mode="rU") as infile:
    with open('/Resampled.csv', mode="rU") as newinfile:
        reader = csv.reader(infile)
        nreader = csv.reader(newinfile)
        next(nreader, None)  # skip the headers
        next(reader, None)  # skip the headers

        # DICT COMPREHENSION TO GET ONLY THE DESIRED DATA FROM CSV
        checkDates = {row[0]: row[7] for row in nreader}
        # SORT BY TIME SO THAT x[i-1] AND x[i] ARE CONSECUTIVE BINS (DICT ORDER IS NOT GUARANTEED).
        # STRING ORDER IS CHRONOLOGICAL ONLY FOR ISO-STYLE TIMESTAMPS (YYYY-MM-DD HH:MM:SS)
        x = sorted(checkDates.items())

        # READ CSV INTO LIST (SEEMED TO BE EASIER THAN READING DIRECT FROM CSV FILE, I DON'T KNOW IF IT'S FASTER)
        csvDates = list(reader)

        # LOOP 1 TO ITERATE OVER FULL RANGE OF DATES IN RESAMPLED DATA AND A PRINT STATEMENT TO GIVE ME HOPE THE PROGRAM IS RUNNING
        # START AT 1: WITH i = 0, x[i-1] WOULD WRAP AROUND TO THE LAST BIN
        for i in range(1, len(x)):
            print('checking %d' % i)
            # TEST TO SEE IF THE TIME IS IN THE TIME RANGE, THEN IF TRUE INSERT THE DESIRED ATTRIBUTE, IN THIS CASE SPEED TO THE ROW
            for row in csvDates:
                if x[i-1][0] < row[2] < x[i][0]:
                    row.insert(9, x[i][1])

        # GET THE RESULT TO CSV TO UPLOAD INTO GIS
        with open('/result.csv', mode="w") as outfile:
            wr = csv.writer(outfile)
            wr.writerow(['id', 'boat_id', 'time', 'state', 'lat', 'lng', 'activity', 'speed', 'state_reason'])

            for row in csvDates:
                wr.writerow(row)
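
One way to avoid the double loop, sketched under assumptions rather than tested on the real files: sort the bin times once and use a binary search (`bisect` from the standard library) to find the enclosing bin for each GPS row, so every row costs a single lookup instead of a scan over all bins. `attach_speeds`, `FMT`, and the sample rows are hypothetical; the timestamp format is the one used in the question's code, and the comparison mirrors the strict `previous bin < time < this bin` test above.

```python
import bisect
import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format, as in the question's code

def attach_speeds(bin_rows, gps_rows, time_col=2):
    """Append the speed of the enclosing bin to each GPS row (hypothetical helper).

    bin_rows: (bin_time, speed) string pairs; gps_rows: lists of strings.
    """
    bins = sorted((datetime.datetime.strptime(t, FMT), s) for t, s in bin_rows)
    edges = [t for t, _ in bins]
    for row in gps_rows:
        ts = datetime.datetime.strptime(row[time_col], FMT)
        # Index of the first bin time strictly greater than ts.
        i = bisect.bisect_right(edges, ts)
        # Mirror the original test: previous bin time < ts < this bin time.
        if 0 < i < len(edges) and edges[i - 1] < ts:
            row.append(bins[i][1])
    return gps_rows

# Illustrative rows only, rewritten in the assumed ISO-style format.
bin_rows = [
    ("2015-01-13 04:30:00", "34.12318398"),
    ("2015-01-13 04:45:00", "0.83396195"),
    ("2015-01-13 05:00:00", "1.466816057"),
]
gps_rows = [
    ["513620", "1", "2015-01-13 04:31:00"],
    ["513668", "1", "2015-01-13 05:01:00"],
]
result = attach_speeds(bin_rows, gps_rows)
```

Parsing the strings into `datetime` objects also removes the dependence on string ordering. Rows that fall outside every bin, or exactly on a bin time, are left unchanged, as in the loop above.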