2017-08-16 59 views
1

Python CSV: grab all values of a row based on a condition on its time value

For example, this is the CSV data I am working with:

c1,c2,v1,v2,p1,p2,r1,a1,f1,f2,f3,Time_Stamp 

0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:00 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:01 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:02 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:03 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:04 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:05 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:06 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:07 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:08 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:09 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:10 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:11 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:12 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:13 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:14 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:15 
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:16 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:17 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:18 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:19 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:20 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:21 

Code to read the CSV:

import plotly 
import plotly.plotly as py 
import plotly.graph_objs as go 
import plotly.figure_factory as FF 
import numpy as np 
from datetime import date,time,datetime 
import pandas as pd 
%matplotlib inline 
import matplotlib.pyplot as plt 

def readcsv(x):  # read a CSV file and parse its Time_Stamp column
    # dayfirst=True because the timestamps are written as DD/MM/YYYY
    Data = pd.read_csv(x, parse_dates=['Time_Stamp'], dayfirst=True)
    Data['Date'] = Data.Time_Stamp.dt.date  # date column in DataFrame
    Data['Time'] = Data.Time_Stamp.dt.time  # time column in DataFrame

    print(Data[1:6])
    return Data

Data = readcsv('datafile.csv')#* 

def getMask(start,end,Data): 
    mask = (Data['Time_Stamp'] > start) & (Data['Time_Stamp'] <= end) 
    return mask

start = '2017-06-13 16:00:00' 
end = '2017-06-13 16:40:00' 
timerange = Data.loc[getMask(start, end, Data)] #* <---- using this Dataframe 
#timeR.plot(x='Time_Stamp', y='AC_Input_Current', style='-', color='black') 

What I want to get:

[For example] After running pspike (code below), I get the following output:

13/06/2017 16:38:00 
13/06/2017 16:38:01 
13/06/2017 16:38:02 
13/06/2017 16:38:03 
13/06/2017 16:38:04 
13/06/2017 16:38:05 
13/06/2017 16:38:06 
13/06/2017 16:38:07 
13/06/2017 16:38:08 
13/06/2017 16:38:09 
13/06/2017 16:38:10 
13/06/2017 16:38:11 
13/06/2017 16:38:12 
13/06/2017 16:38:13 
13/06/2017 16:38:14 
13/06/2017 16:38:15 
13/06/2017 16:38:17 
13/06/2017 16:38:18 
13/06/2017 16:38:19 
13/06/2017 16:38:20 
13/06/2017 16:38:21 

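*Note that I use the DataFrame timerange, which has a Time value for every second from 16:00:00 to 16:40:00, to get pspike, which drops a row when its c1 value is above 5.0.

[From the output of print(pspike)] Condition: if a printed row has a Time value of 16:38:15 and the following row has a Time value of 16:38:17 (that is, the next row's Time value skips 1 second), print the skipped row (in this case, the one at Time 16:38:16).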

# 'AC_Input_Current' is presumably the real name of the column shown as c1 in the sample data
pspike = timerange.loc[timerange['AC_Input_Current'] <= 5.0]
print(pspike)

import csv

# countIC2, countIC and Datetime are not defined in the code shown here
with open('welding_data_by_selRange.csv', 'a', newline='') as duraweld:
    a = csv.writer(duraweld)
    data = [countIC2, countIC, Datetime]
    a.writerow(data)
+0

Does it have to be a pandas solution? What should happen if *more* than one second is skipped between consecutive rows? – wwii

+0

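Pandas if possible, but I don't mind seeing another way to solve it. As for more than 1 second being skipped, that doesn't matter; handling a skip of 1 second is enough. –

Answers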

1

Update:

The code below prints the missing timestamps no matter how many are missing, so it is more robust than the previous solution.

for i in range(df.shape[0] - 1):
    row1 = df.iloc[i]
    row2 = df.iloc[i + 1]
    # whole seconds between the two consecutive timestamps
    skipped_ts = int((row2['Time_Stamp'] - row1['Time_Stamp']).total_seconds())
    if skipped_ts > 1:
        for ts in range(1, skipped_ts):
            print(row1['Time_Stamp'] + pd.Timedelta(seconds=ts))
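
The same idea can be written without an explicit loop. This is only a minimal sketch, assuming the filtered frame has a datetime Time_Stamp column and that the data is expected at exactly one row per second; the function name missing_timestamps and the small sample frames are made up for illustration. It also shows one way to recover all values of the skipped rows by looking them up in the unfiltered frame.

```python
import pandas as pd

def missing_timestamps(filtered, column="Time_Stamp", freq="1s"):
    """Return the timestamps absent from `filtered`, assuming one row per `freq`."""
    present = pd.DatetimeIndex(filtered[column])
    # Every timestamp expected between the first and last remaining row
    expected = pd.date_range(present.min(), present.max(), freq=freq)
    return expected.difference(present)

# Hypothetical data: five consecutive seconds, one of them above the threshold
full = pd.DataFrame({
    "c1": [0.0, 0.0, 415.7, 0.0, 0.0],
    "Time_Stamp": pd.date_range("2017-06-13 16:38:14", periods=5, freq="1s"),
})
filtered = full.loc[full["c1"] <= 5.0]

gaps = missing_timestamps(filtered)
# Look the skipped rows up in the unfiltered frame to recover all their values
skipped_rows = full.loc[full["Time_Stamp"].isin(gaps)]
print(skipped_rows)
```

Note that this only finds gaps between the first and last remaining row; rows dropped at the very start or end of the range would not be reported.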
+0

Oh wow, it works! At first I wasn't sure what df.shape does, so I looked it up and then tried your example. Thank you! –

+0

'df.shape' is one of the most useful things when iterating over, modifying, or otherwise processing an entire df. –

+0

Is it possible to get the skipped rows based on a condition? For example, with _row1.values_: '2017-06-13 16:00:04' and _row2.values_: '2017-06-13 16:00:00', can I get the row whose timestamp value is '2017-06-13 16:00:05'? –

0

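Non-pandas solution

From each row, extract the timestamp and the other information; convert the timestamp to a datetime.datetime object using a format string; subtract the previous timestamp from the current one; test the elapsed time and process the row if applicable.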

import datetime, io 

#setup 
s = '''0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:12 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:13 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:15 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:16 
''' 
#data is a file-like object 
data = io.StringIO(s) 

fmt = '%d/%m/%Y %H:%M:%S' 
previous = None 

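for row in data:
    # the last field is the timestamp; everything before it goes into info
    *info, timestamp = row.strip().split(',')
    timestamp = datetime.datetime.strptime(timestamp, fmt)
    try:
        dt = timestamp - previous[0]
    except TypeError:
        # first row: previous is still None, so there is nothing to compare yet
        previous = (timestamp, info)
        continue
    # total_seconds() also counts gaps that span more than a day
    if dt.total_seconds() > 1:
        print('!!!\tprevious:{}\n\tcurrent:{}'.format(previous, (timestamp, info)))
    previous = (timestamp, info)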

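It can be adapted to use csv.reader.


The timestamp is first obtained by splitting off the last column of the row. It is then turned into a datetime.datetime object so that time differences can be computed easily.

For a file on disk, open it and iterate over it...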
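
To make those two steps concrete, here is a small sketch on a single row taken from the sample data. The second timestamp (16:38:15) is only there to show the subtraction; the variable names parsed, later and elapsed are chosen for illustration.

```python
import datetime

fmt = '%d/%m/%Y %H:%M:%S'
row = '0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:12\n'

# Star-unpacking: the last field is the timestamp, the rest ends up in info
*info, timestamp = row.strip().split(',')
parsed = datetime.datetime.strptime(timestamp, fmt)

# Subtracting two datetime objects gives a datetime.timedelta
later = datetime.datetime.strptime('13/06/2017 16:38:15', fmt)
elapsed = later - parsed
print(info)
print(parsed, elapsed.total_seconds())
```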

with open(filepath) as data:
    for row in data:
        *info, timestamp = row.strip().split(',')
        timestamp = datetime.datetime.strptime(timestamp, fmt)
        ....

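Using csv.reader:

import csv
# newline='' is what the csv module documentation recommends when opening files
with open(filepath, newline='') as data:
    rows = csv.reader(data)
    for row in rows:
        *info, timestamp = row
        timestamp = datetime.datetime.strptime(timestamp, fmt)
        ....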

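If you can read the whole file into a DataFrame, you should also be able to read it into a variable and iterate over its lines:

with open(filepath) as f:
    data = f.read()

# iterate over lines; iterating over the string itself would yield single characters
for row in data.splitlines():
    *info, timestamp = row.strip().split(',')
    timestamp = datetime.datetime.strptime(timestamp, fmt)
    ....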
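
If the data is already in a pandas DataFrame rather than in a file or a string, it cannot be passed to io.StringIO, which only accepts a str. The same previous/current comparison can instead be run directly over the timestamp column. This is a rough sketch, assuming the column is named Time_Stamp and already holds datetime values; the function name report_gaps and the sample frame are hypothetical.

```python
import pandas as pd

def report_gaps(frame, column="Time_Stamp", max_step=1):
    """Return (previous, current) timestamp pairs more than `max_step` seconds apart."""
    gaps = []
    previous = None
    for current in frame[column]:
        if previous is not None and (current - previous).total_seconds() > max_step:
            gaps.append((previous, current))
        previous = current
    return gaps

# Hypothetical frame with one second missing (16:38:14)
frame = pd.DataFrame({
    "Time_Stamp": pd.to_datetime(
        ["13/06/2017 16:38:12", "13/06/2017 16:38:13", "13/06/2017 16:38:15"],
        format="%d/%m/%Y %H:%M:%S",
    )
})
for before, after in report_gaps(frame):
    print("gap between", before, "and", after)
```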
+0

Before I go ahead and try it: is 'timestamp' based on the column name, or is it a variable/function? –

+0

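I tried the code you provided, but I ran into this error: _TypeError: initial_value must be str or None, not DataFrame_. I don't think I can put the data into a string, because the data I want to use is very large. –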

+0

@SanctaIgnis - please see the edit – wwii