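You can groupby on the cumulative sum (cumsum) of the difference between the shifted and the original Date column, number the rows inside each group with cumcount, and convert those counts to nanoseconds.
Nanoseconds (1E-9) are a better choice than milliseconds (1E-3): adding milliseconds can create new duplicate rows, while adding nanoseconds cannot, because the original data has millisecond resolution (e.g. 0 2015-11-02 00:00:01.072 EUR/USD 1.10294 1.10296).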
df = df.reset_index()
# add a nanosecond offset to every repeated timestamp
df['Date'] = df['Date'] + (df['Date'].groupby((df['Date'] != df['Date'].shift()).cumsum())
                                     .cumcount()).values.astype('timedelta64[ns]')
print(df)
Date Col1
0 2015-01-01 00:00:00.000000000 1
1 2015-01-01 00:00:01.000000000 1
2 2015-01-01 00:00:01.000000001 1
3 2015-01-01 00:00:01.000000002 1
4 2015-01-01 00:00:02.000000000 1
5 2015-01-01 00:00:04.000000000 1
6 2015-01-01 00:00:04.000000001 1
7 2015-01-01 00:00:06.000000000 1
8 2015-01-01 00:00:07.000000000 1
9 2015-01-01 00:00:07.000000001 1
#set column Date as index
df = df.set_index('Date')
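To see what each step of that one-liner contributes, here is a minimal sketch on made-up data. The intermediate names (changed, group_id, position) are mine and only serve the illustration, and pd.to_timedelta is used in place of the astype call, which should give the same offsets.

```python
import pandas as pd

# Toy data (made up): whole-second timestamps with consecutive repeats.
df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2015-01-01 00:00:00', '2015-01-01 00:00:01', '2015-01-01 00:00:01',
        '2015-01-01 00:00:01', '2015-01-01 00:00:02', '2015-01-01 00:00:04',
        '2015-01-01 00:00:04',
    ]).astype('datetime64[ns]'),
    'Col1': 1,
})

# 1. True wherever the timestamp differs from the previous row.
changed = df['Date'] != df['Date'].shift()
# 2. Running total of the changes: every run of equal timestamps shares one id.
group_id = changed.cumsum()
# 3. Position of each row inside its run: 0, 1, 2, ...
position = df['Date'].groupby(group_id).cumcount()
# 4. Read the position as a nanosecond offset and add it to the timestamp.
df['Date'] = df['Date'] + pd.to_timedelta(position, unit='ns')

print(position.tolist())  # [0, 0, 1, 2, 0, 0, 1]
print(df)
```

The first row of every run keeps its original timestamp, because its position is 0.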
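The fastest solution also uses nanoseconds, and it can be used as long as the number of duplicated rows is less than 1000000 (1E6). The data has millisecond resolution and 1 ms is 1E6 ns, so an offset of up to 999999 ns never reaches the next millisecond.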
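The following sketch illustrates that limit on three hypothetical millisecond-resolution ticks (the values are made up; ticks, is_dup and offset are names I chose). It applies the same offset once as milliseconds and once as nanoseconds.

```python
import pandas as pd

# Hypothetical ticks: two share .072, the next one is at .073.
ticks = pd.Series(pd.to_datetime([
    '2015-11-02 00:00:01.072',
    '2015-11-02 00:00:01.072',
    '2015-11-02 00:00:01.073',
])).astype('datetime64[ns]')

# Mark rows that repeat the previous timestamp and give them a running offset.
is_dup = ticks == ticks.shift()
offset = is_dup.cumsum().where(is_dup, 0)

with_ms = ticks + pd.to_timedelta(offset, unit='ms')
with_ns = ticks + pd.to_timedelta(offset, unit='ns')

print(with_ms.is_unique)  # False: .072 + 1 ms collides with the .073 tick
print(with_ns.is_unique)  # True
# The largest allowed offset still stays inside one millisecond.
print(pd.Timedelta(999999, unit='ns') < pd.Timedelta(1, unit='ms'))  # True
```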
If you load the csv (3898069 rows), check this count first, since the DataFrame itself has more than 1E6 rows:
import pandas as pd
df = pd.read_csv('test/EURUSD-2015-11.csv', header=None, parse_dates=[1],
                 names=['eurusd', 'Date', 'a', 'b'], sep=",")
# sort values if not sorted
df = df.sort_values('Date')
print(df.head())
# rows whose timestamp repeats the previous row's timestamp
print(df[df['Date'] == df['Date'].shift()])
eurusd Date a b
1996 EUR/USD 2015-11-02 00:51:18.198 1.10323 1.10327
2944 EUR/USD 2015-11-02 01:00:03.844 1.10321 1.10326
6450 EUR/USD 2015-11-02 01:37:35.898 1.10319 1.10324
11429 EUR/USD 2015-11-02 02:24:29.945 1.10301 1.10306
19468 EUR/USD 2015-11-02 03:13:40.575 1.10326 1.10333
20074 EUR/USD 2015-11-02 03:17:03.607 1.10282 1.10288
36618 EUR/USD 2015-11-02 04:36:01.357 1.10213 1.10217
40235 EUR/USD 2015-11-02 04:49:05.946 1.10075 1.10082
42930 EUR/USD 2015-11-02 05:01:37.955 1.10034 1.10042
43269 EUR/USD 2015-11-02 05:03:21.360 1.10070 1.10073
47043 EUR/USD 2015-11-02 05:22:59.811 1.10142 1.10149
47526 EUR/USD 2015-11-02 05:25:45.474 1.10143 1.10150
53398 EUR/USD 2015-11-02 05:58:23.674 1.10294 1.10299
59899 EUR/USD 2015-11-02 06:44:55.266 1.10145 1.10150
64480 EUR/USD 2015-11-02 07:30:27.091 1.10211 1.10217
70576 EUR/USD 2015-11-02 08:14:04.318 1.10329 1.10336
75662 EUR/USD 2015-11-02 08:54:35.138 1.10485 1.10486
75724 EUR/USD 2015-11-02 08:55:00.577 1.10504 1.10507
93917 EUR/USD 2015-11-02 10:55:20.863 1.10345 1.10349
94603 EUR/USD 2015-11-02 10:57:56.289 1.10352 1.10356
98046 EUR/USD 2015-11-02 11:16:24.127 1.10272 1.10278
98433 EUR/USD 2015-11-02 11:19:14.109 1.10281 1.10286
100582 EUR/USD 2015-11-02 11:31:57.891 1.10247 1.10252
105627 EUR/USD 2015-11-02 12:11:01.900 1.10243 1.10246
106789 EUR/USD 2015-11-02 12:19:45.974 1.10183 1.10190
115219 EUR/USD 2015-11-02 14:06:47.229 1.10194 1.10200
116808 EUR/USD 2015-11-02 14:35:50.693 1.10204 1.10211
124436 EUR/USD 2015-11-02 17:06:48.286 1.10125 1.10144
124532 EUR/USD 2015-11-02 17:07:56.048 1.10160 1.10174
124734 EUR/USD 2015-11-02 17:11:51.609 1.1.10142
... ... ... ... ...
3893816 EUR/USD 2015-11-30 20:59:38.304 1.05651 1.05655
3893818 EUR/USD 2015-11-30 20:59:39.341 1.05650 1.05653
3893819 EUR/USD 2015-11-30 20:59:39.976 1.05651 1.05653
3893820 EUR/USD 2015-11-30 20:59:45.170 1.05652 1.05653
3895397 EUR/USD 2015-11-30 20:59:51.605 1.05654 1.05658
3895398 EUR/USD 2015-11-30 20:59:51.707 1.05655 1.05659
3893838 EUR/USD 2015-11-30 20:59:51.767 1.05656 1.05657
3893841 EUR/USD 2015-11-30 20:59:51.816 1.05658 1.05662
3895401 EUR/USD 2015-11-30 20:59:52.073 1.05659 1.05663
3895402 EUR/USD 2015-11-30 20:59:52.229 1.05660 1.05664
3893847 EUR/USD 2015-11-30 20:59:52.818 1.05659 1.05663
3895404 EUR/USD 2015-11-30 20:59:52.915 1.05660 1.05664
3893852 EUR/USD 2015-11-30 20:59:53.106 1.05661 1.05662
3893855 EUR/USD 2015-11-30 20:59:57.031 1.05662 1.05664
3895407 EUR/USD 2015-11-30 20:59:57.084 1.05664 1.05668
3895416 EUR/USD 2015-11-30 21:00:00.816 1.05664 1.05665
3895718 EUR/USD 2015-11-30 21:05:45.605 1.05666 1.05670
3895857 EUR/USD 2015-11-30 21:12:38.965 1.05659 1.05663
3895866 EUR/USD 2015-11-30 21:12:44.505 1.05666 1.05666
3895899 EUR/USD 2015-11-30 21:13:07.805 1.05669 1.05673
3895931 EUR/USD 2015-11-30 21:13:55.007 1.05675 1.05677
3896093 EUR/USD 2015-11-30 21:25:27.988 1.05658 1.05663
3896097 EUR/USD 2015-11-30 21:25:28.002 1.05661 1.05665
3896209 EUR/USD 2015-11-30 21:28:25.906 1.05655 1.05660
3896307 EUR/USD 2015-11-30 21:32:32.490 1.05653 1.05658
3896342 EUR/USD 2015-11-30 21:35:40.005 1.05654 1.05660
3896393 EUR/USD 2015-11-30 21:40:40.182 1.05648 1.05652
3896849 EUR/USD 2015-11-30 22:19:34.582 1.05670 1.05684
3897168 EUR/USD 2015-11-30 22:40:27.108 1.05675 1.05686
3897389 EUR/USD 2015-11-30 22:50:46.825 1.05705 1.05717
[35636 rows x 4 columns]
print(len(df[df['Date'] == df['Date'].shift()]))
35636
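The same check can be wrapped in a small helper. This is only a sketch: count_consecutive_duplicates and NS_PER_MS are hypothetical names, the sample timestamps are made up, and sorted, millisecond-resolution data is assumed.

```python
import pandas as pd

NS_PER_MS = 1000000  # 1 ms = 1E6 ns

def count_consecutive_duplicates(dates):
    """Number of rows whose timestamp equals the previous row's timestamp."""
    return int((dates == dates.shift()).sum())

# Made-up sample: .072 appears twice, .073 three times.
dates = pd.Series(pd.to_datetime([
    '2015-11-02 00:00:01.072', '2015-11-02 00:00:01.072',
    '2015-11-02 00:00:01.073', '2015-11-02 00:00:01.073',
    '2015-11-02 00:00:01.073', '2015-11-02 00:00:01.080',
]))

n_dup = count_consecutive_duplicates(dates)
# The fast approach is only safe while the count stays below 1E6.
print(n_dup, n_dup < NS_PER_MS)  # 3 True
```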
So 35636 is less than 1000000, which means you can make these rows unique with offsets of at most 999999 ns:
# running count of duplicates, added only to the duplicated rows
df.loc[df['Date'] == df['Date'].shift(), 'Date'] = (
    df['Date'] +
    ((df['Date'] == df['Date'].shift()).cumsum()).astype('timedelta64[ns]'))
print(df)
Date Col1
0 2015-01-01 00:00:00.000000000 1
1 2015-01-01 00:00:01.000000000 1
2 2015-01-01 00:00:01.000000001 1
3 2015-01-01 00:00:01.000000002 1
4 2015-01-01 00:00:02.000000000 1
5 2015-01-01 00:00:04.000000000 1
6 2015-01-01 00:00:04.000000003 1
7 2015-01-01 00:00:06.000000000 1
8 2015-01-01 00:00:07.000000000 1
9 2015-01-01 00:00:07.000000004 1
.
.
.
99945 2015-01-01 23:59:09.000999999 1
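A self-contained sketch of this faster approach on the ten toy rows shown above may help. It is a variant rather than the exact code: instead of assigning through df.loc it zeroes the offset on non-duplicated rows with where, which should give the same result. The names seconds, is_dup and offset are mine.

```python
import pandas as pd

# Rebuild the toy frame: seconds after midnight, with consecutive repeats.
seconds = [0, 1, 1, 1, 2, 4, 4, 6, 7, 7]
df = pd.DataFrame({
    'Date': pd.Timestamp('2015-01-01') + pd.to_timedelta(seconds, unit='s'),
    'Col1': 1,
})
df['Date'] = df['Date'].astype('datetime64[ns]')

# True for rows that repeat the previous timestamp.
is_dup = df['Date'] == df['Date'].shift()
# Running count of duplicates over the whole frame, kept only on duplicated rows.
offset = is_dup.cumsum().where(is_dup, 0)
df['Date'] = df['Date'] + pd.to_timedelta(offset, unit='ns')

print(offset.tolist())  # [0, 0, 1, 2, 0, 0, 3, 0, 0, 4]
print(df)
```

Unlike the groupby version, the offset does not restart at 1 for each run of duplicates. It keeps counting across the whole frame, which is why the total number of duplicates has to stay below 1E6.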
Comparison:
import pandas as pd
df = pd.read_csv('test/EURUSD-2015-11.csv', header=None, parse_dates=[1],
                 names=['eurusd', 'Date', 'a', 'b'], sep=",")
# sort values if not sorted
df = df.sort_values('Date')
print(df.head())
#print(df[df['Date'] == df['Date'].shift()])
#print(len(df[df['Date'] == df['Date'].shift()]))
df3 = df.copy()

def ori(df):
    # groupby + cumcount: the offset restarts at 0 for every run of equal timestamps
    df['Date'] = df['Date'] + (df['Date'].groupby((df['Date'] != df['Date'].shift())
                               .cumsum()).cumcount()).values.astype('timedelta64[ns]')
    return df

def new(df):
    # running count of duplicates, added only to the duplicated rows
    df.loc[df['Date'] == df['Date'].shift(), 'Date'] = (
        df['Date'] +
        ((df['Date'] == df['Date'].shift()).cumsum()).astype('timedelta64[ns]'))
    return df

df1 = ori(df)
df2 = new(df3)
print(df1.head())
print(df2.head())
The timing is much better:
In [81]: %timeit ori(df)
1 loops, best of 3: 2min 22s per loop
Compiler time: 0.10 s
In [82]: %timeit new(df)
1 loops, best of 3: 758 ms per loop
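If you do not have the csv at hand, a rough comparison can be sketched on synthetic data with the standard timeit module. Everything here is an assumption for illustration: the data is generated, the two functions are side-effect-free variants of ori and new that take and return a Series, and the numbers will differ from the timings above depending on machine and pandas version.

```python
import timeit
import pandas as pd

def ori(dates):
    # groupby + cumcount: offset restarts for every run of equal timestamps
    group_id = (dates != dates.shift()).cumsum()
    position = dates.groupby(group_id).cumcount()
    return dates + pd.to_timedelta(position, unit='ns')

def new(dates):
    # running count of duplicates, applied only to the duplicated rows
    is_dup = dates == dates.shift()
    return dates + pd.to_timedelta(is_dup.cumsum().where(is_dup, 0), unit='ns')

# Synthetic millisecond ticks: every timestamp appears three times.
n = 100000
ms = pd.Series(range(n)) // 3
dates = (pd.Timestamp('2015-11-02') + pd.to_timedelta(ms, unit='ms')).astype('datetime64[ns]')

# Both variants should yield unique timestamps before they are timed.
print(ori(dates).is_unique, new(dates).is_unique)

for func in (ori, new):
    seconds = timeit.timeit(lambda: func(dates), number=3)
    print(func.__name__, round(seconds / 3, 4), 's per call')
```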
I found that everything you might need is in the [official documentation](http://pandas.pydata.org/pandas-docs/stable/io.html), in the section on date handling.