Pandas: how to create a datetime object from Week and Year?

I have a dataframe with two integer columns holding the year and the week of the year:

import pandas as pd 
import numpy as np 
L1 = [43,44,51,2,5,12] 
L2 = [2016,2016,2016,2017,2017,2017] 
df = pd.DataFrame({"Week":L1,"Year":L2}) 

df 
Out[72]: 
    Week Year 
0 43 2016 
1 44 2016 
2 51 2016 
3  2 2017 
4  5 2017 
5 12 2017

I need to create a datetime object from these two numbers.

I tried this, but it throws an error:

df["DT"] = df.apply(lambda x: np.datetime64(x.Year,'Y') + np.timedelta64(x.Week,'W'),axis=1)

Then I tried this. It runs but gives the wrong result: it ignores the week completely:

df["S"] = df.Week.astype(str)+'-'+df.Year.astype(str) 
df["DT"] = df["S"].apply(lambda x: pd.to_datetime(x,format='%W-%Y')) 

df 
Out[74]: 
    Week Year  S   DT 
0 43 2016 43-2016 2016-01-01 
1 44 2016 44-2016 2016-01-01 
2 51 2016 51-2016 2016-01-01 
3  2 2017 2-2017 2017-01-01 
4  5 2017 5-2017 2017-01-01 
5 12 2017 12-2017 2017-01-01

I'm really getting lost between Python's datetime, NumPy's datetime64, and pandas Timestamp. Can you tell me how this is done correctly?

I'm using Python 3, if that is relevant in any way.

Source

2017-08-01 Khris

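Are the `Week` values [ISO week numbers](https://en.wikipedia.org/wiki/ISO_week_date), or do they represent units of 7 days? – unutbu

Initially I have timestamps in `s`; they are converted using `pd.to_datetime()`, and then the week is extracted by using `dt.week` on the Timestamp. – Khris

There is a subtle flaw here: if `s` contains the date `2016-1-1`, then its ISO week number (returned by `dt.week`) is 53, while its ISO year (which you did not record) is 2015. If you try to reconstruct the date from year 2016 and ISO week 53, you get 2017-01-02 (assuming the week starts on Monday). So unless you also record the ISO year (which is not always the same as the calendar year), you cannot round-trip correctly. – unutbu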
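
The round-trip problem described in the comment above can be checked with the standard library alone. This is only an illustrative sketch, not code from the thread, and it assumes Python 3.8+ for `date.fromisocalendar`. Note that this particular function rejects the mismatched year/week pair instead of rolling over into 2017 as a more lenient parser would.

```python
from datetime import date

d = date(2016, 1, 1)

# isocalendar() returns (ISO year, ISO week, ISO weekday); Monday is 1.
iso_year, iso_week, iso_weekday = d.isocalendar()
print(iso_year, iso_week, iso_weekday)

# Rebuilding from the ISO year recovers the original date.
restored = date.fromisocalendar(iso_year, iso_week, iso_weekday)
print(restored)

# Pairing the calendar year with the ISO week does not work:
# 2016 has only 52 ISO weeks, so week 53 is rejected here.
try:
    date.fromisocalendar(d.year, iso_week, iso_weekday)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print(mismatch_error)
```

Storing the ISO year next to the ISO week is what makes the pair reversible.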

Try this:

In [19]: pd.to_datetime(df.Year.astype(str), format='%Y') + \
      pd.to_timedelta(df.Week.mul(7).astype(str) + ' days')
Out[19]: 
0 2016-10-28 
1 2016-11-04 
2 2016-12-23 
3 2017-01-15 
4 2017-02-05 
5 2017-03-26 
dtype: datetime64[ns]
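
A possible variation on the approach above, shown as a sketch rather than as part of the original answer: the day offset can be passed to `pd.to_timedelta` as a number with `unit="D"`, which avoids building `' days'` strings. Like the original, it treats `Week` as a count of 7-day units from January 1, not as an ISO week number.

```python
import pandas as pd

df = pd.DataFrame({"Week": [43, 44, 51, 2, 5, 12],
                   "Year": [2016, 2016, 2016, 2017, 2017, 2017]})

# January 1 of each year, as datetime64 values.
year_start = pd.to_datetime(df["Year"].astype(str), format="%Y")

# Week counts converted to numeric day offsets.
offset = pd.to_timedelta(df["Week"] * 7, unit="D")

df["DT"] = year_start + offset
print(df)
```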

Initially I have timestamps in s

It is easy to parse it from a UNIX epoch timestamp:

df['Date'] = pd.to_datetime(df['UNIX_Time'], unit='s')

Timing for a 10M-row DF:

Setup:

In [26]: df = pd.DataFrame(pd.date_range('1970-01-01', freq='1T', periods=10**7), columns=['date']) 

In [27]: df.shape 
Out[27]: (10000000, 1) 

In [28]: df['unix_ts'] = df['date'].astype(np.int64)//10**9 

In [30]: df 
Out[30]: 
         date unix_ts 
0  1970-01-01 00:00:00   0 
1  1970-01-01 00:01:00   60 
2  1970-01-01 00:02:00  120 
3  1970-01-01 00:03:00  180 
4  1970-01-01 00:04:00  240 
5  1970-01-01 00:05:00  300 
6  1970-01-01 00:06:00  360 
7  1970-01-01 00:07:00  420 
8  1970-01-01 00:08:00  480 
9  1970-01-01 00:09:00  540 
...      ...  ... 
9999990 1989-01-05 10:30:00 599999400 
9999991 1989-01-05 10:31:00 599999460 
9999992 1989-01-05 10:32:00 599999520 
9999993 1989-01-05 10:33:00 599999580 
9999994 1989-01-05 10:34:00 599999640 
9999995 1989-01-05 10:35:00 599999700 
9999996 1989-01-05 10:36:00 599999760 
9999997 1989-01-05 10:37:00 599999820 
9999998 1989-01-05 10:38:00 599999880 
9999999 1989-01-05 10:39:00 599999940 

[10000000 rows x 2 columns]

Check:

In [31]: pd.to_datetime(df.unix_ts, unit='s') 
Out[31]: 
0   1970-01-01 00:00:00 
1   1970-01-01 00:01:00 
2   1970-01-01 00:02:00 
3   1970-01-01 00:03:00 
4   1970-01-01 00:04:00 
5   1970-01-01 00:05:00 
6   1970-01-01 00:06:00 
7   1970-01-01 00:07:00 
8   1970-01-01 00:08:00 
9   1970-01-01 00:09:00 
        ... 
9999990 1989-01-05 10:30:00 
9999991 1989-01-05 10:31:00 
9999992 1989-01-05 10:32:00 
9999993 1989-01-05 10:33:00 
9999994 1989-01-05 10:34:00 
9999995 1989-01-05 10:35:00 
9999996 1989-01-05 10:36:00 
9999997 1989-01-05 10:37:00 
9999998 1989-01-05 10:38:00 
9999999 1989-01-05 10:39:00 
Name: unix_ts, Length: 10000000, dtype: datetime64[ns]

Timing:

In [32]: %timeit pd.to_datetime(df.unix_ts, unit='s') 
10 loops, best of 3: 156 ms per loop

Conclusion: I think 156 ms to convert 10,000,000 rows is not slow.
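
If the week is still needed after converting from epoch seconds, one option is to take the ISO year together with the ISO week, which addresses the round-trip concern raised in the comments on the question. The sketch below assumes pandas 1.1 or later, where `Series.dt.isocalendar()` is available; the two timestamps are made-up sample values.

```python
import pandas as pd

# Hypothetical epoch timestamps in seconds (UTC):
# 2016-01-01 00:00:00 and 2017-01-02 00:00:00.
unix_ts = pd.Series([1451606400, 1483315200])

dates = pd.to_datetime(unix_ts, unit="s")

# isocalendar() returns a DataFrame with columns year, week, day.
iso = dates.dt.isocalendar()
print(iso)
```

The first row shows why the ISO year matters: the calendar year is 2016, but the ISO year is 2015.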

Source

2017-08-01 11:50:32 MaxU

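Maybe using the timestamps directly really is the better idea. However, I'm working with tens of millions of rows, and this `datetime` stuff is awfully slow. – Khris

@Khris, yes, using this approach we can convert it precisely – MaxU

@Khris, I've added timings, please check – MaxU

You need `%w` to specify the day of the week; without a weekday, the `%W` week number is not used when the date is computed: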

df["DT"] = pd.to_datetime(df.Week.astype(str)+ 
          df.Year.astype(str).add('-0') ,format='%W%Y-%w') 
print (df) 

    Week Year   DT 
0 43 2016 2016-10-30 
1 44 2016 2016-11-06 
2 51 2016 2016-12-25 
3  2 2017 2017-01-15 
4  5 2017 2017-02-05 
5 12 2017 2017-03-26

df["DT"] = pd.to_datetime(df.Week.astype(str)+ 
          df.Year.astype(str).add('-1') ,format='%W%Y-%w') 
print (df) 
    Week Year   DT 
0 43 2016 2016-10-24 
1 44 2016 2016-10-31 
2 51 2016 2016-12-19 
3  2 2017 2017-01-09 
4  5 2017 2017-01-30 
5 12 2017 2017-03-20 

df["DT"] = pd.to_datetime(df.Week.astype(str)+ 
          df.Year.astype(str).add('-2') ,format='%W%Y-%w') 
print (df) 

    Week Year   DT 
0 43 2016 2016-10-25 
1 44 2016 2016-11-01 
2 51 2016 2016-12-20 
3  2 2017 2017-01-10 
4  5 2017 2017-01-31 
5 12 2017 2017-03-21
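
The same behaviour can be reproduced with the standard library's `datetime.strptime`, which may make it easier to see why the attempt in the question returned January 1 for every row. This is a minimal sketch added for illustration, using week 43 of 2016 from the sample data.

```python
from datetime import datetime

# Without a weekday, the week number is parsed but not used,
# so the result falls back to January 1 of the year.
no_weekday = datetime.strptime("43-2016", "%W-%Y")

# With %w (0 = Sunday, 1 = Monday, ...), the week number takes effect.
monday = datetime.strptime("43-2016-1", "%W-%Y-%w")
sunday = datetime.strptime("43-2016-0", "%W-%Y-%w")

print(no_weekday.date(), monday.date(), sunday.date())
```

Because `%W` counts weeks starting on Monday, Sunday is the last day of week 43 rather than the first.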

Source

2017-08-01 11:52:13 jezrael

Strangely, the documentation implies that the first day is already defined by using `%W` or `%U`: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior – Khris

Hmm, I found an explanation [here](https://stackoverflow.com/a/17087427/2901002) – jezrael
