2016-03-10 165 views
1

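Converting strings to datetime values using pandas

I have a Twitter dataset that I am trying to analyze with pandas, but I can't figure out how to convert strings such as "2 days", "24 hours", "2 months" or "5 years" into datetime format.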

I used the following code:

for i in df_merge['last_tweet']:
    # each value looks like "2 days" or "5 years": a count, then a unit
    n = int(i.split(" ")[0])
    d = i.split(" ")[1]
    if d in ["years", "year"]:
        n_days = n * 365
    elif d in ["months", "month"]:
        n_days = n * 30

Answers

2

You will probably need to write a helper function...

import numpy as np 
import pandas as pd 

def ym2nptimedelta(delta): 
    delta_cfg = { 
     'month': 'M', 
     'months': 'M', 
     'year': 'Y', 
     'years': 'Y' 
    } 
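    n, item = delta.lower().split()
    # n arrives as a string, so convert it before building the timedelta;
    # an unknown unit word raises KeyError instead of passing None to numpy
    return np.timedelta64(int(n), delta_cfg[item])

# fixed-length strings such as '2 days' are parsed by pd.Timedelta directly
# (pd.Timestamp replaces the pd.datetime alias, which newer pandas no longer provides)
print(pd.Timestamp.today() - pd.Timedelta('2 days'))
print(pd.Timestamp.today() - pd.Timedelta('24 hours'))
# years and months go through the helper; numpy treats 'Y' and 'M' as average
# lengths, and newer pandas releases may refuse to subtract them from a Timestamp
print(pd.Timestamp.now() - ym2nptimedelta('2 years'))
print(pd.Timestamp.now() - ym2nptimedelta('5 years'))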

Output:

2016-03-08 20:39:34.315969 
2016-03-09 20:39:34.315969 
2014-03-11 09:01:10.316969 
2011-03-11 15:33:34.317969 
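The year results above land at odd times of day because numpy treats a year as an average length (about 365.2425 days) rather than a calendar year. If calendar-aligned results are preferred, one possible alternative is `pd.DateOffset`. The sketch below is not part of the original answer: the helper name `ym2offset`, the mapping `_OFFSET_KW`, and the fixed reference timestamp are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical mapping (not from the original answer): unit word -> the
# pd.DateOffset keyword argument that represents it.
_OFFSET_KW = {
    'year': 'years', 'years': 'years',
    'month': 'months', 'months': 'months',
}

def ym2offset(delta):
    """Turn a string such as '2 years' into a calendar-aware pd.DateOffset."""
    n, item = delta.lower().split()
    return pd.DateOffset(**{_OFFSET_KW[item]: int(n)})

# Fixed reference time so the example is repeatable (an assumption);
# pd.Timestamp.now() would be the live equivalent.
now = pd.Timestamp('2016-03-10 20:39:34')

print(now - ym2offset('2 years'))   # same day and time, two calendar years earlier
print(now - ym2offset('2 months'))  # same day and time, two calendar months earlier
```

With a `DateOffset`, the day of month and time of day are preserved where the target date exists, which is usually what "2 months ago" is expected to mean.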

UPDATE1 (this helper function handles all of the numpy time units from years down to seconds):

import numpy as np 
import pandas as pd 

def deltastr2date(delta): 
    delta_cfg = { 
     'year': 'Y', 
     'years': 'Y', 
     'month': 'M', 
     'months': 'M', 
     'week': 'W', 
     'weeks': 'W', 
     'day': 'D', 
     'days': 'D', 
     'hour': 'h', 
     'hours': 'h', 
     'min': 'm', 
     'minute': 'm', 
     'minutes': 'm', 
     'sec': 's', 
     'second': 's', 
     'seconds': 's', 
    } 
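    n, item = delta.lower().split()
    # convert the count to int; pd.Timestamp.now() stands in for the
    # pd.datetime alias, which newer pandas versions no longer provide
    return pd.Timestamp.now() - np.timedelta64(int(n), delta_cfg[item])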

print(deltastr2date('2 days')) 
print(deltastr2date('24 hours')) 
print(deltastr2date('2 years')) 
print(deltastr2date('5 years')) 
print(deltastr2date('1 week')) 
print(deltastr2date('10 hours')) 
print(deltastr2date('45 minutes')) 

OUTPUT:

2016-03-08 20:50:01.701853 
2016-03-09 20:50:01.702853 
2014-03-11 09:11:37.702853 
2011-03-11 15:44:01.703853 
2016-03-03 20:50:01.704854 
2016-03-10 10:50:01.705854 
2016-03-10 20:05:01.705854 

UPDATE2 (showing how to apply the helper function to a DataFrame column):

import numpy as np 
import pandas as pd 

def deltastr2date(delta): 
    delta_cfg = { 
     'year': 'Y', 
     'years': 'Y', 
     'month': 'M', 
     'months': 'M', 
     'week': 'W', 
     'weeks': 'W', 
     'day': 'D', 
     'days': 'D', 
     'hour': 'h', 
     'hours': 'h', 
     'min': 'm', 
     'minute': 'm', 
     'minutes': 'm', 
     'sec': 's', 
     'second': 's', 
     'seconds': 's', 
    } 
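    n, item = delta.lower().split()
    # convert the count to int; pd.Timestamp.now() stands in for the
    # pd.datetime alias, which newer pandas versions no longer provide
    return pd.Timestamp.now() - np.timedelta64(int(n), delta_cfg[item])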

N = 20 

dt_units = ['seconds','minutes','hours','days','weeks','months','years'] 

# generate random list of deltas 
deltas = ['{0[0]} {0[1]}'.format(tup) for tup in zip(np.random.randint(1,5,N), np.random.choice(dt_units, N))] 

df = pd.DataFrame({'delta': pd.Series(deltas)}) 

# add new column 
df['last_tweet_dt'] = df['delta'].apply(deltastr2date) 
print(df) 

OUTPUT:

        delta              last_tweet_dt
0     3 hours 2016-03-10 20:32:49.252525
1      4 days 2016-03-06 23:32:49.252525
2   3 seconds 2016-03-10 23:32:46.253525
3     1 weeks 2016-03-03 23:32:49.253525
4   1 minutes 2016-03-10 23:31:49.253525
5   2 minutes 2016-03-10 23:30:49.253525
6      4 days 2016-03-06 23:32:49.254525
7     1 years 2015-03-11 17:43:37.254525
8   2 seconds 2016-03-10 23:32:47.254525
9   3 minutes 2016-03-10 23:29:49.254525
10    1 hours 2016-03-10 22:32:49.255525
11  2 seconds 2016-03-10 23:32:47.255525
12  3 minutes 2016-03-10 23:29:49.255525
13   3 months 2015-12-10 16:05:31.255525
14    4 weeks 2016-02-11 23:32:49.256526
15   3 months 2015-12-10 16:05:31.256526
16    4 hours 2016-03-10 19:32:49.256526
17    1 years 2015-03-11 17:43:37.256526
18    2 years 2014-03-11 11:54:25.257526
19  1 minutes 2016-03-10 23:31:49.257526
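If a column only contains fixed-length units (seconds through weeks), a vectorized alternative to `apply` may be worth considering: `pd.to_timedelta` can parse the whole column in one call, and missing values come out as `NaT` instead of raising. This is a minimal sketch under those assumptions; the sample values and the fixed reference timestamp are made up for illustration, and month or year strings are not covered because they have no fixed length.

```python
import pandas as pd

# Illustrative data (an assumption); None stands in for a missing value.
df = pd.DataFrame({'delta': ['4 days', None, '1 day', '2 days', '24 hours']})

# Fixed reference time so the example is repeatable (an assumption);
# pd.Timestamp.now() would be the live equivalent.
now = pd.Timestamp('2016-03-10 23:32:49')

# Parse every string in the column at once; missing values become NaT.
df['last_tweet_dt'] = now - pd.to_timedelta(df['delta'])
print(df)
```

Rows with a missing `delta` end up with `NaT` in `last_tweet_dt`, so they can be filtered out later with `dropna` or `notna`.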
+0

Thanks! I am very new to Python and am having trouble applying this function to a column of my dataset. The code I tried: date = df_merge['last_tweet'] new_tweet = (deltastr2date(date)) print(new_tweet) – Sil

+0

Please post sample input data and the desired output data, as well as the error stack trace – MaxU

+0

#Sample input
| **last_tweet** |
| ----------------- |
| 4 days |
| NaN |
| 1 day |
| 2 days |
| 24 hours |
| 1 month |
#Sample output
| **last_tweet** |
| ----------------- |
| 4 |
| NaN |
| 1 |
| 2 |
| 24 |
| 1 |
– Sil
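Judging from the sample output, the goal seems to be the numeric part of each string, with NaN rows left alone. One way this might be done is with `Series.str.extract`, which also sidesteps the problem of passing a whole Series to a function written for a single string. This is only a sketch: the regular expression and the new column names `count` and `unit` are assumptions, and the data simply mirrors the sample input.

```python
import numpy as np
import pandas as pd

# Data mirroring the sample input above.
df_merge = pd.DataFrame(
    {'last_tweet': ['4 days', np.nan, '1 day', '2 days', '24 hours', '1 month']}
)

# Capture the leading number and the unit word; rows that are missing
# or do not match the pattern stay NaN in both captured columns.
parts = df_merge['last_tweet'].str.extract(r'^\s*(\d+)\s+([A-Za-z]+)\s*$')

# Hypothetical column names chosen for this example.
df_merge['count'] = pd.to_numeric(parts[0])
df_merge['unit'] = parts[1]
print(df_merge)
```

Keeping the unit in its own column makes it possible to convert the counts to a common scale afterwards, for example with the 365 and 30 day approximations used in the question.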
