2017-03-03
Python: sorting dates after counting them in pandas

I extracted dates and times from strings and converted them to pandas datetimes by writing:

df = pd.to_datetime(news_date, format='%m/%d/%Y') 

and the output looks like this:

['1997-10-31 18:00:00', '1997-10-31 18:00:00', 
      '1997-10-31 18:00:00', '1997-10-31 18:00:00', 
      '1997-10-31 18:00:00', '1997-10-31 18:00:00', 
      '1997-10-31 18:00:00', '1997-10-31 18:00:00', 
      '1997-10-31 18:00:00', '1997-10-31 18:00:00', 
      ... 
      '2016-12-07 03:14:00', '2016-12-09 16:31:00', 
      '2016-12-10 19:02:00', '2016-12-11 09:41:00', 
      '2016-12-12 05:01:00', '2016-12-12 05:39:00', 
      '2016-12-12 06:44:00', '2016-12-12 08:11:00', 
      '2016-12-12 09:36:00', '2016-12-12 10:19:00'] 

Then I wanted to keep only the year and month and sort the dates, so I wrote:

month_year = df.to_series().apply(lambda x: dt.datetime.strftime(x, '%m-%Y')).tolist() # remove time and day 
new = sorted(month_year, key=lambda x: datetime.datetime.strptime(x, '%m-%Y')) # sort date 

So far, I have the sorted dates. The problem arises when I try to count their frequency (I need to plot the time distribution later). My code is:

print(pd.DataFrame(new).groupby(month_year).count()) 

and the output is:

01-1998 60 
01-1999 18 
01-2000 49 
01-2001 50 
01-2002 87 
01-2003 129 
01-2004 125 
01-2005 225 
01-2006 154 
01-2007 302 
01-2008 161 
01-2009 161 
01-2010 167 
01-2011 181 
01-2012 227 
...  ... 
12-2014 82 
12-2015 89 
12-2016 13 

However, I would like to have the sorted dates in one column and their frequency in another (e.g., as a pandas DataFrame) so that it can be plotted easily, like:

01-1998 60 
02-1998 32 
03-1998 22 
...  ... 
11-2016 20 
12-2016 13 

Answer

I think you need a month period: convert with to_period, then use value_counts, and for sorting use sort_index:

import pandas as pd

news_date = ['1997-10-31 18:00:00', '1997-10-31 18:00:00', 
      '1997-10-30 18:00:00', '1997-10-30 18:00:00', 
      '1997-10-30 18:00:00', '1997-10-30 18:00:00', 
      '1997-11-30 18:00:00', '1997-11-30 18:00:00', 
      '1997-12-30 18:00:00', '1997-12-30 18:00:00', 
      '2016-12-07 03:14:00', '2016-01-09 16:31:00', 
      '2016-12-10 19:02:00', '2016-01-11 09:41:00', 
      '2016-12-12 05:01:00', '2016-02-12 05:39:00', 
      '2016-12-12 06:44:00', '2016-12-12 08:11:00', 
      '2016-12-12 09:36:00', '2016-12-12 10:19:00'] 

idx = pd.to_datetime(news_date) 
new = pd.Series(idx.to_period('M')) 
print (new) 
0 1997-10 
1 1997-10 
2 1997-10 
3 1997-10 
4 1997-10 
5 1997-10 
6 1997-11 
7 1997-11 
8 1997-12 
9 1997-12 
10 2016-12 
11 2016-01 
12 2016-12 
13 2016-01 
14 2016-12 
15 2016-02 
16 2016-12 
17 2016-12 
18 2016-12 
19 2016-12 
dtype: object 
df = new.value_counts().sort_index().reset_index() 
df.columns = ['Date','Count'] 
df.Date = df.Date.dt.strftime('%Y-%m') 
print (df) 
     Date Count 
0 1997-10  6 
1 1997-11  2 
2 1997-12  2 
3 2016-01  2 
4 2016-02  1 
5 2016-12  7 
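
If the month-first labels from the question (such as 01-1998) are still wanted, one option is to sort while the values are still periods and only format them as text at the very end. The following is a minimal sketch under that assumption; the function name monthly_counts and the sample dates are hypothetical and not part of the original question or answer.

```python
import pandas as pd

def monthly_counts(dates, fmt='%m-%Y'):
    """Count dates per calendar month in chronological order (hypothetical helper)."""
    idx = pd.to_datetime(dates)
    # Count per monthly period; periods sort chronologically, unlike 'MM-YYYY' strings
    counts = idx.to_period('M').value_counts().sort_index()
    # Format the labels only after sorting, so the order is already fixed
    labels = pd.PeriodIndex(counts.index).strftime(fmt)
    return pd.DataFrame({'Date': list(labels), 'Count': counts.to_numpy()})

# Made-up sample data for illustration only
sample = ['1998-02-10 10:00:00', '1999-01-05 09:00:00',
          '1998-01-20 08:00:00', '1998-01-21 08:00:00']
result = monthly_counts(sample)
print(result)
```

The rows come out as January 1998, February 1998, January 1999, even though the labels are written month first.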

Another possible solution is to convert to strings first with strftime:

new = pd.Series(idx.strftime('%Y-%m')) 
df = new.value_counts().sort_index().reset_index() 
df.columns = ['Date','Count'] 
print (df) 
    Date Count 
0 1997-10  6 
1 1997-11  2 
2 1997-12  2 
3 2016-01  2 
4 2016-02  1 
5 2016-12  7
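
The string approach works here because '%Y-%m' puts the year first, so plain alphabetical sorting agrees with chronological order. With '%m-%Y', as in the question, the same sort groups all Januaries together, which would explain the output the asker saw. A small sketch with made-up dates (not the asker's data) to illustrate the difference:

```python
import pandas as pd

# Made-up sample data for illustration only
dates = pd.to_datetime(['1998-02-10', '1999-01-05', '1998-01-20', '1998-01-21'])

# Month-first labels: string sorting compares the month before the year
by_month_first = pd.Series(dates.strftime('%m-%Y')).value_counts().sort_index()

# Year-first labels: string sorting matches chronological order
by_year_first = pd.Series(dates.strftime('%Y-%m')).value_counts().sort_index()

print(by_month_first.index.tolist())
print(by_year_first.index.tolist())
```

Only the year-first version lists February 1998 before January 1999.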