2017-02-27 137 views
3

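Convert a pandas DataFrame to a frequency matrix

I am trying to convert a pandas DataFrame with three columns (Date, Start, End) into a frequency matrix. My input DataFrame looks like this: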

Date,    Start, End 
2016-09-02 09:16:00 18  16 
2016-09-02 16:14:10 16  1 
2016-09-02 06:17:21 18  17 
2016-09-02 05:51:07 23  17 
2016-09-02 18:34:44 18  17 
2016-09-02 05:44:44 20  4 
2016-09-02 09:25:22 18  17 
2016-09-02 22:27:44 18  17 
2016-09-02 16:02:46 0  18 
2016-09-02 15:35:07 17  17 
2016-09-02 16:06:42 8  17 
2016-09-02 14:47:04 16  23 
2016-09-02 07:47:24 20  1 
... 

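The values of 'Start' and 'End' are integers from 0 to 23. 'Date' is a datetime. The frequency matrix I am trying to create is a 24-by-24 CSV, where the entry in row i and column j is the number of times 'End' = i and 'Start' = j occur together in the input. For example, the data above would produce: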

0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,16,17,18,19,20,21,22,23 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0 
2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0 
5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0 
17, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 0, 0, 0, 0, 1 
18, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 

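As an extra: can this be done in a way that creates a separate matrix for every 15 minutes? That would be 672 matrices, since the date range is one week. I am a self-taught beginner and I really can't think of a pythonic way to solve this, so any solutions or suggestions would be greatly appreciated.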

Answers

5

Create the matrix with a simple count, unstacking one of the columns:

mat = df.groupby(['Start', 'End']).count().unstack(level=0) 

Clean up the Date level:

mat.columns = mat.columns.droplevel(0) 

Now reindex the rows and columns, and cast to integers:

mat = mat.reindex(index=range(24), columns=range(24)).fillna(0).astype(int)
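Put together, the three steps above can be sketched as one runnable script. This is only an illustration: the sample rows are the ones shown in the question, the CSV layout used to load them is an assumption, and the keyword form of reindex plus the final astype(int) are choices made here so the result comes out as integers.

```python
import io

import pandas as pd

# Sample rows copied from the question; the comma-separated layout is an assumption.
raw = """Date,Start,End
2016-09-02 09:16:00,18,16
2016-09-02 16:14:10,16,1
2016-09-02 06:17:21,18,17
2016-09-02 05:51:07,23,17
2016-09-02 18:34:44,18,17
2016-09-02 05:44:44,20,4
2016-09-02 09:25:22,18,17
2016-09-02 22:27:44,18,17
2016-09-02 16:02:46,0,18
2016-09-02 15:35:07,17,17
2016-09-02 16:06:42,8,17
2016-09-02 14:47:04,16,23
2016-09-02 07:47:24,20,1
"""
df = pd.read_csv(io.StringIO(raw), parse_dates=["Date"])

# Step 1: count each (Start, End) pair and move Start into the columns.
mat = df.groupby(["Start", "End"]).count().unstack(level=0)

# Step 2: drop the outer 'Date' level of the column MultiIndex.
mat.columns = mat.columns.droplevel(0)

# Step 3: expand to the full 24 x 24 grid, fill the gaps with 0, cast to int.
hours = range(24)
mat = mat.reindex(index=hours, columns=hours).fillna(0).astype(int)

# Render as CSV text; pass a file path to to_csv to write a file instead.
csv_text = mat.to_csv()
```

Rows of mat are End values and columns are Start values, so mat.loc[17, 18] is the number of rows with End = 17 and Start = 18.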

Detailed explanation

First, you count the number of times a given (Start, End) pair occurs. Grouping by these two columns produces a result indexed by a MultiIndex.

df.groupby(['Start', 'End']).count() 
Out[134]: 
           Date
Start End
0     18      1
8     17      1
16    1       1
      23      1
17    17      1
18    16      1
      17      4
20    1       1
      4       1
23    17      1

We want the Start values from this result to become the column index. unstack does this:

df.groupby(['Start', 'End']).count().unstack(level=0) 
Out[135]: 
       Date
Start    0    8    16   17   18   20   23
End
1       NaN  NaN  1.0  NaN  NaN  1.0  NaN
4       NaN  NaN  NaN  NaN  NaN  1.0  NaN
16      NaN  NaN  NaN  NaN  1.0  NaN  NaN
17      NaN  1.0  NaN  1.0  4.0  NaN  1.0
18      1.0  NaN  NaN  NaN  NaN  NaN  NaN
23      NaN  NaN  1.0  NaN  NaN  NaN  NaN

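The result of unstack is that Start is moved into the columns as an additional column index level, alongside the existing Date level (see below). That is why we drop level 0 afterwards. Another approach, depending on your current source code, would be to select the Date column beforehand, so that unstack yields a single level.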

_.columns 
Out[136]: 
MultiIndex(levels=[['Date'], [0, 8, 16, 17, 18, 20, 23]], 
      labels=[[0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5, 6]], 
      names=[None, 'Start']) 
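The single-level alternative mentioned above might look like the sketch below. Selecting the Date column before counting gives a Series, so unstack produces plain columns and no droplevel is needed. The helper name freq_matrix is hypothetical, and the sample rows are those from the question with an assumed CSV layout. As one possible take on the 15-minute part of the question, the same helper is applied to each window after flooring the timestamps with Series.dt.floor; this is an assumption about what is wanted, not part of the original answer.

```python
import io

import pandas as pd

# Sample rows copied from the question; the comma-separated layout is an assumption.
raw = """Date,Start,End
2016-09-02 09:16:00,18,16
2016-09-02 16:14:10,16,1
2016-09-02 06:17:21,18,17
2016-09-02 05:51:07,23,17
2016-09-02 18:34:44,18,17
2016-09-02 05:44:44,20,4
2016-09-02 09:25:22,18,17
2016-09-02 22:27:44,18,17
2016-09-02 16:02:46,0,18
2016-09-02 15:35:07,17,17
2016-09-02 16:06:42,8,17
2016-09-02 14:47:04,16,23
2016-09-02 07:47:24,20,1
"""
df = pd.read_csv(io.StringIO(raw), parse_dates=["Date"])


def freq_matrix(frame):
    """Hypothetical helper: 24 x 24 counts, rows are End, columns are Start."""
    # Selecting 'Date' first yields a Series, so unstack adds no extra level.
    counts = frame.groupby(["Start", "End"])["Date"].count()
    table = counts.unstack(level=0)
    hours = range(24)
    return table.reindex(index=hours, columns=hours).fillna(0).astype(int)


# Matrix for the whole data set.
whole = freq_matrix(df)

# One matrix per 15-minute window, keyed by the window's start time.
# Only windows that contain at least one row appear in the dict.
window_start = df["Date"].dt.floor("15min")
per_window = {start: freq_matrix(chunk) for start, chunk in df.groupby(window_start)}
```

Windows with no rows are simply absent from per_window; if all 672 windows of the week are needed, the missing ones would have to be filled with all-zero matrices separately.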
+0

Nice solution using 'reindex'! – pansen

+0

Thanks! It works, but I'm a bit lost as to how. Can you explain what unstack does? –

+1

unstack pivots the table, turning a level of the row index into columns. – postoronnim