2013-04-26 257 views

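Grouping a pandas DataFrame

The DataFrame below contains two unique DBKEYs for the same STATION. From it I need to create a new DataFrame with two separate VAL columns (VAL1, VAL2), joined on the same STATION.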

 
    DBKEY STATION   DAILY_DATE  VAL 
0 T9947 G377C_C 2011-10-01 00:00:00 17.123 
1 T9947 G377C_C 2011-10-02 00:00:00  NaN 
2 T9947 G377C_C 2011-10-03 00:00:00  NaN 
3 T9947 G377C_C 2011-10-04 00:00:00  NaN 
4 T9947 G377C_C 2011-10-05 00:00:00  NaN 
5 T9947 G377C_C 2011-10-06 00:00:00  NaN 
6 T9947 G377C_C 2011-10-07 00:00:00  NaN  
7 T9947 G377C_C 2011-10-08 00:00:00  NaN  
8 T9947 G377C_C 2011-10-09 00:00:00 92.734 
9 T9947 G377C_C 2011-10-10 00:00:00 48.975 
10 T9947 G377C_C 2011-10-11 00:00:00 17.463 
11 T9947 G377C_C 2011-10-12 00:00:00  NaN 
12 T9947 G377C_C 2011-10-13 00:00:00  NaN 
13 T9947 G377C_C 2011-10-14 00:00:00 12.870 
14 T9947 G377C_C 2011-10-15 00:00:00  NaN  
15 T9947 G377C_C 2011-10-16 00:00:00 48.138 
16 T9947 G377C_C 2011-10-17 00:00:00 0.413 
17 T9947 G377C_C 2011-10-18 00:00:00 39.058 
18 T9947 G377C_C 2011-10-19 00:00:00 235.617 
19 T9947 G377C_C 2011-10-20 00:00:00 182.989 
20 T9947 G377C_C 2011-10-21 00:00:00 132.193 
21 T9947 G377C_C 2011-10-22 00:00:00 19.557 
22 T9947 G377C_C 2011-10-23 00:00:00  NaN 
23 T9947 G377C_C 2011-10-24 00:00:00 80.552 
24 T9947 G377C_C 2011-10-25 00:00:00  NaN 
25 T9947 G377C_C 2011-10-26 00:00:00  NaN 
26 T9947 G377C_C 2011-10-27 00:00:00 39.258 
27 T9947 G377C_C 2011-10-28 00:00:00  NaN  
28 T9947 G377C_C 2011-10-29 00:00:00 253.969 
29 T9947 G377C_C 2011-10-30 00:00:00 319.685 
30 T9947 G377C_C 2011-10-31 00:00:00 303.855 
31 W3972 G377C_C 2011-10-01 00:00:00 17.120 
32 W3972 G377C_C 2011-10-02 00:00:00  NaN  
33 W3972 G377C_C 2011-10-03 00:00:00  NaN 
34 W3972 G377C_C 2011-10-04 00:00:00  NaN  
35 W3972 G377C_C 2011-10-05 00:00:00  NaN  
36 W3972 G377C_C 2011-10-06 00:00:00  NaN  
37 W3972 G377C_C 2011-10-07 00:00:00  NaN  
38 W3972 G377C_C 2011-10-08 00:00:00  NaN  
39 W3972 G377C_C 2011-10-09 00:00:00 92.730 
40 W3972 G377C_C 2011-10-10 00:00:00 48.980 
41 W3972 G377C_C 2011-10-11 00:00:00 17.460 
42 W3972 G377C_C 2011-10-12 00:00:00  NaN  
43 W3972 G377C_C 2011-10-13 00:00:00  NaN  
44 W3972 G377C_C 2011-10-14 00:00:00 12.870 
45 W3972 G377C_C 2011-10-15 00:00:00  NaN  
46 W3972 G377C_C 2011-10-16 00:00:00 48.140 
47 W3972 G377C_C 2011-10-17 00:00:00 0.410 
48 W3972 G377C_C 2011-10-18 00:00:00 39.060 
49 W3972 G377C_C 2011-10-19 00:00:00 235.620 
50 W3972 G377C_C 2011-10-20 00:00:00 182.990 
51 W3972 G377C_C 2011-10-21 00:00:00 132.190 
52 W3972 G377C_C 2011-10-22 00:00:00 19.560 
53 W3972 G377C_C 2011-10-23 00:00:00  NaN 
54 W3972 G377C_C 2011-10-24 00:00:00 80.550 
55 W3972 G377C_C 2011-10-25 00:00:00  NaN 
56 W3972 G377C_C 2011-10-26 00:00:00  NaN  
57 W3972 G377C_C 2011-10-27 00:00:00 39.260 
58 W3972 G377C_C 2011-10-28 00:00:00  NaN  
59 W3972 G377C_C 2011-10-29 00:00:00 253.970 
60 W3972 G377C_C 2011-10-30 00:00:00 319.690 
61 W3972 G377C_C 2011-10-31 00:00:00 303.860 

So the result I need has only 31 rows, with STATION, VAL1 (the values for the first DBKEY) and VAL2 (the values for the second DBKEY):

STATION  DAILY_DATE VAL1  VAL2 
G377C_C  10/1/2011 17.123 17.12 
G377C_C  10/2/2011 NaN NaN 
G377C_C  10/3/2011 NaN NaN 
G377C_C  10/4/2011 NaN NaN 
G377C_C  10/5/2011 NaN NaN 
G377C_C  10/6/2011 NaN NaN 
G377C_C  10/7/2011 NaN NaN 
G377C_C  10/8/2011 NaN NaN 
G377C_C  10/9/2011 92.734 92.73 
G377C_C  10/10/2011 48.975 48.98 
G377C_C  10/11/2011 17.463 17.46 
G377C_C  10/12/2011 NaN NaN 
G377C_C  10/13/2011 NaN NaN 
G377C_C  10/14/2011 12.87  12.87 
G377C_C  10/15/2011 NaN NaN 
G377C_C  10/16/2011 48.138 48.14 
G377C_C  10/17/2011 0.413  0.41 
G377C_C  10/18/2011 39.058 39.06 
G377C_C  10/19/2011 235.617 235.62 
G377C_C  10/20/2011 182.989 182.99 
G377C_C  10/21/2011 132.193 132.19 
G377C_C  10/22/2011 19.557 19.56 
G377C_C  10/23/2011 NaN NaN 
G377C_C  10/24/2011 80.552 80.55 
G377C_C  10/25/2011 NaN NaN 
G377C_C  10/26/2011 NaN NaN 
G377C_C  10/27/2011 39.258 39.26 
G377C_C  10/28/2011 NaN NaN 
G377C_C  10/29/2011 253.969 253.97 
G377C_C  10/30/2011 319.685 319.69 
G377C_C  10/31/2011 303.855 303.86 

Answer


If I understand correctly, this looks straightforward. unstack() should take care of it:

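In [1]: from pandas import DataFrame

In [2]: df = DataFrame({"DBKEY": ['T9947', 'T9947', 'T9947', 'W3972', 'W3972', 'W3972'],
   ...:                 "STATION": ['G377C_C', 'G377C_C', 'G377C_C', 'G377C_C', 'G377C_C', 'G377C_C'],
   ...:                 "DAILY_DATE": ['2011-10-01 00:00:00', '2011-10-02 00:00:00', '2011-10-03 00:00:00',
   ...:                                '2011-10-01 00:00:00', '2011-10-02 00:00:00', '2011-10-03 00:00:00'],
   ...:                 # use real floats and real missing values, not the strings '17.120' and 'NaN'
   ...:                 "VAL": [17.123, float('nan'), float('nan'), 17.120, float('nan'), float('nan')]})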
In [3]: df 
Out[3]: 
            DAILY_DATE  DBKEY  STATION     VAL
0  2011-10-01 00:00:00  T9947  G377C_C  17.123
1  2011-10-02 00:00:00  T9947  G377C_C     NaN
2  2011-10-03 00:00:00  T9947  G377C_C     NaN
3  2011-10-01 00:00:00  W3972  G377C_C  17.120
4  2011-10-02 00:00:00  W3972  G377C_C     NaN
5  2011-10-03 00:00:00  W3972  G377C_C     NaN

In [4]: df2 = df.set_index(["STATION", "DBKEY", "DAILY_DATE"]) 
In [5]: df2 
Out[5]: 
                                      VAL
STATION DBKEY DAILY_DATE
G377C_C T9947 2011-10-01 00:00:00  17.123
              2011-10-02 00:00:00     NaN
              2011-10-03 00:00:00     NaN
        W3972 2011-10-01 00:00:00  17.120
              2011-10-02 00:00:00     NaN
              2011-10-03 00:00:00     NaN

In [6]: df3 = df2.unstack(level=1) 
In [7]: df3 
Out[7]: 
                                VAL
DBKEY                         T9947   W3972
STATION DAILY_DATE
G377C_C 2011-10-01 00:00:00  17.123  17.120
        2011-10-02 00:00:00     NaN     NaN
        2011-10-03 00:00:00     NaN     NaN

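Yes, this is exactly what I need. But how do I access the result? The result I need is a new DataFrame populated with STATION, DAILY_DATE, VAL1, VAL2. I can't pass the result on as a DataFrame for further processing. Thanks again, your help is hugely appreciated! – user2309282 2013-04-27 13:31:52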
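
One possible way to get from the unstacked result to a flat DataFrame, as asked in the comment above, is to rename the DBKEY columns and call reset_index(). The following is a minimal sketch, not part of the original answer. It assumes a reasonably recent pandas, selects the VAL column before unstacking so the columns are not nested under "VAL", and assumes that VAL1, VAL2 can be assigned in the sorted order of the DBKEY values (here T9947 first, then W3972). The names `wide` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Small sample in the same shape as the question's data.
df = pd.DataFrame({
    "DBKEY": ["T9947"] * 3 + ["W3972"] * 3,
    "STATION": ["G377C_C"] * 6,
    "DAILY_DATE": pd.to_datetime(["2011-10-01", "2011-10-02", "2011-10-03"] * 2),
    "VAL": [17.123, np.nan, np.nan, 17.120, np.nan, np.nan],
})

# Same reshaping idea as in the answer: one VAL column per DBKEY.
# Selecting ["VAL"] first yields a Series, so unstack() returns flat columns.
wide = df.set_index(["STATION", "DBKEY", "DAILY_DATE"])["VAL"].unstack(level="DBKEY")

# Assumption: the columns come out in sorted DBKEY order, so they are
# renamed positionally to VAL1, VAL2, ...
wide.columns = [f"VAL{i}" for i in range(1, len(wide.columns) + 1)]

# Turn the (STATION, DAILY_DATE) index back into ordinary columns.
result = wide.reset_index()

print(result)
```

After this, `result` is an ordinary DataFrame with the columns STATION, DAILY_DATE, VAL1 and VAL2, one row per station and date, so it can be passed on to further processing like any other DataFrame. If the mapping from DBKEY to VAL1/VAL2 matters, it is probably safer to rename the columns explicitly, for example with `wide.rename(columns={"T9947": "VAL1", "W3972": "VAL2"})`, instead of relying on position.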