2015-09-30 48 views

Operating on tuples held within a pandas DataFrame column

I have the following DataFrame:

start  end   days 
0 2015-07-01 2015-07-07   (1, 2, 3, 4, 5, 6, 7) 
1 2015-07-08 2015-07-14 (8, 9, 10, 11, 12, 13, 14) 
2 2015-07-15 2015-07-21 (15, 16, 17, 18, 19, 20, 21) 
3 2015-07-22 2015-07-28 (22, 23, 24, 25, 26, 27, 28) 
4 2015-07-29 2015-08-04  (29, 30, 31, 1, 2, 3, 4) 
5 2015-08-05 2015-08-11  (5, 6, 7, 8, 9, 10, 11) 
6 2015-08-12 2015-08-18 (12, 13, 14, 15, 16, 17, 18) 
7 2015-08-19 2015-08-25 (19, 20, 21, 22, 23, 24, 25) 
8 2015-08-26 2015-09-01 (26, 27, 28, 29, 30, 31, 1) 
9 2015-09-02 2015-09-08   (2, 3, 4, 5, 6, 7, 8) 
10 2015-09-09 2015-09-15 (9, 10, 11, 12, 13, 14, 15) 
11 2015-09-16 2015-09-22 (16, 17, 18, 19, 20, 21, 22) 
12 2015-09-23 2015-09-29 (23, 24, 25, 26, 27, 28, 29) 

I want to work with the days column, which contains tuples, but the basic pandas filtering syntax does not seem to work:

df[4 in df['days'] == True] 

I expected the above to filter the DataFrame and return the following rows, i.e. the ones whose tuple contains 4:

 start  end    days 
    0 2015-07-01 2015-07-07   (1, 2, 3, 4, 5, 6, 7) 
    4 2015-07-29 2015-08-04  (29, 30, 31, 1, 2, 3, 4) 
    9 2015-09-02 2015-09-08   (2, 3, 4, 5, 6, 7, 8) 

Instead, it returns an empty DataFrame.

I also tried creating a new column to hold True/False values based on the same kind of membership check:

df['daysTF'] = 4 in df['days'] 

This returns the DataFrame with the 'daysTF' column set to True for every row, rather than True only where the tuple contains 4.

Answers


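One way to do this is with the Series.apply method, which runs the membership test on each row's tuple, although it may not be very fast. Example: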

df[df['days'].apply(lambda x: 4 in x)] 

Demo:

In [139]: df 
Out[139]: 
     start   end       days 
0 2015-07-01 2015-07-07   (1, 2, 3, 4, 5, 6, 7) 
1 2015-07-08 2015-07-14 (8, 9, 10, 11, 12, 13, 14) 
2 2015-07-15 2015-07-21 (15, 16, 17, 18, 19, 20, 21) 
3 2015-07-22 2015-07-28 (22, 23, 24, 25, 26, 27, 28) 
4 2015-07-29 2015-08-04  (29, 30, 31, 1, 2, 3, 4) 
5 2015-08-05 2015-08-11  (5, 6, 7, 8, 9, 10, 11) 
6 2015-08-12 2015-08-18 (12, 13, 14, 15, 16, 17, 18) 
7 2015-08-19 2015-08-25 (19, 20, 21, 22, 23, 24, 25) 
8 2015-08-26 2015-09-01 (26, 27, 28, 29, 30, 31, 1) 
9 2015-09-02 2015-09-08   (2, 3, 4, 5, 6, 7, 8) 
10 2015-09-09 2015-09-15 (9, 10, 11, 12, 13, 14, 15) 
11 2015-09-16 2015-09-22 (16, 17, 18, 19, 20, 21, 22) 
12 2015-09-23 2015-09-29 (23, 24, 25, 26, 27, 28, 29) 

In [141]: df['days'][0] 
Out[141]: (1, 2, 3, 4, 5, 6, 7) 

In [142]: type(df['days'][0]) 
Out[142]: tuple 

In [143]: df[df['days'].apply(lambda x: 4 in x)] 
Out[143]: 
     start   end      days 
0 2015-07-01 2015-07-07  (1, 2, 3, 4, 5, 6, 7) 
4 2015-07-29 2015-08-04 (29, 30, 31, 1, 2, 3, 4) 
9 2015-09-02 2015-09-08  (2, 3, 4, 5, 6, 7, 8) 
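
As for why the original attempts misbehave: as far as I can tell, `in` applied to a Series tests the index labels rather than the stored values, so `4 in df['days']` is True simply because the frame has a row labelled 4. In the first attempt the expression also chains as `(4 in df['days']) and (df['days'] == True)`, which gives an all-False mask and hence an empty result. A minimal sketch of the label check, using a small made-up frame (not the question's data) in which no tuple contains 4:

```python
import pandas as pd

# Hypothetical stand-in frame: five rows, so the index labels are 0..4.
# None of the tuples contains the value 4.
df = pd.DataFrame({
    "days": [(1, 2, 3), (8, 9, 10), (15, 16, 17), (22, 23, 24), (29, 30, 31)],
})

# `in` on a Series looks at the index labels, not at the values.
label_hit = 4 in df["days"]     # 4 is an index label here
value_hit = 40 in df["days"]    # 40 is not an index label

# An element-wise test is needed to look inside each tuple.
mask = df["days"].apply(lambda t: 4 in t)

print(label_hit, value_hit, mask.any())
```

Here `label_hit` comes out True even though no row's tuple holds 4, which matches the "True for every row" result described in the question.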

Another way to do the same thing, using a list comprehension to build the boolean mask:

df[[4 in daystuple for daystuple in df['days']]]
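
The same list comprehension can also fill the True/False column the question was trying to create. This is only a sketch: it rebuilds three rows from the question's data by hand (so they are re-indexed 0 to 2 rather than keeping their original labels), and the variable name `filtered` is mine.

```python
import pandas as pd

# Three rows taken from the question's data; the index here is 0..2.
df = pd.DataFrame({
    "start": pd.to_datetime(["2015-07-01", "2015-07-08", "2015-07-29"]),
    "end": pd.to_datetime(["2015-07-07", "2015-07-14", "2015-08-04"]),
    "days": [
        (1, 2, 3, 4, 5, 6, 7),
        (8, 9, 10, 11, 12, 13, 14),
        (29, 30, 31, 1, 2, 3, 4),
    ],
})

# One boolean per row: does that row's tuple contain 4?
df["daysTF"] = [4 in days for days in df["days"]]

# The boolean column doubles as a filter mask.
filtered = df[df["daysTF"]]
print(filtered)
```

Keeping the mask as a column is handy if the same test is reused; for a one-off filter the inline form above is enough.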