2013-07-12 32 views
0

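Stata - efficiently assign max/min from other observations with conditions

In Stata, I want to efficiently assign the maximum value of [[bid_price]] from the other observations that satisfy these conditions:

  1. [[bid_time]] < [time] < min([[bid_timelimit]], [[bid_timecanceled]])
  2. [[stock]] = [stock]
  3. [[bid_price]] is set (non-missing)

(Above, [[ ]] denotes a variable from another observation and [ ] denotes a variable from the current observation.)

This is my code: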

gen maxbidprice=. 

su no 

forvalues i = `r(min)'/`r(max)'{ 
    disp `i' 
    gen double current = time[`i'] 
    egen bidtag=tag(bid_price) if stock==stock[`i'] & bid_price!=. & current>bid_time & current<bid_timelimit & current<=bid_timecanceled 
    quietly su bid_price if bidtag 
    replace maxbidprice = r(max) if no==`i' 
    drop bidtag current 
} 

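I think my code is very inefficient. The dataset has more than 30k observations, and this code takes several hours to run. It seems to work, but I think there should be a more efficient way.

I must not destroy the original table; I only need to add a variable maxbidprice to all observations that meet the criteria. The point is to assign a value taken from other observations that match certain conditions.

Can anyone suggest an alternative?

Sample data: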

no,time,price,quantity,seller_pid,buyer_pid,bid_no,bid_price,bid_quantity,bid_time,bid_timelimit,bid_timecanceled,bid_pid,pid,action,stock 
300,31oct2012 13:42:03,10000,10,1919,1545,,,,,,,,1919,3,3 
301,31oct2012 13:42:03,10000,30,1919,454,,,,,,,,1919,3,3 
302,31oct2012 13:42:05,1000,10,,,152,1000,10,31oct2012 13:42:05,04nov2012 00:00:00,31oct2012 13:48:27,2450,2450,1,1 
303,31oct2012 13:42:06,10000,10,1919,1545,,,,,,,,1919,3,3 
304,31oct2012 13:42:06,10000,20,1919,1252,,,,,,,,1919,3,3 
305,31oct2012 13:42:08,10000,18,1919,1648,,,,,,,,1919,3,3 
306,31oct2012 13:42:15,10000,4,1919,2151,,,,,,,,2151,4,1 
307,31oct2012 13:42:15,10000,10,2450,2151,,,,,,,,2151,4,1 
308,31oct2012 13:42:23,6500,15,1919,655,,,,,,,,1919,3,1 
309,31oct2012 13:43:58,6000,10,1919,1127,,,,,,,,1919,3,1 
310,31oct2012 13:44:15,5000,82,1919,1842,,,,,,,,1919,3,1 
311,31oct2012 13:44:41,5000,10,,,153,5000,10,31oct2012 13:44:41,04nov2012 00:00:00,31oct2012 23:36:58,2450,2450,1,1 
312,31oct2012 13:46:21,5000,100,,,154,5000,100,31oct2012 13:46:21,16nov2012 00:00:00,01nov2012 00:18:04,1919,1919,1,1 
313,31oct2012 13:46:25,5000,3,733,1842,,,,,,,,733,3,1 
314,31oct2012 13:46:28,5000,20,,,155,5000,20,31oct2012 13:46:28,02nov2012 00:00:00,31oct2012 14:14:54,1721,1721,1,1 
315,31oct2012 13:46:54,7000,10,,,156,7000,10,31oct2012 13:46:54,06nov2012 00:00:00,31oct2012 20:36:08,209,209,1,3 
316,31oct2012 13:48:11,9700,10,,,,,,,,,,1373,2,2 
317,31oct2012 13:48:14,6000,10,,,157,6000,10,31oct2012 13:48:14,06nov2012 00:00:00,31oct2012 13:55:07,209,209,1,1 
318,31oct2012 13:48:55,10000,10,,,,,,,,,,1373,2,3 
319,31oct2012 13:49:53,10000,30,,,,,,,,,,1919,2,1 
320,31oct2012 13:50:24,9000,50,,,158,9000,50,31oct2012 13:50:24,04nov2012 00:00:00,31oct2012 17:15:46,1919,1919,1,2 
321,31oct2012 13:50:29,10000,10,1919,1725,,,,,,,,1725,4,1 
322,31oct2012 13:50:42,9000,40,,,159,9000,40,31oct2012 13:50:42,04nov2012 00:00:00,31oct2012 17:15:48,1919,1919,1,3 
323,31oct2012 13:51:10,6000,10,,,160,6000,10,31oct2012 13:51:10,04nov2012 00:00:00,31oct2012 14:42:27,2450,2450,1,1 
324,31oct2012 13:51:14,10000,20,,,,,,,,,,1919,2,2 
325,31oct2012 13:51:23,10000,20,,,,,,,,,,1919,2,2 
326,31oct2012 13:51:54,9000,20,,,161,9000,20,31oct2012 13:51:54,04nov2012 00:00:00,31oct2012 17:15:50,1919,1919,1,3 
327,31oct2012 13:52:05,10000,8,1725,1648,,,,,,,,1725,3,3 
328,31oct2012 13:52:05,10000,2,1725,1648,,,,,,,,1725,3,3 
329,31oct2012 13:52:39,9900,10,,,162,9900,10,31oct2012 13:52:39,04nov2012 00:00:00,31oct2012 13:53:16,277,277,1,1 
330,31oct2012 13:53:12,9700,10,,,163,9700,10,31oct2012 13:53:12,04nov2012 00:00:00,31oct2012 14:31:31,277,277,1,2 

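So you want to find the highest bid for each `stock` at any given `time`? Can you provide a few lines of data? Bonus points for code that generates test data. :) –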


@RichardHerron Thanks for your attention; I just added sample data. – z0nam


Then `merge` it back. –

Answer

3

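The following should work. The key is to use `collapse` with an `if` qualifier to find the maximum bid_price among the observations that meet the conditions.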

/* make some data */ 
clear 
set seed 2001 
set obs 10 
generate stock = _n 
expand 100 
bysort stock: generate time = _n 
expand 100 
generate bid_time = time + 10*uniform() - 5 
generate bid_timelimit = time + 100*uniform() - 50 
generate bid_timecancelled = time + 100*uniform() - 50 
generate bid_price = 100 + 50*uniform() - 25 

/* find max active bid */ 
tempfile original_data 
save `original_data' 
/* name the result maxbidprice so the merge does not overwrite bid_price */
collapse (max) maxbidprice = bid_price ///
    if (time > bid_time) & (time < min(bid_timelimit, bid_timecancelled)), ///
    by(stock time)
merge 1:m stock time using `original_data'

/* check results */  
list in 1/10 
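
The simulated data above carries each bid's time window on the same row as `time`, whereas in the question's sample data a bid sits on its own row and has to be compared with the `time` of other rows. A minimal sketch of one way to handle that layout is to pair every observation with every bid on the same stock using `joinby`, keep the pairs where the bid is active, and collapse. It assumes the question's variable names, that `time` and the bid times are numeric datetimes rather than strings, and that `no` uniquely identifies observations. The tempfile names `bids` and `maxbids` are made up for illustration.

```stata
* Sketch only, under the assumptions stated above.
* Step 1: one row per bid, keeping only what the conditions need.
preserve
    keep if !missing(bid_price)
    keep stock bid_price bid_time bid_timelimit bid_timecanceled
    tempfile bids
    save `bids'
restore

* Step 2: pair each observation with every bid on the same stock,
* keep the bids that are active at the observation's time, take the max.
preserve
    keep no stock time
    joinby stock using `bids'
    * strict inequalities, as in condition 1 of the question
    * (the question's loop used <= for bid_timecanceled)
    keep if time > bid_time & time < min(bid_timelimit, bid_timecanceled)
    collapse (max) maxbidprice = bid_price, by(no)
    tempfile maxbids
    save `maxbids'
restore

* Step 3: add the new variable to the untouched original data.
merge 1:1 no using `maxbids', nogenerate
```

Because `joinby` forms all pairwise combinations within `stock`, the intermediate dataset can become large when a stock has many bids. Observations with no active bid simply end up with a missing `maxbidprice` after the final `merge`.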

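Thank you for the suggestion, but I must not destroy the original table; I only need to add a variable `maxbidprice` to all observations that meet certain criteria. I'm sorry I didn't mention this. – z0nam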


Then merge it back. –


I have incorporated your comment on the answer into my original question. – z0nam