2016-05-08 62 views

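Count of concurrent subscriptions

I have a database containing many people, each of whom may hold several subscriptions to a service at once, along with transaction data for each event during a subscription. I am trying to create a variable that counts how many subscriptions a user has active at the time of a given transaction.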

To illustrate with an example, my data are of the form:

person | subscription | obs_date | sub_start_date | sub_end_date | num_concurrent_subs 
-------------------------------------------------------------------------------------- 
1  | 1   | 09/01/10 | 09/01/10  | 09/01/11  | 1 
1  | 1   | 10/01/10 | 09/01/10  | 09/01/11  | 2 
1  | 1   | 11/01/10 | 09/01/10  | 09/01/11  | 2 
1  | 2   | 10/01/10 | 10/01/10  | 09/01/11  | 2 
1  | 2   | 11/01/10 | 10/01/10  | 09/01/11  | 2 
1  | 3   | 11/01/14 | 09/01/14  | .   | 1 
1  | 3   | 11/01/16 | 09/01/14  | .   | 1 
1  | 4   | 11/01/15 | 10/01/15  | 11/01/15  | 3 
1  | 5   | 11/01/15 | 10/01/15  | 11/01/15  | 3 

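And so on for each person. I would like to generate num_concurrent_subs as shown above.

That is, for each person, look at each observation and count the number of subscriptions whose sub_start_date to sub_end_date range it falls within.

I have read about Stata's count function and believe I am close to a solution, but I am not sure how to check across different subscriptions.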

Strictly, 'count' is a command, not a function. In Stata, commands and functions are different kinds of beast.

Answers

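You can do this by separating the subscription information from the transaction data and converting the subscription data to long form, with one observation for the start date and another for the end date. You then recombine it with the transaction data and order everything by a single date variable. An onoff variable tracks the start and end of each subscription. For example: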

* Example generated by -dataex-. To install: ssc install dataex 
clear 
input byte(person subscription) str8(obs_date sub_start_date sub_end_date) byte num_concurrent_subs 
1 1 "09/01/10" "09/01/10" "09/01/11" 1 
1 1 "10/01/10" "09/01/10" "09/01/11" 2 
1 1 "11/01/10" "09/01/10" "09/01/11" 2 
1 2 "10/01/10" "10/01/10" "09/01/11" 2 
1 2 "11/01/10" "10/01/10" "09/01/11" 2 
1 3 "11/01/14" "09/01/14" "."  1 
1 3 "11/01/16" "09/01/14" "."  1 
1 4 "11/01/15" "10/01/15" "11/01/15" 3 
1 5 "11/01/15" "10/01/15" "11/01/15" 3 
end 

* should always have an observation identifier 
gen obsid = _n 

* convert string to Stata numeric dates 
gen odate = daily(obs_date,"MD20Y") 
gen substart = daily(sub_start_date,"MD20Y") 
gen subend = daily(sub_end_date,"MD20Y") 
format %td odate substart subend 
save "main_data.dta", replace 

* reduce to subscription info with one obs for the start and one obs 
* for the end of each subscription. use an onoff variable to track 
* start and end events 
keep person subscription substart subend 
bysort person subscription substart subend: keep if _n == 1 
expand 2 
bysort person subscription: gen adate = cond(_n == 1, substart, subend) 
by person subscription: gen onoff = cond(_n == 1, 1, -1) 
replace onoff = 0 if mi(adate) 
format %td adate 

append using "main_data.dta" 

* include obs date in adate and nothing happens on the observation date 
replace adate = odate if !mi(obsid) 
replace onoff = 0 if !mi(obsid) 

* order by person adate, put on event first, then obs events, then off events 
gsort person adate -onoff 
by person: gen concur = sum(onoff) 
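
* optional (an addition, not in the original answer): inspect the running
* sum before the event rows are dropped; obsid is missing on start/end rows
list person adate onoff obsid concur, sepby(person) noobs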

* return to original obs 
keep if !mi(obsid) 
sort obsid 

The 'expand 2' trick is discussed at http://www.stata-journal.com/sjpdf.html?articlenum=dm0068

Here is another approach, using rangejoin (from SSC). To install it, type in Stata's Command window:

ssc install rangejoin 

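With rangejoin, you can pair each subscription with all the transaction observations that fall within the subscription's start and end dates. Then, for each transaction observation, it is just a matter of counting how many subscriptions it was paired with.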

* Example generated by -dataex-. To install: ssc install dataex 
clear 
input byte(person subscription) str8(obs_date sub_start_date sub_end_date) byte num_concurrent_subs 
1 1 "09/01/10" "09/01/10" "09/01/11" 1 
1 1 "10/01/10" "09/01/10" "09/01/11" 2 
1 1 "11/01/10" "09/01/10" "09/01/11" 2 
1 2 "10/01/10" "10/01/10" "09/01/11" 2 
1 2 "11/01/10" "10/01/10" "09/01/11" 2 
1 3 "11/01/14" "09/01/14" "."  1 
1 3 "11/01/16" "09/01/14" "."  1 
1 4 "11/01/15" "10/01/15" "11/01/15" 3 
1 5 "11/01/15" "10/01/15" "11/01/15" 3 
end 

* should always have an observation identifier 
gen obsid = _n 

* convert string to Stata numeric dates 
gen odate = daily(obs_date,"MD20Y") 
gen substart = daily(sub_start_date,"MD20Y") 
gen subend = daily(sub_end_date,"MD20Y") 
format %td odate substart subend 
save "main_data.dta", replace 

* reduce to subscription start and end date per person 
bysort person subscription substart subend: keep if _n == 1 
keep person substart subend 

* missing values will exclude obs so use a date in the future 
replace subend = mdy(1,1,2099) if mi(subend) 

* pair each subscription with an obs date 
rangejoin odate substart subend using "main_data.dta", by(person) 

* the number of current subscriptions is the number of pairings 
bysort obsid: gen current = _N 

* return to original obs 
by obsid: keep if _n == 1 
sort obsid 
drop substart subend 
rename (substart_U subend_U) (substart subend)
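
As a quick sanity check on the example data (an addition, not part of the original answer), the computed count can be compared with the num_concurrent_subs column supplied in the question. A minimal sketch, assuming the code above has just been run so that obsid, odate, current and num_concurrent_subs are all in memory:

```stata
* stop with an error if any computed count differs from the expected one
assert current == num_concurrent_subs

* show the computed and expected counts side by side
list obsid odate current num_concurrent_subs, noobs
```

The same check should work for the first answer if current is replaced by concur. Note that both approaches treat a transaction on a subscription's end date as still falling within that subscription.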