2014-02-26

Filling in missing values of a rolling correlation matrix

This question is partly related to this question.

My data file can be found here. I use the sample period from 1 January 2008 to 31 December 2013. The data file has no missing values.

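The following code generates a rolling correlation matrix for each date from 1 January 2008 to 31 December 2013, using a rolling window of the previous year's values. For example, the correlation between AUT and BEL on 1 January 2008 is computed from the values between 1 January 2007 and 1 January 2008, and likewise for every other pair.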
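The window boundaries that the macro below computes can be checked on their own. This is a minimal sketch: `window_demo` is a hypothetical dataset name, and the two dates are only examples. It applies the same `intnx('year', toDate, -1, 's')` call that the macro uses, where `'s'` (same alignment) steps back exactly one year and keeps the month and day.

```sas
/* Hypothetical demo: the one-year window [fromDate, toDate] for two example dates */
data window_demo;
    format toDate fromDate date9.;
    do toDate = '01JAN2008'd, '01APR2009'd;
        /* 's' alignment: same month and day, one year earlier */
        fromDate = intnx('year', toDate, -1, 's');
        output;
    end;
run;

proc print data = window_demo noobs;
run;
```

With these dates the windows are 01JAN2007 to 01JAN2008 and 01APR2008 to 01APR2009, which matches the example in the question.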

```sas
data work.rolling;
    set mm.rolling;
run;
```

```sas
%macro rollingCorrelations(inputDataset=, refDate=);
    %local i;

    /* First get a list of unique dates on or after the reference date */
    proc freq data = &inputDataset. noprint;
        where date >= "&refDate."d;
        tables date / out = dates(keep = date);
    run;

    /* For each date, calculate the window range (here, one year long) */
    data dateRanges(drop = date);
        set dates end = endOfFile
                  nobs = numDates;
        format toDate fromDate date9.;

        toDate   = date;
        fromDate = intnx('year', toDate, -1, 's');

        call symputx(cats("toDate", _n_),   put(toDate, date9.));
        call symputx(cats("fromDate", _n_), put(fromDate, date9.));

        /* Find how many times (numberOfWindows) we need to iterate */
        if endOfFile then do;
            call symputx("numberOfWindows", numDates);
        end;
    run;

    %do i = 1 %to &numberOfWindows.;
        /* Create a temporary view holding the filtered data passed to PROC CORR */
        data windowedDataview / view = windowedDataview;
            set &inputDataset.;
            where date between "&&fromDate&i."d and "&&toDate&i."d;
            drop date;
        run;

        /* The output dataset from each PROC CORR run is named
           correlations_<from date DDMMMYYYY>_<to date DDMMMYYYY> */
        proc corr data = windowedDataview
                  outp = correlations_&&fromDate&i.._&&toDate&i. (where = (_type_ = 'CORR'))
                  noprint;
        run;
    %end;

    /* Append all window datasets into a single table.
       The correlations_ prefix matches every dataset whose name starts that way. */
    data all_correlations;
        format from to date9.;
        set correlations_:
            indsname = datasetname;
        /* datasetname looks like WORK.CORRELATIONS_01JAN2007_01JAN2008 */
        from = input(substr(datasetname, 19, 9), date9.);
        to   = input(substr(datasetname, 29, 9), date9.);
    run;
%mend rollingCorrelations;

%rollingCorrelations(inputDataset=rolling, refDate=01JAN2008)
```

An excerpt of the output can be found here.

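As you can see, rows 2 to 53 show the correlation matrix for 1 April 2008. The correlation matrix for 1 April 2009, however, has a problem: ALPHA has missing correlation coefficients with all of its pairs. The data file shows why. Every ALPHA value from 1 April 2008 to 1 April 2009 is zero, so the calculation divides by zero. The same happens for some other series. HSBC, for example, is also 0 for every date from 1 April 2008 to 1 April 2009.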
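The division by zero is easy to reproduce. Pearson's r is cov(x, y) / (s_x s_y). When a series is constant over a window, s_x = 0, r is undefined, and PROC CORR reports a missing value. Here is a minimal sketch on hypothetical toy data, where `x` is all zeros, like ALPHA in that window, and `y` varies. The dataset names are illustrative only.

```sas
/* Hypothetical toy data: x is constant (all zeros), y varies */
data zero_var_demo;
    input x y;
    datalines;
0 1.2
0 0.7
0 1.9
0 1.1
;
run;

/* Same options as in the macro: keep only the CORR rows of the OUTP= dataset */
proc corr data = zero_var_demo
          outp = zero_var_corr(where = (_type_ = 'CORR'))
          noprint;
    var x y;
run;

/* The correlation between x and y shows up as a missing value (.) */
proc print data = zero_var_corr noobs;
run;
```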

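To solve this, I would like to know how the code above could be modified so that, whenever this happens (i.e. all values between two particular dates are 0), the correlation between that pair of series is simply computed over the entire sample period instead. For example, the correlation between ALPHA and AUT on 1 April 2009 is missing, so it should be computed using the values from 1 January 2008 to 31 December 2013 rather than the values from 1 April 2008 to 1 April 2009.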
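Before filling anything in, it may help to list which windows are affected. The sketch below uses a hypothetical stand-in called `demo_corr` with the same layout as `all_correlations` (`from`, `to`, `_name_`, then one column per series), limited to two series. For the real table, the array would have to list all of the correlation columns. Writing them as `AUT--STANBS` would only work if they are stored in that order, which is an assumption.

```sas
/* Hypothetical stand-in for all_correlations: two windows, two series */
data demo_corr;
    format from to date9.;
    input from :date9. to :date9. _name_ $ AUT ALPHA;
    datalines;
01APR2007 01APR2008 AUT 1 0.35
01APR2007 01APR2008 ALPHA 0.35 1
01APR2008 01APR2009 AUT 1 .
01APR2008 01APR2009 ALPHA . .
;
run;

/* Keep only the rows that contain at least one missing correlation */
data windows_with_missing;
    set demo_corr;
    array c{*} AUT ALPHA;          /* in practice: every correlation column */
    n_missing = nmiss(of c{*});
    if n_missing > 0;
run;

proc print data = windows_with_missing noobs;
run;
```

Only the two rows of the 01APR2008 to 01APR2009 window are kept.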


Do you have a SAS/ETS license? – Joe


@Joe I'm not sure, actually. How can I check? – user3184733


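@user3184733 To check which products you are licensed for, run the following procedure. It reads the license file and writes the list of products to the log. Then simply do a Ctrl+F search for 'SAS/ETS'. `PROC SETINIT; RUN;` – 2014-02-27 10:23:31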

Answer


Once you have run the macro above and have the all_correlations dataset, run one more PROC CORR, this time over all of the data:

```sas
/* First filter the data to be between "01JAN2008"d and "31DEC2013"d */
data work.all_data_01JAN2008_31DEC2013;
    set mm.rolling;
    where date between "01JAN2008"d and "31DEC2013"d;
    drop date;
run;
```

Then pass the dataset above to PROC CORR:

```sas
proc corr data = work.all_data_01JAN2008_31DEC2013
          outp = correlations_01JAN2008_31DEC2013 (where = (_type_ = 'CORR'))
          noprint;
run;

data correlations_01JAN2008_31DEC2013;
    length id 8;
    set correlations_01JAN2008_31DEC2013;
    /* Add a row identifier so the order of the correlation matrix
       is preserved when joined with other tables */
    id = _n_;
run;
```

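This gives you a dataset in which each row is uniquely identified by the _name_ column. Next, join correlations_01JAN2008_31DEC2013 to all_correlations so that every value missing from all_correlations is replaced by the corresponding value from correlations_01JAN2008_31DEC2013. PROC SQL with the COALESCE function does this: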

```sas
proc sql;
    create table missing_values_imputed as
    select
         a.from
        ,a.to
        ,b.id
        ,a._name_
        ,coalesce(a.AUT, b.AUT) as AUT
        ,coalesce(a.BEL, b.BEL) as BEL
        ,coalesce(a.DEN, b.DEN) as DEN
        ,coalesce(a.FRA, b.FRA) as FRA
        ,coalesce(a.GER, b.GER) as GER
        ,coalesce(a.GRE, b.GRE) as GRE
        ,coalesce(a.IRE, b.IRE) as IRE
        ,coalesce(a.ITA, b.ITA) as ITA
        ,coalesce(a.NOR, b.NOR) as NOR
        ,coalesce(a.POR, b.POR) as POR
        ,coalesce(a.SPA, b.SPA) as SPA
        ,coalesce(a.SWE, b.SWE) as SWE
        ,coalesce(a.NL, b.NL) as NL
        ,coalesce(a.ERS, b.ERS) as ERS
        ,coalesce(a.RZB, b.RZB) as RZB
        ,coalesce(a.DEX, b.DEX) as DEX
        ,coalesce(a.KBD, b.KBD) as KBD
        ,coalesce(a.DAB, b.DAB) as DAB
        ,coalesce(a.BNP, b.BNP) as BNP
        ,coalesce(a.CRDA, b.CRDA) as CRDA
        ,coalesce(a.KN, b.KN) as KN
        ,coalesce(a.SGE, b.SGE) as SGE
        ,coalesce(a.CBK, b.CBK) as CBK
        ,coalesce(a.DBK, b.DBK) as DBK
        ,coalesce(a.IKB, b.IKB) as IKB
        ,coalesce(a.ALPHA, b.ALPHA) as ALPHA
        ,coalesce(a.ALBK, b.ALBK) as ALBK
        ,coalesce(a.IPM, b.IPM) as IPM
        ,coalesce(a.BKIR, b.BKIR) as BKIR
        ,coalesce(a.BMPS, b.BMPS) as BMPS
        ,coalesce(a.PMI, b.PMI) as PMI
        ,coalesce(a.PLO, b.PLO) as PLO
        ,coalesce(a.BINS, b.BINS) as BINS
        ,coalesce(a.MB, b.MB) as MB
        ,coalesce(a.UC, b.UC) as UC
        ,coalesce(a.BCP, b.BCP) as BCP
        ,coalesce(a.BES, b.BES) as BES
        ,coalesce(a.BBV, b.BBV) as BBV
        ,coalesce(a.SCHSPS, b.SCHSPS) as SCHSPS
        ,coalesce(a.NDA, b.NDA) as NDA
        ,coalesce(a.SEA, b.SEA) as SEA
        ,coalesce(a.SVK, b.SVK) as SVK
        ,coalesce(a.SPAR, b.SPAR) as SPAR
        ,coalesce(a.CSGN, b.CSGN) as CSGN
        ,coalesce(a.UBSN, b.UBSN) as UBSN
        ,coalesce(a.ING, b.ING) as ING
        ,coalesce(a.SNS, b.SNS) as SNS
        ,coalesce(a.BARC, b.BARC) as BARC
        ,coalesce(a.HBOS, b.HBOS) as HBOS
        ,coalesce(a.HSBC, b.HSBC) as HSBC
        ,coalesce(a.LLOY, b.LLOY) as LLOY
        ,coalesce(a.STANBS, b.STANBS) as STANBS
    from all_correlations as a
    inner join correlations_01JAN2008_31DEC2013 as b
        on a._name_ = b._name_
    order by
         a.from
        ,a.to
        ,b.id
    ;
quit;

/* Verify that no missing values are left: NMISS should be 0 for every variable */
proc means data = missing_values_imputed n nmiss;
run;
```
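With 52 series, typing one `coalesce()` per column by hand is easy to get wrong. One possible alternative, sketched here on the hypothetical toy tables `demo_rolling` and `demo_full`, builds the list from the `DICTIONARY.COLUMNS` metadata table and stores it in a macro variable. The name `coalesce_list` was chosen for illustration only. For the real tables you would set `memname` to `'CORRELATIONS_01JAN2008_31DEC2013'` and add `a.from` and `a.to` to the select list.

```sas
/* Hypothetical toy tables: rolling window (a) and full-sample matrix with id (b) */
data demo_rolling;
    input _name_ $ AUT ALPHA;
    datalines;
AUT 1 .
ALPHA . .
;
run;

data demo_full;
    input id _name_ $ AUT ALPHA;
    datalines;
1 AUT 1 0.2
2 ALPHA 0.2 1
;
run;

proc sql noprint;
    /* Build "coalesce(a.X,b.X) as X" for every numeric column except id */
    select catx(' ', cats('coalesce(a.', name, ',b.', name, ')'), 'as', name)
        into :coalesce_list separated by ', '
        from dictionary.columns
        where libname = 'WORK'
          and memname = 'DEMO_FULL'
          and type = 'num'
          and upcase(name) ne 'ID';

    /* The macro variable resolves here because PROC SQL runs each statement in turn */
    create table demo_imputed as
        select b.id, a._name_, &coalesce_list.
        from demo_rolling as a
        inner join demo_full as b
            on a._name_ = b._name_
        order by b.id;
quit;

%put coalesce_list = &coalesce_list.;
```

The design choice is to let the full-sample table's own column metadata drive the select list, so adding or removing a series needs no edit to the SQL.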

Thank you for your great help. A concise, easy-to-follow answer! – user3184733