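"BY variables are not properly sorted" error even though the data set has been sorted
I am using SAS to process a large data set (> 20 GB). When I run a DATA step, I get "BY variables are not properly sorted..." even though I have sorted the data set by the same variables. When I ran PROC SORT again, SAS even said "Input data set is already sorted, no sorting done". My code is: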
proc sort data=output.TAQ;
by market ric date miliseconds descending type order;
run;
options nomprint;
data markers (keep=market ric date miliseconds type order);
set output.TAQ;
by market ric date;
if first.date;
* ie do the following once per stock-day;
* Make 1-second markers;
/*Type="AMARK"; Order=0; * Set order to zero to ensure that markers get placed before trades and quotes that occur at the same milisecond;
do i=((9*60*60)+(30*60)) to (16*60*60); miliseconds=i*1000; output; end;*/
run;
And the error message is:
ERROR: BY variables are not properly sorted on data set OUTPUT.TAQ.
RIC=CXR.CCP Date=20160914 Time=13:47:18.125 Type=Quote Price=. Volume=. BidPrice=9.03 BidSize=400
AskPrice=9.04 AskSize=100 Qualifiers= order=116458952 Miliseconds=49638125 exchange=CCP market=1
FIRST.market=0 LAST.market=0 FIRST.RIC=0 LAST.RIC=0 FIRST.Date=0 LAST.Date=1 i=. _ERROR_=1
_N_=43297873
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 43297874 observations read from the data set OUTPUT.TAQ.
WARNING: The data set WORK.MARKERS may be incomplete. When this step was stopped there were
56770826 observations and 6 variables.
WARNING: Data set WORK.MARKERS was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 1:14.21
cpu time 26.71 seconds
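A minimal diagnostic sketch, not a confirmed fix: it assumes `market` and `date` are numeric and `ric` is character, as the log suggests. It shows the sort indicator that PROC SORT relies on, looks for the first observation that breaks the `market ric date` order, and then forces a re-sort. The `prev_*` variables are introduced here purely for illustration. Note that the DATA step compares character values byte by byte, so it may disagree with a sort that used a different collating sequence.

```sas
/* Show the sort indicator stored with the data set
   (see the "Sort Information" section of the output). */
proc contents data=output.TAQ;
run;

/* Find the first observation that breaks market/ric/date order. */
data _null_;
  set output.TAQ (keep=market ric date);
  /* LAG must be called on every iteration, so assign unconditionally. */
  prev_market = lag(market);
  prev_ric    = lag(ric);
  prev_date   = lag(date);
  if _n_ > 1 and (
       market < prev_market
    or (market = prev_market and ric < prev_ric)
    or (market = prev_market and ric = prev_ric and date < prev_date)
  ) then do;
    put "Out of order at obs " _n_ market= ric= date=
        prev_market= prev_ric= prev_date=;
    stop;
  end;
run;

/* FORCE makes PROC SORT sort again even when the data set
   is already flagged as sorted by these variables. */
proc sort data=output.TAQ force;
  by market ric date miliseconds descending type order;
run;
```

If the check reports nothing, the data are in order under a plain binary comparison, and the cause is likely elsewhere. The forced re-sort rewrites the whole data set, which is expensive at this size, so it is worth running the first two steps before it.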
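Do you get an error message in the log when you run PROC SORT? – user667489
We definitely need to see more of the log. The observation counts are very odd: you have `if first.date`, so markers should be a subset of output.taq, but at the point where processing stopped, ~43.3m obs had been read from output.taq and ~56.8m had been written to work.markers... – keydemographic
@keydemographic There is an output statement inside the loop, so the obs count could do all sorts of things. – user667489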
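The point about the OUTPUT statement inside the loop can be illustrated with a small sketch. The data set `ticks` and its values are made up for this example. A subsetting `if first.date;` keeps one input row per stock-day, but an explicit `output` inside a DO loop then writes one row per iteration, so the output can be larger than the input.

```sas
/* Toy data: two stock-days with three ticks each (hypothetical values). */
data ticks;
  input ric $ date ms;
  datalines;
AAA 20160914 1000
AAA 20160914 2000
AAA 20160914 3000
BBB 20160914 1500
BBB 20160914 2500
BBB 20160914 3500
;
run;

data markers (keep=ric date ms);
  set ticks;
  by ric date;
  if first.date;      /* continue only for the first row of each stock-day */
  do i = 1 to 5;      /* explicit OUTPUT writes one row per iteration */
    ms = i * 1000;
    output;
  end;
run;
/* 6 observations are read and 10 are written (2 stock-days x 5 markers). */
```

In the question, the loop from 9:30 to 16:00 runs 23,401 times per stock-day (34,200 to 57,600 seconds inclusive), and 56,770,826 = 2,426 × 23,401. The log therefore appears to come from a run in which the loop was not commented out, with 2,426 stock-days processed before the error.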