2012-09-26 61 views
1

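Special data frame in R

This is a follow-up to a question I asked yesterday in this thread: Random Forests for Variables selection

I managed to find the most important technical trading rules (TTRs) for each quarter, and I built a data frame to hold their names. Here it is, with one column per quarter.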

   1  2  3  4  5  6  7  8  9  10   11 
1   RSI2 RSI3 RSI2 RSI10 RSI2 RSI2 RSI2 RSI2 RSI2 RSI2   RSI2 
2   RSI3 RSI4 RSI3 RSI20 RSI3 RSI3 RSI3 RSI4 RSI4 RSI3   RSI3 
3   RSI4 RSI5 RSI4 EMA5 RSI4 RSI4 RSI5 RSI5 RSI5 RSI4   RSI4 
4   RSI5 RSI10 RSI5 EMA20 RSI5 RSI5 RSI10 EMA5 RSI10 RSI5   RSI5 
5   RSI10 RSI20 RSI10 EMA60 SMA5 RSI10 RSI20 EMA20 RSI20 RSI10  RSI10 
6   SMA20 SMA60 RSI20 SMI  atr RSI20 SMA60 EMA60 SMA5 RSI20   SMA5 
7   SMA60 pctB SMA20 ADX pctB EMA5 atr  atr SMA60  atr  SMA20 
8   atr calcs.1 pctB pctB macd EMA20 pctB  ADX pctB  ADX  EMA20 
9   pctB <NA> <NA> macd myVolat EMA60 <NA> pctB macd pctB  EMA60 
10 myChaikinVol <NA> <NA> signal calcs.1 pctB <NA> macd signal myVolat   ADX 
11  myVolat <NA> <NA> calcs <NA> macd <NA> signal mySAR calcs.1   pctB 
12  calcs <NA> <NA> <NA> <NA> <NA> <NA> myVolat myVolat <NA> myChaikinVol 
13   <NA> <NA> <NA> <NA> <NA> <NA> <NA> calcs.1 <NA> <NA>  myVolat 
14   <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>  calcs 

I added NAs to cope with the different column lengths.

Now I want to go back to my data set, which looks like this:

  daily.returns  RSI2  RSI3  RSI4  RSI5 RSI10 RSI20  SMA5 SMA20 SMA60  EMA5 EMA20 EMA60  atr  SMI  ADX oscillator  pctB  macd  signal myChaikinVol mySAR myVolat  calcs calcs.1 
2009-01-07 -0.015587635 97.964071 92.62210 87.21605 82.40040 66.95642 55.19221 19720.64 18655.29 17758.68 2556.777 2556.777 2556.777 82.06602 27.52145 17.31637   85 0.87092366 0.5930649 -0.220581024 -0.3211637 2369.876 0.2325009 0.3169638 0.2801128 
2009-01-08 -0.008700162 43.766573 58.62387 62.97794 64.03382 60.23197 52.99739 19756.44 18666.60 17754.07 2566.499 2566.499 2566.499 80.33416 29.12141 16.86914   85 0.72197937 0.8929854 0.002132269 -0.3183377 2385.210 0.2201065 0.3169831 0.2654092 
2009-01-09 -0.011980596 27.182247 44.97072 52.29336 55.50633 56.74068 51.80171 19776.92 18674.31 17750.34 2523.372 2523.372 2523.372 78.65886 29.37878 15.90677   85 0.67025741 0.9349831 0.188702427 -0.2613410 2403.582 0.2245705 0.3119865 0.2608195 
2009-01-12 -0.014061295 13.371347 30.46561 39.97055 45.24210 52.16207 50.17764 19788.02 18683.05 17748.76 2524.466 2524.466 2524.466 78.58966 28.17871 14.80066   85 0.49082443 0.9958785 0.350137644 -0.2065359 2420.117 0.2217528 0.3128203 0.2615878 
2009-01-13 -0.016693272 6.141462 19.52298 29.30404 35.68593 47.25383 48.32987 19772.25 18693.01 17749.35 2488.165 2488.165 2488.165 76.08326 25.34705 13.96936   80 0.26923307 0.8855971 0.457229531 -0.1845331 2434.998 0.2223591 0.3103439 0.2609330 
2009-01-14 -0.047918393 2.712386 11.97834 20.69541 27.26891 42.10718 46.23469 19747.87 18694.16 17742.88 2449.353 2449.353 2449.353 75.42231 20.65686 13.99099   60 -0.01023467 0.6624063 0.498264880 -0.1131268 2445.040 0.2290943 0.3094655 0.2644883 

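What I want to do is put an NA wherever a TTR is not significant in a given period. For example, if the RSI2 TTR is not significant in the first quarter, I want to replace its values with NAs for that quarter, but if RSI2 is significant in the fifth quarter, I want to keep its values there.

In the end, I should get a data frame with the same dimensions as the initial one.

Any ideas? Thanks!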

+0

You'll get better answers if you post code. Try storing a `list` rather than a `data.frame`. – Zach

Answers

3

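First, you should store your rules in a list rather than a data.frame. That way you don't have to pad each set of rules with NAs to make them all the same length, and you can use lapply to process your data.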
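If the rules already live in an NA-padded data frame like the one in the question, converting it is short. The following is only a sketch: the `padded` data frame and its `Q1`..`Q3` column names are invented for illustration and are not the asker's data.

```r
# Hypothetical NA-padded data frame, one column per quarter
# (names and contents are made up for illustration)
padded <- data.frame(
  Q1 = c("RSI2", "RSI3", "atr"),
  Q2 = c("RSI3", "pctB", NA),
  Q3 = c("RSI2", NA, NA),
  stringsAsFactors = FALSE
)

# Drop the NA padding column by column; the result is a named list
# whose elements are allowed to have different lengths
rules_by_quarter <- lapply(padded, function(col) col[!is.na(col)])

# Number of rules kept per quarter
lengths(rules_by_quarter)
```

Because a data frame is itself a list of columns, `lapply` walks over the quarters directly and keeps the column names as the list names.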

Since you didn't provide any data, I made some up:

#Load data 
set.seed(42) 
library(quantmod) 
getSymbols('SPY') 
SPY <- adjustOHLC(SPY) 
dat <- dailyReturn(Cl(SPY)) 

#Add some TTRs 
for (rule in c('RSI', 'SMA')){
    for (n in c(5, 10, 15, 20, 25)){
        newvar <- paste(rule, n, sep='_')
        FUN <- get(rule)
        dat <- cbind(dat, FUN(dat[,1], n=n))
        names(dat)[length(names(dat))] <- newvar
    }
}
dat <- na.omit(dat) 
rulenames <- names(dat)[-1] 

Note that this is an xts object, not a data.frame. This matters because it keeps the index in Date format rather than as a character vector:

> dat[1:5, 1:5] 
      daily.returns RSI_5 RSI_10 RSI_15 RSI_20 
2007-02-08 -0.001308450 40.06379 46.99824 48.59484 49.11738 
2007-02-09 -0.007447249 26.65296 40.34267 44.35689 46.10753 
2007-02-12 -0.003404196 42.49883 45.94447 47.58264 48.30373 
2007-02-13 0.008434995 67.89045 58.59450 55.64932 54.07276 
2007-02-14 0.006567123 62.45177 56.28547 54.23836 53.08886 

I also made up a (random) set of TTRs to use for each year:

#Make a list of rules for each year 
library(lubridate) 
dat$Year <- year(index(dat)) 
uniqueYear <- sort(unique(dat$Year)) 
rulesList <- lapply(uniqueYear, function(x) rulenames[runif(length(rulenames))>.5]) 
names(rulesList) <- uniqueYear 

Note that my rulesList is a genuine list:

> rulesList 
$`2007` 
[1] "RSI_5" "RSI_10" "RSI_20" "RSI_25" "SMA_5" "SMA_10" "SMA_20" "SMA_25" 

$`2008` 
[1] "RSI_10" "RSI_15" "SMA_5" "SMA_10" "SMA_25" 

$`2009` 
[1] "RSI_5" "RSI_15" "RSI_20" "SMA_5" "SMA_15" "SMA_25" 

$`2010` 
[1] "RSI_5" "RSI_10" "RSI_20" "SMA_5" "SMA_20" "SMA_25" 

$`2011` 
[1] "RSI_20" "SMA_5" "SMA_10" "SMA_15" "SMA_20" "SMA_25" 

$`2012` 
[1] "RSI_20" "SMA_5" "SMA_10" "SMA_25" 

Now it's just a matter of looping over the years and subsetting the dat object to the appropriate rows (the year) and columns (the TTRs):

#Subset the data to each year's rows and selected TTR columns
data.by.year <- lapply(uniqueYear, function(year){
    rule_subset <- rulesList[[as.character(year)]]
    dat[dat$Year==year, rule_subset]
})
names(data.by.year) <- uniqueYear

data.by.year is a list of length 6, where each element holds one year's worth of data with only the TTRs selected for that year.

> str(data.by.year[[1]]) 
An ‘xts’ object from 2007-02-08 to 2007-12-31 containing: 
    Data: num [1:226, 1:8] 40.1 26.7 42.5 67.9 62.5 ... 
- attr(*, "dimnames")=List of 2 
    ..$ : NULL 
    ..$ : chr [1:8] "RSI_5" "RSI_10" "RSI_20" "RSI_25" ... 
    Indexed by objects of class: [Date] TZ: 
    xts Attributes: 
List of 3 
$ tclass : chr "Date" 
$ tzone : chr "" 
$ na.action:Class 'omit' atomic [1:25] 1 2 3 4 5 6 7 8 9 10 ... 
    .. ..- attr(*, "index")= num [1:25] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 ... 
> 
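The list above gives one object per year, each with its own set of columns. If a single object with the original dimensions is needed, as the question asks, one possible approach is to blank out the non-selected columns in place rather than subsetting. This is a minimal sketch on a toy base-R data frame; `dat_df`, `rules` and `masked` are hypothetical names, and the values are invented.

```r
# Toy stand-in for the real data: dates in one column, indicators in the others
dat_df <- data.frame(
  date  = as.Date(c("2009-03-02", "2009-03-03", "2010-03-02", "2010-03-03")),
  RSI_5 = c(40.1, 26.7, 42.5, 67.9),
  SMA_5 = c(1.5, 1.6, 1.7, 1.8)
)

# Hypothetical rules: indicators treated as significant in each year
rules <- list(`2009` = "RSI_5", `2010` = c("RSI_5", "SMA_5"))

indicator_cols <- setdiff(names(dat_df), "date")
row_year <- format(dat_df$date, "%Y")   # year of each row, as character

masked <- dat_df
for (yr in names(rules)) {
  # Indicators NOT selected for this year
  drop_cols <- setdiff(indicator_cols, rules[[yr]])
  if (length(drop_cols) > 0) {
    # Set them to NA on this year's rows only; other rows are untouched
    masked[row_year == yr, drop_cols] <- NA
  }
}

# Same shape as the input, with NAs where an indicator was not selected
dim(masked)
```

Here `SMA_5` becomes NA on the 2009 rows and keeps its values in 2010, while `RSI_5` is unchanged. The same idea should carry over to periods other than years (quarters, for instance) by changing how `row_year` is computed.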
+0

Many thanks, Zach, for your valuable answer. – marino89