2017-03-12 77 views
0

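Function to subset and adjust a vector

The function should take a vector and winsorize its values at the 1st and 99th percentiles (replace values above the 99th percentile with the 99th percentile value, and likewise replace values below the 1st percentile with the 1st percentile value). I can run the function without any errors, but it does not change the vector given as the argument. When I run the same code outside the function it works fine, but I have to do this for many columns in a data.frame, so I would like to be able to pass the function through an apply function.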

wins <- function(vect, prob = c(0.01, 0.99)){ 
    #vect is a vector with values to be winsorized 
    #prob contains top and bottom percentiles at which to winsorize data in vect 

    low_quantile <- quantile(vect, probs = prob[1], na.rm = TRUE) 
    high_quantile <- quantile(vect, probs = prob[2], na.rm = TRUE) 

    vect[vect < low_quantile] <- low_quantile 
    vect[vect > high_quantile] <- high_quantile 
} 

Any suggestions?


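You may be thinking that things that happen inside a function magically affect objects outside the function. They don't. You need to explicitly return vect and assign the result of the function to a new or existing object. – joran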
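
To illustrate the comment above, here is a minimal sketch. The names clamp_top, clamp_top_fixed, x and y are made up for this example and do not come from the question. R functions work on a copy of their arguments, so the caller's vector stays untouched unless the modified copy is returned and the result is assigned.

```r
# Hypothetical helper: modifies its local copy of v, but the last expression
# is an assignment, so the modified vector is never returned.
clamp_top <- function(v, limit) {
  v[v > limit] <- limit
}

# Same logic, but the modified vector is the last expression, so it is returned.
clamp_top_fixed <- function(v, limit) {
  v[v > limit] <- limit
  v
}

x <- c(1, 5, 10)

clamp_top(x, 4)             # prints nothing, and x is not modified
x                           # still 1 5 10

y <- clamp_top(x, 4)        # an assignment evaluates to its right-hand side
y                           # 4, i.e. only the limit, not the vector

x <- clamp_top_fixed(x, 4)  # return the vector AND assign the result
x                           # 1 4 4
```

This is consistent with the behaviour described in the question: the original wins ends on an assignment, so it runs without error and prints nothing, while the vector passed in stays as it was.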

Answers

1

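Add vect at the end of your function, so that it is the last expression evaluated and therefore the value the function returns.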

wins <- function(vect, prob = c(0.01, 0.99)){ 
#vect is a vector with values to be winsorized 
#prob contains top and bottom percentiles at which to winsorize data in vect 

low_quantile <- quantile(vect, probs = prob[1], na.rm = TRUE) 
high_quantile <- quantile(vect, probs = prob[2], na.rm = TRUE) 

vect[vect < low_quantile] <- low_quantile 
vect[vect > high_quantile] <- high_quantile 
vect 
} 

wins(1:100) 
 [1] 1.99 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 10.00 11.00 12.00 13.00 14.00 15.00 16.00 17.00 18.00 
[19] 19.00 20.00 21.00 22.00 23.00 24.00 25.00 26.00 27.00 28.00 29.00 30.00 31.00 32.00 33.00 34.00 35.00 36.00 
[37] 37.00 38.00 39.00 40.00 41.00 42.00 43.00 44.00 45.00 46.00 47.00 48.00 49.00 50.00 51.00 52.00 53.00 54.00 
[55] 55.00 56.00 57.00 58.00 59.00 60.00 61.00 62.00 63.00 64.00 65.00 66.00 67.00 68.00 69.00 70.00 71.00 72.00 
[73] 73.00 74.00 75.00 76.00 77.00 78.00 79.00 80.00 81.00 82.00 83.00 84.00 85.00 86.00 87.00 88.00 89.00 90.00 
[91] 91.00 92.00 93.00 94.00 95.00 96.00 97.00 98.00 99.00 99.01 

Edit, for the follow-up question on how to apply it to a data.frame:

df1 <- data.frame(matrix(1:200,ncol=2)) 
apply(df1,2,wins) # apply by column 
> apply(df1,2,wins) 
      X1  X2 
    [1,] 1.99 101.99 
    [2,] 2.00 102.00 
    [3,] 3.00 103.00 
    [4,] 4.00 104.00 
    [5,] 5.00 105.00 
... 

As for your follow-up, it also works with a single column:

wins(df1$X1) 
[1] 1.99 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 10.00 11.00 12.00 13.00 14.00 15.00 16.00 17.00 18.00 
[19] 19.00 20.00 21.00 22.00 23.00 24.00 25.00 26.00 27.00 28.00 29.00 30.00 31.00 32.00 33.00 34.00 35.00 36.00 
[37] 37.00 38.00 39.00 40.00 41.00 42.00 43.00 44.00 45.00 46.00 47.00 48.00 49.00 50.00 51.00 52.00 53.00 54.00 
[55] 55.00 56.00 57.00 58.00 59.00 60.00 61.00 62.00 63.00 64.00 65.00 66.00 67.00 68.00 69.00 70.00 71.00 72.00 
[73] 73.00 74.00 75.00 76.00 77.00 78.00 79.00 80.00 81.00 82.00 83.00 84.00 85.00 86.00 87.00 88.00 89.00 90.00 
[91] 91.00 92.00 93.00 94.00 95.00 96.00 97.00 98.00 99.00 99.01 

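Thanks for your reply. For some reason, this only works when I define a range of values and pass it in directly. When I pass a column of a data frame as the vector, it still doesn't work. I have a data frame with 20 columns, and when I call wins(dataframe$rowname) it does nothing except print the original row. – claushojmark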


It works for me using 'data.frame' and 'apply'. See my edit. –


It is almost never necessary to use 'apply(df,2,FUN)' on a data.frame; use '[lsv]apply' instead. – thelatemail
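
A sketch of what the lapply route might look like, reusing wins from the answer and the df1 example data. The df[] <- lapply(df, FUN) idiom and the names one_col and all_cols are assumptions made for illustration and were not spelled out in the thread. Unlike apply, which converts the data.frame to a matrix first, this keeps the result a data.frame. In both cases the result has to be assigned back, otherwise the original data stays unchanged.

```r
# wins() as defined in the answer above
wins <- function(vect, prob = c(0.01, 0.99)) {
  low_quantile  <- quantile(vect, probs = prob[1], na.rm = TRUE)
  high_quantile <- quantile(vect, probs = prob[2], na.rm = TRUE)
  vect[vect < low_quantile]  <- low_quantile
  vect[vect > high_quantile] <- high_quantile
  vect
}

df1 <- data.frame(matrix(1:200, ncol = 2))

# One column: assign the returned vector back to that column (done on a copy)
one_col <- df1
one_col$X1 <- wins(one_col$X1)

# Every column: lapply returns a list of winsorized columns, and assigning
# into all_cols[] keeps the data.frame structure and column names
all_cols <- df1
all_cols[] <- lapply(all_cols, wins)

head(all_cols, 3)
class(all_cols)   # "data.frame"
```

This assumes every column is numeric; a data frame with character or factor columns would need those columns excluded before calling wins.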