2015-11-26

Improving a for loop using other methods

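There is 1 main station (df) and 3 local stations (s), stacked in a single data.frame with values for three days. The idea is to go day by day, compute the relative anomaly of the three local stations against the main station, and smooth it with inverse distance weighting (IDW) from the phylin package. The smoothed anomaly is then applied to the value in the main station by multiplication.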
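As a rough illustration of that idea for a single day, the sketch below computes the relative anomalies and an inverse-distance-weighted mean by hand in base R. It uses the day 1 values from the data in this post. The plain Euclidean distance on the coordinates and the power of 2 are assumptions made for illustration, so the numbers need not match what `idw` from phylin returns.

```r
# Day 1 values taken from the data in this post
local <- data.frame(
  lat   = c(33.59, 34.7392, 35.2833),
  long  = c(-92.8236, -90.7664, -93.1),
  value = c(63.3157576809045, 86.0490598902219, 76.506386949066)
)
main <- data.frame(lat = 100, long = 50, value = 54.8780020601509)

# Relative anomaly of each local station against the main station
r <- local$value / main$value

# Inverse distance weights with power 2 (Euclidean distance on the
# raw coordinates is an assumption made for illustration only)
d <- sqrt((local$lat - main$lat)^2 + (local$long - main$long)^2)
w <- 1 / d^2

# Weighted mean of the anomalies, then applied to the main value
r_smooth <- sum(w * r) / sum(w)
adjusted <- main$value * r_smooth
```

Because the anomaly is a ratio against the main value, multiplying it back gives the same result as an inverse-distance-weighted mean of the local values themselves.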

Any suggestions for improving this code (e.g. data.table, dplyr, apply)? I still can't see how to approach this without a cumbersome for loop.

dput

s <- structure(list(id = c("USC00031152", "USC00034638", "USC00036352", 
"USC00031152", "USC00034638", "USC00036352", "USC00031152", "USC00034638", 
"USC00036352"), lat = c(33.59, 34.7392, 35.2833, 33.59, 34.7392, 
35.2833, 33.59, 34.7392, 35.2833), long = c(-92.8236, -90.7664, 
-93.1, -92.8236, -90.7664, -93.1, -92.8236, -90.7664, -93.1), 
    year = c(1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900, 
    1900), month = c(1, 1, 1, 1, 1, 1, 1, 1, 1), day = c(1L, 
    1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), value = c(63.3157576809045, 
    86.0490598902219, 76.506386949066, 71.3760752788486, 89.9119576975542, 
    76.3535163951321, 53.7259645981243, 61.7989638892985, 85.8911224149051 
    )), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-9L), .Names = c("id", "lat", "long", "year", "month", "day", 
"value")) 

df <- structure(list(id = c(12345, 12345, 12345), lat = c(100, 100, 
100), long = c(50, 50, 50), year = c(1900, 1900, 1900), month = c(1, 
1, 1), day = 1:3, value = c(54.8780020601509, 106.966029162171, 
98.3198828955801)), row.names = c(NA, -3L), class = "data.frame", .Names = c("id", 
"lat", "long", "year", "month", "day", "value")) 

Code

library(phylin) 

nearest <- function(i, loc){ 
    # Stack 3 local stations 
    stack <- s[loc:(loc+2),] 

    # Get 1 main station 
    station <- df[i,] 

    # Check for NA and build relative anomaly (r) 
    stack <- stack[!is.na(stack$value),] 
    stack$r <- stack$value/station$value 

    # Use IDW and return v 
    v <- as.numeric(ifelse(dim(stack)[1] == 1, 
        stack$r, 
        idw(stack$r, stack[,c(2,3,8)], station[,2:3]))) 
    return(v) 
} 


ncdc <- 1 

for (i in 1:nrow(df)){ 
    # Get relative anomaly from function 
    r <- nearest(i, ncdc) 

    # Get value from main station and apply anomaly 
    p <- df[i,7]    
    df[i,7] <- p*r 

    # Iterate to next 3 local stations 
    ncdc <- ncdc + 3 
} 

Answers


Assuming you leave your nearest function unchanged, you can get the new value column in df with:

newvalue <- sapply(1:NROW(df), function (i) df[i,7] * nearest(i, 3*(i-1)+1)) 
df$value <- newvalue 
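The offset `3*(i-1)+1` relies on s holding exactly three rows per day, in day order. A possible variation, sketched below on small stand-in data, groups the local stations by day with `split` and pairs each group with its main-station row through `mapply`. The names `s_demo`, `df_demo` and `anomaly` are hypothetical, and `anomaly` is only a placeholder that takes a plain mean of the ratios, not the IDW smoothing done by `nearest`.

```r
# Small stand-in data with the same layout as s and df (3 stations x 3 days)
s_demo <- data.frame(
  day   = rep(1:3, each = 3),
  value = c(63.3, 86.0, 76.5, 71.4, 89.9, 76.4, 53.7, 61.8, 85.9)
)
df_demo <- data.frame(day = 1:3, value = c(54.9, 107.0, 98.3))

# Hypothetical placeholder for nearest(): a plain mean of the ratios,
# NOT the IDW smoothing from phylin
anomaly <- function(local_values, main_value) {
  local_values <- local_values[!is.na(local_values)]
  mean(local_values / main_value)
}

# Group the local stations by day instead of computing row offsets
by_day <- split(s_demo$value, s_demo$day)

# Match each group to its main-station row by day, then apply the anomaly
r <- mapply(anomaly, by_day[as.character(df_demo$day)], df_demo$value)
df_demo$value <- df_demo$value * unname(r)
```

Matching on the day key keeps working if a day has fewer than three local stations, which the fixed offset would not handle.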

Thanks. This is a nice use of the 'apply' family. I wonder whether it could be parallelized to improve speed. – Vedda


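For a parallel version of apply, take a look at 'mclapply' from the 'multicore' package (the function is now also available in the 'parallel' package that ships with R).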
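A minimal sketch of what that could look like with `mclapply` from the parallel package. The function `slow_step` is a hypothetical stand-in for the per-row work (`df[i,7] * nearest(...)` in the answer), and the core count of 2 is an arbitrary choice.

```r
library(parallel)

# Hypothetical stand-in for the per-row work (df[i, 7] * nearest(...))
slow_step <- function(i) {
  i^2
}

# Forking is unavailable on Windows, so fall back to a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mclapply() returns a list, so unlist() to get a vector like sapply()
res <- unlist(mclapply(1:3, slow_step, mc.cores = cores))
```

For only three rows the cost of forking will likely outweigh any gain, so this is mainly of interest when df has many rows.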