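I'm trying to solve a larger problem using this basic example. I need to apply a function based on the positions returned by which(), because I need to know the year in df1 wherever value is NA or >= 150. I then subset df2 by that year, take the mean of value, and write it back to the exact row in df1. Right now I'm using a for() loop, but my real data is very large, so I need something faster. Is there a common way to do this?
Apply a custom function to specific rows/columns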
dput:
df1 <- structure(list(id = c("USC00031632", "USC00031632", "USC00031632",
"USC00031632", "USC00031632", "USC00031632", "USC00031632", "USC00031632",
"USC00031632", "USC00031632"), element = c("TMAX", "TMIN", "TMAX",
"TMIN", "TMAX", "TMIN", "TMAX", "TMIN", "TMAX", "TMIN"), year = 1900:1909,
month = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), day = c(1, 1, 1,
1, 1, 1, 1, 1, 1, 1), value = c(30.02, NA, 37.94, 10.94,
NA, 28.04, 64.94, 41, 200, 51.08)), .Names = c("id", "element",
"year", "month", "day", "value"), row.names = c(NA, -10L), class = c("tbl_df",
"data.frame"))
df2 <-structure(list(id = c("USC00031632", "USC00031632", "USC00031632",
"USC00031632", "USC00031632", "USC00031632", "USC00031632", "USC00031632",
"USC00031632", "USC00031632"), element = c("TMAX", "TMIN", "TMAX",
"TMIN", "TMAX", "TMIN", "TMAX", "TMIN", "TMAX", "TMIN"), year = 1900:1909,
month = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), day = c(1, 1, 1,
1, 1, 1, 1, 1, 1, 1), value = c(30.02, 10.94, 37.94, 10.94,
12, 28.04, 64.94, 41, 82.04, 51.08)), row.names = c(NA, -10L
), class = c("tbl_df", "data.frame"), .Names = c("id", "element",
"year", "month", "day", "value"))
Code:
library(dplyr)
# Mean of `value` in `df` for the rows whose year equals `yr`
check <- function(df, yr){
  df_d <- filter(df, year == yr)
  mean(df_d$value)
}
# Loop over the rows of df1 whose value is missing or >= 150
for (i in which(is.na(df1$value) | df1$value >= 150)){
  df1$value[i] <- check(df = df2, yr = df1$year[i])
}
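One possible way to avoid the loop, offered as a minimal sketch rather than a definitive answer: compute the per-year means of df2 once, then fill every flagged row of df1 in a single vectorized assignment using match(). The small data frames below are hypothetical stand-ins for the dput data above, and the names yr_means, idx and pos are illustrative only.

```r
# Hypothetical toy data, smaller than the question's df1/df2
df1 <- data.frame(year  = 1900:1904,
                  value = c(30.02, NA, 37.94, 200, 10.94))
df2 <- data.frame(year  = 1900:1904,
                  value = c(30.02, 10.94, 37.94, 82.04, 10.94))

# Mean of df2$value for each year, computed once instead of once per row
yr_means <- tapply(df2$value, df2$year, mean)

# Rows of df1 that need replacing: value is NA or >= 150
idx <- which(is.na(df1$value) | df1$value >= 150)

# Position of each flagged year in the table of means (NA if the year is absent)
pos <- match(df1$year[idx], as.numeric(names(yr_means)))

# Single vectorized assignment back into the exact rows
df1$value[idx] <- as.vector(yr_means)[pos]
```

Like the loop, this leaves an NA in the group mean if df2 itself has missing values for that year. One difference to be aware of: a year that is missing from df2 yields NA here, whereas mean() of an empty subset in the loop yields NaN.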
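I don't really get it. Since 'df1' and 'df2' have the same values, except that 'df2' has no 'NA's, can't you just use 'df2' to solve this? Otherwise your example data set is a very poor one.
@DavidArenburg The code shows the problem I'm trying to solve. The data set is just a sample and isn't meant to be meaningful. It's the programming question I'm after. – Vedda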