2015-04-03 133 views
-1

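Calculating two-day and five-day averages based on another column

I want to calculate 2-day and 5-day averages of temperature and humidity based on another variable (stay). If the length of stay is 0 (the patient was admitted today), the average should be computed from the values of the same date (day 1) and the previous day (day 2). Similarly, for a stay of 1, the average is computed from the values of days 2 and 3 counting back from the date. For all stays above 8, the 2-day average is computed from days 9 and 10 before the current date. The 5-day average for a stay of 0 depends on the values of days 1, 2, 3, 4 and 5. The table below shows how the results are calculated.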

Example of the required output.

Date       stay  Temperature  Humidity  2d temp  5d temp
9-Mar-98   6     4.23         74.32     na       na
10-Mar-98  1     5.16         70.33     2.12     na
11-Mar-98  8     7.39         65.77     na       na
14-Mar-98  3     6.63         66.46     6.27     3.35
23-Mar-98  2     11.03        62.94     11.13    13.97
24-Mar-98  10    10.87        57.35     10.09    8.78
4-Apr-98   0     9.64         59.21     8.68     9.51
5-Apr-98   5     9.70         88.30     16.14    13.81

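Some explanation: the admission on 11 March has a stay of 8, so the averages are set to NA because no values are available that far back. 14 March has a stay of 3, so the 2-day average is computed from the values of 10 and 11 March. On the other hand, 5 April has a stay of 5, so the 2-day average is computed from the values of 30 and 31 March (the count starts 5 days before the current date).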

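The table below shows, for each stay, which days (counting back from the current date, which is day 1) are used to compute the averages.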

stay 2d average  5d averages 
    0  1,2   1,2,3,4,5 
    1  2,3   2,3,4,5,6 
    2  3,4   3,4,5,6,7 
    3  4,5   4,5,6,7,8 
    4  5,6   5,6,7,8,9 
    5  6,7   6,7,8,9,10 
    6  7,8   7,8,9,10,11 
    7  8,9   8,9,10,11,12 
    8  9,10   9,10,11,12,13 
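
The pattern in the table can be written as a small function. The following is only a sketch: `window_days` and its `max_stay` argument are hypothetical names, and the cap at 8 follows the rule stated in the question that all stays above 8 use days 9 and 10.

```r
# Hypothetical helper: which days enter the average for a given stay.
# The current date counts as day 1; stays above max_stay are treated as max_stay.
window_days <- function(stay, n, max_stay = 8) {
  first <- min(stay, max_stay) + 1   # first day of the window
  seq(first, length.out = n)         # n consecutive days going further back
}

window_days(0, 2)    # days 1 and 2
window_days(3, 5)    # days 4 to 8
window_days(10, 2)   # capped at stay 8: days 9 and 10
```

Day k corresponds to the date k - 1 days before the current one, so for 14 March with a stay of 3 the 2-day window (days 4 and 5) falls on 11 and 10 March.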

Sample data.

> dput(mydata) 
structure(list(date = structure(c(10294, 10295, 10296, 10297, 
10298, 10299, 10300, 10301, 10302, 10303, 10304, 10305, 10306, 
10307, 10308, 10309, 10310, 10311, 10312, 10313, 10314, 10315, 
10316, 10317, 10318, 10319, 10320, 10321, 10322, 10323, 10324 
), class = "Date"), stay = c(6, 1, 8, 11, 27, 3, 4, 5, 11, 13, 
2, 17, 26, 6, 2, 10, 5, 2, 11, 24, 8, 11, 2, 8, 7, 30, 0, 5, 
1, 2, 2), temperature = c(4.23000001907349, 5.15541648864746, 
7.38499999046326, 9.47041666507721, 7.61999988555908, 6.62625002861023, 
8.71875, 11.4608333110809, 11.2570832967758, 14.5691666603088, 
10.3120833337307, 11.1216666698456, 11.1420832872391, 11.241666674614, 
11.03125, 10.8691666722298, 12.4862499237061, 13.9341666698456, 
11.8995833396912, 12.3716666698456, 12.5091667175293, 16.3833332061768, 
15.8945832252502, 7.26666665077209, 7.0091667175293, 7.73125004768372, 
9.63833332061768, 9.7045833170414, 11.4941666126251, 11.1304166316986, 
11.3908333778381), humid = c(74.3199996948242, 70.3308334350586, 
65.7658309936523, 69.2799987792969, 83.1170806884766, 66.4599990844727, 
67.4225006103516, 85.7504196166992, 89.9520797729492, 65.2566680908203, 
43.3604164123535, 51.7508316040039, 54.6866683959961, 68.2958297729492, 
62.9420852661133, 57.3504180908203, 66.4137496948242, 57.6333351135254, 
78.9029159545898, 84.5666656494141, 84.2004165649414, 71.2779159545898, 
74.0320816040039, 65.2512512207031, 58.8224983215332, 62.4949989318848, 
59.2054176330566, 88.2983322143555, 71.2545852661133, 78.0783309936523, 
51.9004173278809)), datalabel = "", time.stamp = " 3 Apr 2015 22:09", .Names = c("date", 
"stay", "temperature", "humid"), formats = c("%dD_m_Y", "%9.0g", 
"%9.0g", "%9.0g"), types = c(255L, 255L, 255L, 255L), val.labels = c("", 
"", "", ""), var.labels = c("  ", "", "temp", "rh"), expansion.fields = list(
    c("_dta", "_lang_list", "default"), c("_dta", "_lang_c", 
    "default")), row.names = c("1", "2", "3", "4", "5", "6", 
"7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", 
"18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", 
"29", "30", "31"), version = 12L, class = "data.frame") 
+1

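Welcome to SO! Please note that this site is not a code dispenser. Requests to have code written for you are usually not well received here. Please take the tour and read How to Ask to learn how to ask questions that produce favorable answers. – 2015-04-03 20:51:13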

+0

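I would use the dplyr window functions 'lead' / 'lag' http://cran.r-project.org/web/packages/dplyr/vignettes/window-functions.html to add extra columns for the temperature of the last day, the second-to-last day, ... and use 'ifelse' to distinguish the cases, e.g. 'mydata <- mydata %<% mutate(temp_last_day = lag(temperature), temp_second_last_day = lag(temp_last_day, ...)' and so on. But there are surely more elegant solutions. – ckluss 2015-04-03 20:59:55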

+0

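@ckluss I am new to R and the examples at the link are a bit hard to follow. When I tried the code you provided, I got the error message "could not find function "%<%"". What am I missing to run the code? – Harawe 2015-04-04 17:45:14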
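
The operator in the comment above appears to be a typo for the pipe `%>%`, which dplyr makes available once the package is loaded. A minimal sketch of the suggested `lag` approach follows, assuming dplyr is installed; the data frame `temps` is a made-up example, and `lag(temperature, 2)` is a guess at what the elided second call was meant to do.

```r
library(dplyr)

# Toy data: the first four sample temperatures, rounded (assumption for illustration).
temps <- data.frame(temperature = c(4.23, 5.16, 7.39, 9.47))

# %>% passes the data frame on the left as the first argument of mutate().
temps <- temps %>%
  mutate(
    temp_last_day        = lag(temperature),     # value one row earlier
    temp_second_last_day = lag(temperature, 2)   # value two rows earlier
  )

print(temps)
```

Rows without enough history get `NA` in the lagged columns, which is what `lag` returns by default.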

Answer

2

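Can you give an example of your expected output? I am not sure whether this is correct.

This is a simple loop. For each row, take the stay value, add 1, and add two (or 5), then extract the vector of temperatures corresponding to those indices and take their mean.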

mydata <- structure(list(date=structure(c(10294,10295,10296,10297,10298,10299,10300,10301,10302,10303,10304,10305,10306,10307,10308,10309,10310,10311,10312,10313,10314,10315,10316,10317,10318,10319,10320,10321,10322,10323,10324),class="Date"),stay=c(6,1,8,11,27,3,4,5,11,13,2,17,26,6,2,10,5,2,11,24,8,11,2,8,7,30,0,5,1,2,2),temperature=c(4.23000001907349,5.15541648864746,7.38499999046326,9.47041666507721,7.61999988555908,6.62625002861023,8.71875,11.4608333110809,11.2570832967758,14.5691666603088,10.3120833337307,11.1216666698456,11.1420832872391,11.241666674614,11.03125,10.8691666722298,12.4862499237061,13.9341666698456,11.8995833396912,12.3716666698456,12.5091667175293,16.3833332061768,15.8945832252502,7.26666665077209,7.0091667175293,7.73125004768372,9.63833332061768,9.7045833170414,11.4941666126251,11.1304166316986,11.3908333778381),humid=c(74.3199996948242,70.3308334350586,65.7658309936523,69.2799987792969,83.1170806884766,66.4599990844727,67.4225006103516,85.7504196166992,89.9520797729492,65.2566680908203,43.3604164123535,51.7508316040039,54.6866683959961,68.2958297729492,62.9420852661133,57.3504180908203,66.4137496948242,57.6333351135254,78.9029159545898,84.5666656494141,84.2004165649414,71.2779159545898,74.0320816040039,65.2512512207031,58.8224983215332,62.4949989318848,59.2054176330566,88.2983322143555,71.2545852661133,78.0783309936523,51.9004173278809)),datalabel="",time.stamp="3Apr201522:09",.Names=c("date","stay","temperature","humid"),formats=c("%dD_m_Y","%9.0g","%9.0g","%9.0g"),types=c(255L,255L,255L,255L),val.labels=c("","","",""),var.labels=c("","","temp","rh"),expansion.fields=list(c("_dta","_lang_list","default"),c("_dta","_lang_c","default")),row.names=c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31"),version=12L,class="data.frame") 

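# For each stay value x, average the temperatures in rows (x + 1):(x + 2)
# and (x + 1):(x + 5). These row numbers count from the top of mydata,
# not back from the row's own date.
res <- lapply(mydata[, 'stay'], function(x)
    c('two' = mean(mydata[(x + 1):(x + 2), 'temperature'], na.rm = TRUE),
    'five' = mean(mydata[(x + 1):(x + 5), 'temperature'], na.rm = TRUE)))

# Bind the per-row results to the original data as two new columns.
cbind(mydata, do.call('rbind', res))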

#   date stay temperature humid  two  five 
# 1 1998-03-09 6 4.230000 74.32000 10.089792 11.263583 
# 2 1998-03-10 1 5.155416 70.33083 6.270208 7.251417 
# 3 1998-03-11 8 7.385000 65.76583 12.913125 11.680417 
# 4 1998-03-12 11 9.470417 69.28000 11.131875 11.081167 
# 5 1998-03-13 27 7.620000 83.11708 10.599375 10.930000 
# 6 1998-03-14 3 6.626250 66.46000 8.545208 8.779250 
# 7 1998-03-15 4 8.718750 67.42250 7.123125 9.136583 
# 8 1998-03-16 5 11.460833 85.75042 7.672500 10.526417 
# 9 1998-03-17 11 11.257083 89.95208 11.131875 11.081167 
# 10 1998-03-18 13 14.569167 65.25667 11.136458 11.912500 
# 11 1998-03-19 2 10.312083 43.36042 8.427708 7.964083 
# 12 1998-03-20 17 11.121667 51.75083 12.916875 13.419583 
# 13 1998-03-21 26 11.142083 54.68667 9.671458 10.671667 
# 14 1998-03-22 6 11.241667 68.29583 10.089792 11.263583 
# 15 1998-03-23 2 11.031250 62.94209 8.427708 7.964083 
# 16 1998-03-24 10 10.869167 57.35042 10.716875 10.969750 
# 17 1998-03-25 5 12.486250 66.41375 7.672500 10.526417 
# 18 1998-03-26 2 13.934167 57.63334 8.427708 7.964083 
# 19 1998-03-27 11 11.899583 78.90292 11.131875 11.081167 
# 20 1998-03-28 24 12.371667 84.56667 7.370208 9.115500 
# 21 1998-03-29 8 12.509167 84.20042 12.913125 11.680417 
# 22 1998-03-30 11 16.383333 71.27792 11.131875 11.081167 
# 23 1998-03-31 2 15.894583 74.03208 8.427708 7.964083 
# 24 1998-04-01 8 7.266667 65.25125 12.913125 11.680417 
# 25 1998-04-02 7 7.009167 58.82250 11.358958 11.744167 
# 26 1998-04-03 30 7.731250 62.49500 11.390833 11.390833 
# 27 1998-04-04 0 9.638333 59.20542 4.692708 6.772167 
# 28 1998-04-05 5 9.704583 88.29833 7.672500 10.526417 
# 29 1998-04-06 1 11.494167 71.25459 6.270208 7.251417 
# 30 1998-04-07 2 11.130417 78.07833 8.427708 7.964083 
# 31 1998-04-08 2 11.390833 51.90042 8.427708 7.964083 
+0

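I edited the question and added an example. For instance, for the last row (8 April), which has a stay of 2, the 2-day average is the average of 11.49 and 9.7 (these are days 3 and 4 counting from the current date). – Harawe 2015-04-04 13:07:56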
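
The rule described in this comment, counting back from each row's own date, can be sketched in base R as follows. This is one possible reading of the question, not the answerer's method: `lagged_mean` is a hypothetical helper, the vectors are rounded copies of the sample data, and returning NA whenever the window reaches back before the first row is an assumption.

```r
# Rounded copies of the sample temperatures and stays (31 consecutive days).
temperature <- c(4.230, 5.155, 7.385, 9.470, 7.620, 6.626, 8.719, 11.461,
                 11.257, 14.569, 10.312, 11.122, 11.142, 11.242, 11.031,
                 10.869, 12.486, 13.934, 11.900, 12.372, 12.509, 16.383,
                 15.895, 7.267, 7.009, 7.731, 9.638, 9.705, 11.494, 11.130,
                 11.391)
stay <- c(6, 1, 8, 11, 27, 3, 4, 5, 11, 13, 2, 17, 26, 6, 2, 10, 5, 2, 11,
          24, 8, 11, 2, 8, 7, 30, 0, 5, 1, 2, 2)

# Hypothetical helper: mean of n consecutive days of `values`, starting
# `stay` rows before row i (the current date is day 1, so day stay + 1
# is row i - stay). Stays above max_stay are treated as max_stay.
lagged_mean <- function(values, i, stay, n, max_stay = 8) {
  s <- min(stay, max_stay)
  rows <- (i - s) - seq_len(n) + 1       # i - s, i - s - 1, ...
  if (any(rows < 1)) return(NA_real_)    # assumption: incomplete window gives NA
  mean(values[rows])
}

two_day <- vapply(seq_along(stay),
                  function(i) lagged_mean(temperature, i, stay[i], 2),
                  numeric(1))
five_day <- vapply(seq_along(stay),
                   function(i) lagged_mean(temperature, i, stay[i], 5),
                   numeric(1))

result <- data.frame(stay, temperature, two_day, five_day)
print(result)
```

With these inputs, row 6 (14 March, stay 3) gets a 2-day mean of 6.27, and the last row (8 April, stay 2) averages 11.494 and 9.705, in line with the examples given in the question. The same helper can be applied to the humidity column.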