2013-12-19

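Subset data based on date-time difference or window

I have location data with date-time stamps. The locations were supposed to be collected at regular intervals, but that did not always happen. I need to extract the locations that fall in a given time window, for example locations 12 hours apart. Starting from the date-time of location 1, I want to find the next location 12 hours later. If there is none exactly 12 hours later, I want the one closest to that new target time. Then I take that new location and find the next one 12 hours after it, and so on. I have to do this for each unique ID.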

COLLAR_ID     dt 
2159 2006-01-27 13:02:55 
2159 2006-01-27 14:01:12 
2159 2006-01-27 15:01:04 
2159 2006-01-27 16:01:09 

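That is what the data look like. Here is a small piece of data you can cut and paste. Note: it is all the same ID; I have 5 different IDs with different start dates/times.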

structure(list(COLLAR_ID = c(2159L, 2159L, 2159L, 2159L, 2159L, 
2159L, 2159L, 2159L, 2159L, 2159L, 2159L, 2159L, 2159L, 2159L, 
2159L, 2159L, 2159L, 2159L, 2159L, 2159L), dt = structure(c(1138366975, 
1138370472, 1138374064, 1138377669, 1138381264, 1138384873, 1138388503, 
1138399312, 1138402842, 1138406507, 1138413700, 1138417261, 1138420848, 
1138424444, 1138428071, 1138431695, 1138435287, 1138438938, 1138442428, 
1138446098), class = c("POSIXct", "POSIXt"), tzone = "GMT")), .Names = c("COLLAR_ID", 
"dt"), class = "data.frame", row.names = c(NA, 20L)) 

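So, taking the data above as an example: if my start date-time is 2006-01-27 00:00:00, the next location it should record is at 12:00:00. That location does not exist, so it should record 13:02:55 instead. Even so, that lies just outside a strict 1-hour buffer window (13:02:55 is 1 h 2 min 55 s after 12:00:00).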

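I thought about converting the date-times to decimal Julian numbers to make them easier to handle, but I don't know how to do that. Rounding the date/times to hours would be fine, except that sometimes there are 2 or 3 locations within a 1-hour interval, so I need to choose, among those, the location whose time is closest to the target derived from the original start.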
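
A minimal base-R sketch of the decimal-day conversion, assuming "Julian" here just means decimal days since an origin you choose (not the astronomical Julian Date). decimal_days() is a hypothetical helper name; difftime() and julian() are base R and give the same numbers. Once times are plain numbers, "12 hours later" is just + 0.5 and "closest" is which.min(abs(...)).

```r
# Hypothetical helper: decimal days elapsed since `origin` (base R only).
decimal_days <- function(x, origin = as.POSIXct("1970-01-01", tz = "GMT")) {
  as.numeric(difftime(x, origin, units = "days"))
}

x <- as.POSIXct(c("2006-01-27 13:02:55", "2006-01-28 02:01:40"), tz = "GMT")
d1 <- decimal_days(x)

# base R's julian() returns the same values, wrapped in a difftime (days)
d2 <- as.numeric(julian(x, origin = as.POSIXct("1970-01-01", tz = "GMT")))

# a 12-hour step is 0.5 days; index of the fix nearest that target
target <- d1[1] + 0.5
closest <- which.min(abs(d1 - target))
```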

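To add a new detail that may make things even more confusing: some of the data were initially collected every hour and then, after 3 weeks, switched to every 12 hours. However, I don't know the programmed time at which each individual was supposed to switch. Others started on a 12:00:00/00:00:00 schedule, switched to every hour, and then after a few days switched to 12 hours, again at a time I don't know. So, for example, the 12-hour schedule might start at 2 pm.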
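
One rough way to estimate when a collar switched from hourly to 12-hourly fixes is to look at the gaps between consecutive fixes. The sketch below is an assumption, not something from the thread: first_fix_after_long_gap() and its gap_h threshold (6 hours) are hypothetical. It returns the first fix that follows a gap longer than gap_h. A long outage during the hourly phase would also trigger it (the sample data already contain a 3-hour gap, between 19:01:43 and 22:01:52), so you may want to require several long gaps in a row.

```r
# Hypothetical helper: the first fix that follows a gap longer than `gap_h`
# hours, as a rough guess at when the schedule switched to 12 hours.
first_fix_after_long_gap <- function(times, gap_h = 6) {
  times <- sort(times)
  gaps_h <- as.numeric(difftime(times[-1], times[-length(times)], units = "hours"))
  i <- which(gaps_h > gap_h)[1]   # NA if no gap is that long
  times[i + 1]                    # stays POSIXct; NA when no switch is found
}

# per collar (assumes a data frame df with COLLAR_ID and POSIXct dt):
# lapply(split(df$dt, df$COLLAR_ID), first_fix_after_long_gap)
```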

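I looked at this stack overflow conversation, but can't see how to make it work. So here is my attempt below, which I have updated since originally posting the question. It doesn't work. I'm still working on it... and it still looks like rather clunky code.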

# convert dt only if it was read in as text (the dput data above is already POSIXct) 
if (!inherits(test2$dt, "POSIXct")) test2$dt <- as.POSIXct(test2$dt, format = "%Y-%m-%d %H:%M:%S", tz = "GMT") 
test2 = test2[order(test2$COLLAR_ID, test2$dt),] 
MinInterval = 12 #minimum time interval (in hours) between consecutive locations 
row = 0   # Keeps track of row within alldata 
Endtest2 = 2     #keeps track of row within individual within all data 
SubData1 = test2[1,] 
IDNames = levels(as.factor(test2$COLLAR_ID)) 
test22 = data.frame() 

for (n in 1:length(IDNames)){ 
    IndivData = test2[test2$COLLAR_ID==IDNames[n],] 
    row = row+1    #Continues to track next row between individuals 
    Endtest2 = 2    #restarts counting the rows for NEXT individual 
    SubData1[row,]=IndivData[1,] 

    while (Endtest2<=nrow(IndivData)){   # <= so the last location is checked too 
        timediff = difftime(IndivData$dt[Endtest2],SubData1$dt[row],units = "hours") 

        if (timediff>MinInterval){   #If time difference is greater than MinInterval hours, keep this location 
            row = row+1 
            SubData1=rbind(SubData1,IndivData[Endtest2,]) 
        } 
        Endtest2 = Endtest2+1   #move on to the next location either way 
    } #end while loop 

} #end loop through individuals 
test22 = SubData1 

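I apologize, and am embarrassed to say, that I completely forgot I had posted a question (using similar code) a long time ago, but never got a solution. I had abandoned the whole effort, but am now revisiting it with new (more chaotic) data and new requirements. The script does not filter out the correct data.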
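
For comparison, here is a base-R sketch of the rule described at the top of the question: start at the first fix, aim for "current fix + 12 h", take the fix closest to that target, re-anchor on it, and repeat per ID. The loop above instead keeps the first fix that is more than 12 h later, which is not the same thing. All names here (thin_to_interval, thin_by_id, interval_h, tol_h) are hypothetical, and the tolerance is an assumption: a fix is accepted only if it lies within tol_h of the target (default: half the interval); otherwise the target moves forward one more interval, which is one way to step over a gap in the data.

```r
# Greedy thinning of one individual's fixes (base R only).
# Assumes `times` is a POSIXct vector sorted ascending; returns row indices.
thin_to_interval <- function(times, interval_h = 12, tol_h = interval_h / 2) {
  secs <- as.numeric(times)
  if (length(secs) == 0) return(integer(0))
  interval <- interval_h * 3600
  tol <- tol_h * 3600
  keep <- 1L                                   # always start at the first fix
  current <- 1L
  target <- secs[1] + interval
  repeat {
    later <- which(secs > secs[current])
    if (length(later) == 0) break              # no fixes left
    gaps <- abs(secs[later] - target)
    i <- which.min(gaps)
    if (gaps[i] <= tol) {
      current <- later[i]                      # closest fix is close enough:
      keep <- c(keep, current)                 # keep it and re-anchor on it
      target <- secs[current] + interval
    } else if (max(secs[later]) < target - tol) {
      break                                    # everything left is too early
    } else {
      target <- target + interval              # gap in the data: try next slot
    }
  }
  keep
}

# Apply the rule separately to each collar.
thin_by_id <- function(df, interval_h = 12, tol_h = interval_h / 2) {
  df <- df[order(df$COLLAR_ID, df$dt), ]
  pieces <- lapply(split(df, df$COLLAR_ID), function(d)
    d[thin_to_interval(d$dt, interval_h, tol_h), ])
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}
```

On the 20 sample fixes with tol_h = 1, this keeps 2006-01-27 13:02:55 and 2006-01-28 02:01:40. With the default 6-hour tolerance it also keeps 2006-01-28 11:01:38, because that is the closest fix to the next target (14:01:40) and lies within 6 hours of it. The tolerance is the one judgement call in the rule.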


Did the answer work for you? It seems like it does. If so, please mark it as answered. – BrodieG

Answers


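Using the data set you provided (I created an object called temp from your structure), this is what I came up with. This code floors every observation's time stamp to a 12-hour mark and then, in each 12-hour window, keeps the first observation and drops all the observations after it.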

# create an xts object, I just find them easier to work with 
library(xts) 
xts_object<-xts(temp$COLLAR_ID, order.by=temp$dt) 

# extract time and floor to 12 hours 
time<-temp$dt 
time_numeric<-as.numeric(time) 
# 43200 is the number of seconds in 12 hours 
floored_time<-floor(time_numeric/43200)*43200 
floored_time<-as.POSIXct(floored_time, origin="1970-01-01", tz="GMT") 

# create a new xts object with the floored index 
# (assumes temp is already sorted by dt, so the rows line up) 
floored_xts_object<-xts(coredata(xts_object), order.by=floored_time) 

# drop double time stamps, leaving just the first observation in those 12 hours 
unique_xts_object<-make.index.unique(floored_xts_object, drop=TRUE) 

Feel free to try ceiling() instead. Hope this helps.
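
The same flooring idea can be sketched in base R without xts, for anyone who does not want the dependency. first_per_12h_bin() is a hypothetical name and assumes the times are sorted. Because the POSIXct epoch is midnight UTC, these bins always start at 00:00 or 12:00 UTC, not at each individual's own start time.

```r
# Bin each time stamp to the 12-hour mark at or before it,
# then return the index of the first fix in every bin.
first_per_12h_bin <- function(times) {
  bin <- floor(as.numeric(times) / 43200) * 43200   # 43200 s = 12 hours
  which(!duplicated(bin))                           # first index of each bin
}
```

Swapping floor() for ceiling() bins each fix to the next 12-hour mark instead.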

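I've added some code that, for each unique 12-hour mark, selects the time stamp with the smallest time difference to it. It keeps the original time stamps and returns a POSIXct object holding the time stamps closest to each 12-hour mark.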

# make floored times unique 
unique_time<-unique(floored_time) 

# use difftime in lapply to get time differences for each unique time to all time stamps 
time_diffrences<-lapply(unique_time, difftime, time) 
small<-lapply(time_diffrences, abs) 
small<-as.data.frame(small) 
names(small)<-NULL 

# get back into an xts object of time differences 
small<-xts(small, order.by=time) 
# using apply on the xts object, find the minimum for each unique time, selecting 
# with which, and extracting just the index instead of the entire array 
smallest<-index(small[arrayInd(which(as.array(small)%in% apply(small, 2, min), arr.ind=T), dim(small))[,1]]) 

This lets you pick those time stamps out of your xts data:

# select those rows from your original xts_object 
selected<-xts_object[smallest] 
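
The nearest-to-each-mark selection above can also be written as a short base-R sketch, assuming the same 12-hour marks (floored from the data, aligned to 00:00/12:00 UTC). nearest_to_12h_marks() is a hypothetical name. It returns one index per mark and drops repeats, so two marks that share a closest fix yield a single row.

```r
# For every unique 12-hour mark present in the data, return the index of the
# time stamp closest to that mark (base R, no xts).
nearest_to_12h_marks <- function(times) {
  secs <- as.numeric(times)
  marks <- unique(floor(secs / 43200) * 43200)          # unique 12-hour marks
  idx <- vapply(marks, function(m) which.min(abs(secs - m)), integer(1))
  unique(idx)                                           # one fix per mark
}
```

Unlike keeping the first fix per bin, a fix at 11:55 can win the 12:00 mark here even though it was floored into the 00:00 bin.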

Best, Ben


Using the nifty roll feature of data.table, you can get the closest time stamp to midnight/noon:

# Make data (hourly time stamps +- random noise with 30 min standard dev) 

len <- 30 # number of 12-hour intervals (len*12 hourly stamps = 15 days) 
stamps <- seq(as.POSIXct("2013-12-01"), by="-1 hour", length.out=len*12) + rnorm(len*12, 0, 1800) 
stamps.target <- seq(as.POSIXct("2013-12-01"), by="-12 hour", length.out=len) 

# Use data table to join stamps.target (midnight/noon) to stamps (hourly w/ noise) 

library(data.table) 
dt.data <- data.table(stamps, closest.match=stamps, key="stamps") 
dt.target <- data.table(stamps.target) 
dt.data[dt.target, roll="nearest"] 

#     stamps  closest.match 
# 1: 2013-12-01 00:00:00 2013-12-01 00:24:20 
# 2: 2013-11-30 12:00:00 2013-11-30 11:57:10 
# 3: 2013-11-30 00:00:00 2013-11-29 23:41:29 
# 4: 2013-11-29 12:00:00 2013-11-29 11:39:32 
# 5: 2013-11-29 00:00:00 2013-11-28 23:31:32 
# .... 
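
For readers without data.table, a base-R sketch of the same "nearest" lookup uses findInterval(). nearest_index() is a hypothetical name, and breaking ties toward the earlier stamp is my own choice, not a claim about how roll="nearest" breaks ties. The stamps must be sorted ascending, so the descending stamps generated above would need sort() first; the targets can be in any order.

```r
# For each target time, return the index of the closest stamp.
# `stamps` must be sorted in ascending order.
nearest_index <- function(stamps, targets) {
  s <- as.numeric(stamps)
  tg <- as.numeric(targets)
  lo <- findInterval(tg, s)                # last stamp <= target (0 if none)
  lo_c <- pmax(lo, 1L)                     # clamp to a valid index
  hi_c <- pmin(lo + 1L, length(s))         # the stamp after it, clamped
  # ties go to the earlier stamp
  ifelse(abs(s[lo_c] - tg) <= abs(s[hi_c] - tg), lo_c, hi_c)
}
```

With the objects from the answer, something like s <- sort(stamps); s[nearest_index(s, stamps.target)] should give the closest.match column.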

EDIT: solution with multiple collars

Although what follows is a fair bit of code, most of it just generates data. The actual work is really only the last three lines:

# Make data (hourly time stamps +- random noise with 30 min standard dev) 

len <- 30 # number of 12 hour intervals 
pets <- c("fido", "rosie", "felix") 
start.date <- as.POSIXct("2013-12-01") 

# Create random roughly 1 hour apart time stamps for 
# our pets and store in data table. 

library(data.table) 
stamps.data <- 
    do.call(
        rbind, 
        lapply(
            pets, 
            function(x) { 
                data.table(
                    pet=rep(x, len * 12), 
                    stamp.join=seq(
                        start.date, 
                        by="-1 hour", 
                        length.out=len*12 
                    ) + rnorm(len*12, 0, 1800) 
                ) 
            })) 
# The above looks complicated, but it just creates our 
# data: a 2-column data table (pet, stamp.join) with roughly 
# hourly time stamps for each pet: 
#   pet   stamp.join 
# 1: rosie 2013-11-16 01:16:32 
# 2: fido 2013-11-16 01:24:28 
# 3: felix 2013-11-16 01:24:40 
# 4: fido 2013-11-16 01:50:54 
# 5: rosie 2013-11-16 02:33:49 
# ---       
# 1076: felix 2013-11-30 22:50:22 
# 1077: rosie 2013-11-30 23:10:52 
# 1078: felix 2013-11-30 23:52:32 
# 1079: fido 2013-12-01 00:24:01 
# 1080: rosie 2013-12-01 00:34:36 

# Now add a copy of stamp.join to the data table; necessary 
# because we will lose the stamp.join column in the join 

stamps.data[, closest.match:=stamp.join] 

# Now, for each pet, create a data.table with the target 
# times (CJ does a cartesian join of our pets and our target 
# times vectors and returns a data table, this is necessary 
# because we are doing a rolling join, if it was an exact 
# join we wouldn't need to CJ with pets, could just use 
# target stamps) 

stamps.target <- CJ(pets, seq(as.POSIXct("2013-12-01"), by="-12 hour", length.out=len)) 
setkey(stamps.data, pet, stamp.join) # join on pet and stamp.join 

# Use data table to join stamps.target (midnight/noon) to stamps (hourly w/ noise) 

stamps.data[stamps.target, roll="nearest"][order(stamp.join)] 

#  pet   stamp.join  closest.match 
# 1: felix 2013-11-16 12:00:00 2013-11-16 12:03:31 
# 2: fido 2013-11-16 12:00:00 2013-11-16 12:20:55 
# 3: rosie 2013-11-16 12:00:00 2013-11-16 11:36:37 
# 4: felix 2013-11-17 00:00:00 2013-11-17 00:01:48 
# 5: fido 2013-11-17 00:00:00 2013-11-17 00:12:11 
# 6: rosie 2013-11-17 00:00:00 2013-11-16 23:47:56 
# ---- 

Very nice feature, roll; you just cut my code in half. Cheers –


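So I've been trying to get this to work, but the problem is that in my example I used 00:00:00 as the original "target", as if I actually knew the target time. The target time is different for each individual and it changes. I'll try to explain this in the example. – Kerry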
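
If each individual's 12-hour grid should start at its own first fix instead of a shared midnight/noon, one option is to build the target times per ID first. This is a sketch under assumptions: targets_by_id() is a hypothetical helper and expects columns COLLAR_ID and dt (POSIXct), as in the question's data. The result could serve as the target table in a keyed rolling join like the one in the EDIT. A fixed grid does not re-anchor on each selected fix, though, so drift accumulates if the collar's schedule itself drifts.

```r
# Hypothetical helper: a 12-hour target grid for each collar, anchored at that
# collar's own first fix rather than at a shared midnight/noon.
targets_by_id <- function(df, interval_h = 12) {
  pieces <- lapply(split(df, df$COLLAR_ID), function(d) {
    data.frame(COLLAR_ID = d$COLLAR_ID[1],
               target = seq(min(d$dt), max(d$dt), by = interval_h * 3600))
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}
```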


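@BrodieG Can you show me how to incorporate unique individuals into your code? I think your code works, but it doesn't account for multiple individuals with the same dates; it still just picks the single closest one. I tried: 'dt.test <- dt.data[dt.target, roll = "nearest", by = COLLAR_ID]', but that didn't work because I had set the key to the date_time (dt) variable. – Kerry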