2015-09-15 105 views
3

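Loop through unique values of a column and create multiple columns

I am trying to break down my previous question and have drawn up a plan of the steps needed to reach what I am ultimately after. At the moment I am trying to write a loop that determines, for each unique source listed in the Source column of the first table, whether the mechanical system is on.

For example, I have been given the following profile, which tells me the typical operating hours of each system for each day of the week in the four seasons. Note that some sources have more than one operating period in a day, which is why Stack 2 is repeated for 2 periods.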

[image: profile table]

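What I want to achieve now: I have created some sample dates and want to loop over each unique source and state, for every hour, whether the system is on or off, based on the information in the profile table. So far I have created the table below: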

[image: dates table]

The following code creates the table above:

# create dates table 
dates =data.frame(dates=seq(
    from=as.POSIXct("2010-1-1 0:00", tz="UTC"), 
    to=as.POSIXct("2012-12-31 23:00", tz="UTC"), 
    by="hour")) 

# set system locale first, so the weekday abbreviations are reproducible

Sys.setlocale(category = "LC_TIME", locale = "en_US.UTF-8")

# add year month day hour weekday column

dates$year <- format(dates[,1], "%Y") # year
dates$month <- format(dates[,1], "%m") # month
dates$day <- format(dates[,1], "%d") # day
dates$hour <- format(dates[,1], "%H") # hour
dates$weekday <- format(dates[,1], "%a") # weekday

# calculate season column 

d = function(month_day) which(lut$month_day == month_day) 
lut <- data.frame(all_dates = as.POSIXct("2012-1-1") + ((0:365) * 3600 * 24), 
        season = NA) 
lut <- within(lut, { month_day = strftime(all_dates, "%b-%d") }) 

lut[c(d("Jan-01"):d("Mar-15"), d("Nov-08"):d("Dec-31")), "season"] = "winter" 
lut[c(d("Mar-16"):d("Apr-30")), "season"] = "spring" 
lut[c(d("May-01"):d("Sep-27")), "season"] = "summer" 
lut[c(d("Sep-28"):d("Nov-07")), "season"] = "autumn" 
rownames(lut) = lut$month_day 

dates = within(dates, { 
    season = lut[strftime(dates, "%b-%d"), "season"] 
}) 

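What I would like to do now is add a column on the right for each unique value in the Source column of the profile table, and estimate whether the system is on or off for every hour in the dataset based on the following criteria.

I am struggling with the programming concept: how to do something like a vlookup with multiple conditions and paste the values into a new column. For example, for my sample data the loop should create 2 columns, because the Source column has only 2 unique sources, Stack 1 and Stack 2. The tricky part is the if statement, which needs to work roughly as follows.

As an example, the first row of table 2 should be matched on its season value against the profile table, to check whether the hour falls within the period during which the system is on in that season. If it falls within the specified period, say 'on'; if it falls outside, just say 'off'. So the result should look like the two red-font columns in the pictures below:

An example winter day: [image]

An example spring day: [image]

I managed to get the unique values of the column with the following code: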

values <- unique(profile$Source) 

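But from there I cannot get any further with a for loop.

I was wondering whether anyone can give me any advice on how to write the loop so that it creates 2 more columns in table 2, one for each unique source?

Below is the data of the typical weekly "profile" table I am using: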

> dput(profile) 
structure(list(`Source no` = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Source = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L), .Label = c("Stack 1", "Stack 2"), class = "factor"), 
    Period = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Day = structure(c(2L, 
    6L, 7L, 5L, 1L, 3L, 4L, 2L, 6L, 7L, 5L, 1L, 3L, 4L, 2L, 6L, 
    7L, 5L, 1L, 3L, 4L), .Label = c("Fri", "Mon", "Sat", "Sun", 
    "Thu", "Tue", "Wed"), class = "factor"), `Spring On` = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 15L, 
    15L, 15L, 15L, 15L, 15L, 15L), `Spring Off` = c(23L, 23L, 
    23L, 23L, 23L, 23L, 23L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 18L, 
    18L, 18L, 18L, 18L, 18L, 18L), `Summer On` = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L), .Label = "off", class = "factor"), `Summer Off` = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L), .Label = "off", class = "factor"), `Autumn On` = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L), .Label = "off", class = "factor"), `Autumn Off` = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L), .Label = "off", class = "factor"), `Winter On` = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L), .Label = c("0", "off"), class = "factor"), 
    `Winter Off` = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("23", 
    "off"), class = "factor")), .Names = c("Source no", "Source", 
"Period", "Day", "Spring On", "Spring Off", "Summer On", "Summer Off", 
"Autumn On", "Autumn Off", "Winter On", "Winter Off"), class = "data.frame", row.names = c(NA, 
-21L)) 

Many thanks

+1

Your setup code doesn't work. Check this line of code: 'dates = data.frame(dates = seq(as.Date('2010-01-01'), as.Date('2012-12-31'), by = "hour"))' –

+0

Apologies, I posted the wrong code. Please have a look at the corrected one now, thanks: 'dates = data.frame(dates = seq(from = as.POSIXct("2010-1-1 0:00", tz = "UTC"), to = as.POSIXct("2012-12-31 23:00", tz = "UTC"), by = "hour"))' – Achak

+1

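What does this line mean? "If the hour is the same as the value in the 'season' column, then look at the hour of the week and the day of the week, and return whether the system is on." –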

Answers

6

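To transfer the required data from profile to dates, you will have to transform the profile data and then join it with dates. For the following steps I used the data.table package.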

1) Load the data.table package and convert the datasets to data.tables (which are enhanced data frames):

library(data.table)

setDT(profile)
setDT(dates)

2) Reformat the values in the profile dataset:

# set the 'off' values to NA 
profile[profile=="off"] <- NA 
# make sure that all the remaining values are numeric (which wasn't the case) 
profile <- profile[, lapply(.SD, as.character), by=.(Source,Period,Day)][, lapply(.SD, as.numeric), by=.(Source,Period,Day)] 
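
The detour through as.character matters because the on/off columns are factors in the dput output above: calling as.numeric directly on a factor returns its internal level codes rather than the hours shown. A minimal base R sketch, using a made-up factor f that is not part of the original data:

```r
# Hypothetical factor resembling the `Winter Off` column (illustration only)
f <- factor(c("23", "off", "23"))

# Direct conversion returns the internal level codes, not the hours
codes <- as.numeric(f)

# Mark "off" as missing first, then convert via character to get the hours
chr <- as.character(f)
chr[chr == "off"] <- NA
hours <- as.numeric(chr)

print(codes)  # level codes
print(hours)  # hours, with NA where the system is off
```

Replacing "off" with NA before converting also avoids the coercion warning that as.numeric would otherwise raise for the non-numeric label.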

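3) For each season, create a dataset with a row for every hour in which one (or both) of the stacks is on. I only do this for spring and winter, because summer and autumn contain only off/NA values (we'll deal with those later):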

pr.spring <- profile[, .(season = "spring", 
         hour = c(`Spring On`:(`Spring Off`-1))), 
        by=.(Source,Period,Day)] 
pr.winter <- profile[!is.na(`Winter On`), .(season = "winter", 
              hour = c(`Winter On`:(`Winter Off`-1))), 
        by=.(Source,Period,Day)] 

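Note that I used Spring Off - 1. That is because I assumed that the stack is switched off at 23:00. By using -1 I include hour 22 but exclude hour 23. You can change this if needed.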
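
To see what the on:(off - 1) expansion yields, here is a small base R sketch. The helper name hours_on is hypothetical and only illustrates the convention assumed above (the off hour itself is excluded). It also guards against the case where the on and off hours are equal, for which the plain colon operator would count backwards.

```r
# Hypothetical helper: hours during which a system runs, excluding the off hour
hours_on <- function(on, off) {
  # Nothing to expand when the period is missing or empty
  if (is.na(on) || is.na(off) || off <= on) return(integer(0))
  seq.int(on, off - 1L)
}

print(hours_on(0L, 23L))   # hours 0 through 22
print(hours_on(15L, 18L))  # hours 15, 16 and 17
print(hours_on(6L, 6L))    # empty, whereas 6:(6 - 1) would give 6 5
```

If the off hour should count as running, drop the - 1L in the helper (and the -1 in step 3) and relax the guard to off < on.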

4) Bind the datasets from step 3 together and prepare the resulting dataset for a dcast operation:

prof <- rbindlist(list(pr.spring,pr.winter)) 
prof <- prof[, .(weekday = Day, season, Source = gsub(" ",".",Source), hour = sprintf("%02d",hour))] 
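
The two conversions in this step exist so that the join keys have the same form on both sides. A short base R sketch of what they do, using made-up input values:

```r
# dates$hour was built with format(..., "%H"), so it holds strings such as "03";
# the numeric hours therefore need the same zero-padded form
hour_num <- c(0L, 3L, 15L)
hour_key <- sprintf("%02d", hour_num)

# Spaces in the source names become dots, giving syntactic column names
source_key <- gsub(" ", ".", c("Stack 1", "Stack 2"), fixed = TRUE)

print(hour_key)
print(source_key)
```

Without the zero padding, hour 3 would become "3" and would not match the "03" stored in the dates dataset.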

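5) Transform the dataset from step 4 into a dataset with one column per stack, and change the weekday column to character. The latter is needed for the join in the next step, because the weekday column in the dates dataset is also a character column: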

profw <- dcast(prof, weekday + season + hour ~ Source, value.var = "hour", fun.aggregate = length, fill = 0) 
profw[, weekday := as.character(weekday)] 
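
For readers unfamiliar with dcast, the reshaping with fun.aggregate = length amounts to counting rows per key and source, with 0 for combinations that never occur. The following is only a rough base R analogue on made-up toy data, not the data.table call used above:

```r
# Toy long-format data (made up): one row per hour in which a stack is on
long <- data.frame(
  hour   = c("06", "06", "07", "15"),
  Source = c("Stack.1", "Stack.2", "Stack.1", "Stack.2"),
  stringsAsFactors = FALSE
)

# Count rows per hour and source; combinations that never occur get 0
counts <- table(hour = long$hour, Source = long$Source)

# Turn the contingency table into a plain wide data frame
wide <- as.data.frame.matrix(counts)
wide$hour <- rownames(wide)
rownames(wide) <- NULL

print(wide)
```

Each stack ends up as its own 0/1 column, which is the shape the join in the next step relies on.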

6) Join the two datasets together and fill the missing values with 0's (remember I said "we'll deal with those later" in step 3):

dates.new <- profw[dates, on=c("weekday", "season", "hour")][is.na(Stack.1), `:=` (Stack.1 = 0, Stack.2 = 0)] 
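
The join above is a left join onto dates: every hourly row is kept, and rows without a match in the lookup table get NA, which is then read as "off". As a hedged illustration of the same idea in base R (toy data and names are made up, and merge is a stand-in for the data.table join, not the method used above):

```r
# Toy lookup table (made up): hours in which Stack.1 is on in spring
lookup <- data.frame(season = "spring", hour = c("06", "07"), Stack.1 = 1,
                     stringsAsFactors = FALSE)

# Toy hourly table (made up) that should receive the on/off flag
hourly <- data.frame(season = c("spring", "spring", "summer"),
                     hour   = c("06", "12", "06"),
                     stringsAsFactors = FALSE)

# Left join: keep every row of `hourly`, matching on both key columns
joined <- merge(hourly, lookup, by = c("season", "hour"), all.x = TRUE)

# Rows without a match come back as NA, which here means "off"
joined$Stack.1[is.na(joined$Stack.1)] <- 0

print(joined)
```

Note that merge sorts the result by the key columns by default, so the rows would need to be re-ordered by date afterwards if chronological order matters.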

The resulting dataset now has, for each date in the dates dataset, one column per stack in which 1 = "on" and 0 = "off".
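
If the literal labels from the question are preferred over 1/0, the indicator can be mapped afterwards. A minimal sketch on a made-up vector (the names stack_flag and stack_label are illustrative only):

```r
# Made-up indicator values as produced by the join (1 = on, 0 = off)
stack_flag <- c(1, 0, 0, 1)

# Map the indicator to the labels asked for in the question
stack_label <- ifelse(stack_flag == 1, "on", "off")

print(stack_label)
```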


A snapshot of the resulting dataset:

> dates.new[weekday=="Fri" & hour=="03" & month %in% c("03","04","09")] 
    weekday season hour Stack.1 Stack.2    dates year month day 
1:  Fri winter 03  1  1 2010-03-05 03:00:00 2010 03 05 
2:  Fri winter 03  1  1 2010-03-12 03:00:00 2010 03 12 
3:  Fri spring 03  1  0 2010-03-19 03:00:00 2010 03 19 
4:  Fri spring 03  1  0 2010-03-26 03:00:00 2010 03 26 
5:  Fri spring 03  1  0 2010-04-02 03:00:00 2010 04 02 
6:  Fri spring 03  1  0 2010-04-09 03:00:00 2010 04 09 
7:  Fri spring 03  1  0 2010-04-16 03:00:00 2010 04 16 
8:  Fri spring 03  1  0 2010-04-23 03:00:00 2010 04 23 
9:  Fri spring 03  1  0 2010-04-30 03:00:00 2010 04 30 
10:  Fri summer 03  0  0 2010-09-03 03:00:00 2010 09 03 
11:  Fri summer 03  0  0 2010-09-10 03:00:00 2010 09 10 
12:  Fri summer 03  0  0 2010-09-17 03:00:00 2010 09 17 
13:  Fri summer 03  0  0 2010-09-24 03:00:00 2010 09 24 
14:  Fri winter 03  1  1 2011-03-04 03:00:00 2011 03 04 
15:  Fri winter 03  1  1 2011-03-11 03:00:00 2011 03 11 
16:  Fri spring 03  1  0 2011-03-18 03:00:00 2011 03 18 
17:  Fri spring 03  1  0 2011-03-25 03:00:00 2011 03 25 
18:  Fri spring 03  1  0 2011-04-01 03:00:00 2011 04 01 
19:  Fri spring 03  1  0 2011-04-08 03:00:00 2011 04 08 
20:  Fri spring 03  1  0 2011-04-15 03:00:00 2011 04 15 
21:  Fri spring 03  1  0 2011-04-22 03:00:00 2011 04 22 
22:  Fri spring 03  1  0 2011-04-29 03:00:00 2011 04 29 
23:  Fri summer 03  0  0 2011-09-02 03:00:00 2011 09 02 
24:  Fri summer 03  0  0 2011-09-09 03:00:00 2011 09 09 
25:  Fri summer 03  0  0 2011-09-16 03:00:00 2011 09 16 
26:  Fri summer 03  0  0 2011-09-23 03:00:00 2011 09 23 
27:  Fri autumn 03  0  0 2011-09-30 03:00:00 2011 09 30 
28:  Fri winter 03  1  1 2012-03-02 03:00:00 2012 03 02 
29:  Fri winter 03  1  1 2012-03-09 03:00:00 2012 03 09 
30:  Fri spring 03  1  0 2012-03-16 03:00:00 2012 03 16 
31:  Fri spring 03  1  0 2012-03-23 03:00:00 2012 03 23 
32:  Fri spring 03  1  0 2012-03-30 03:00:00 2012 03 30 
33:  Fri spring 03  1  0 2012-04-06 03:00:00 2012 04 06 
34:  Fri spring 03  1  0 2012-04-13 03:00:00 2012 04 13 
35:  Fri spring 03  1  0 2012-04-20 03:00:00 2012 04 20 
36:  Fri spring 03  1  0 2012-04-27 03:00:00 2012 04 27 
37:  Fri summer 03  0  0 2012-09-07 03:00:00 2012 09 07 
38:  Fri summer 03  0  0 2012-09-14 03:00:00 2012 09 14 
39:  Fri summer 03  0  0 2012-09-21 03:00:00 2012 09 21 
40:  Fri autumn 03  0  0 2012-09-28 03:00:00 2012 09 28 
+1

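Hi @Jaap, this is very clear to me, and I now have a very good idea of how to approach my main dataset, which has many more sources and years. It is a good start for understanding how to handle it. Thanks again – Achak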