2016-10-02

R: speeding up a nested ifelse statement in my code. Here is an example of the data:

    time_elapsed                     network_name             daypart       day
 1:         4705                          Laff TV 2016-09-09 03:11:35    Friday
 2:         1800                              CNN 2016-09-10 08:00:00  Saturday
 3:           23                             INSP 2016-09-02 18:00:00    Friday
 4:          148                              NBC 2016-09-02 16:01:26    Friday
 5:          957                  History Channel 2016-09-07 14:44:03 Wednesday
 6:         1138         Nickelodeon/Nick-at-Nite 2016-09-09 16:00:00    Friday
 7:          120                       Starz Edge 2016-09-07 15:28:59 Wednesday
 8:          268            Starz Encore Westerns 2016-09-07 17:13:05 Wednesday
 9:            6                              CBS 2016-09-10 04:00:00  Saturday
10:           69                      Independent 2016-09-07 12:48:11 Wednesday
11:         4151                              NBC 2016-09-09 04:32:37    Friday
12:          570 PBS: Public Broadcasting Service 2016-09-07 16:17:58 Wednesday
13:         1421                            NBCSN 2016-09-03 15:22:23  Saturday
14:          466          Estrella TV (Broadcast) 2016-09-04 19:00:00    Sunday

(typically more than 2 million rows)

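I wrote the nested ifelse statement below a few months ago, when I was running my whole script on just a few million rows. Now that I'm running it at a much larger scale, I'd really like to find a way to make it faster.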

targets_random$daypart <-
  ifelse(wday(targets_random$daypart) == 1 | wday(targets_random$daypart) == 7, "W: Weekend",
  ifelse(hour(targets_random$daypart) <= 2, "LP: Late Prime",
  ifelse(hour(targets_random$daypart) >= 3 & hour(targets_random$daypart) <= 5, "O: Overnight",
  ifelse(hour(targets_random$daypart) >= 6 & hour(targets_random$daypart) <= 9, "EM: Early Morning",
  ifelse(hour(targets_random$daypart) >= 10 & hour(targets_random$daypart) <= 16, "D: Day",
  ifelse(hour(targets_random$daypart) >= 17 & hour(targets_random$daypart) <= 20, "F: Fringe",
  ifelse(hour(targets_random$daypart) >= 21, "P: Prime", NA)))))))

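I tried a data.table solution, but it was only very slightly faster, and it turned my data.table into a list. For the life of me I can't understand why. Converting it back added enough time to cancel out the savings, so it wasn't worth it.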

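Any suggestions would be greatly appreciated. What I have works, and if I have to stick with it, that's fine. The whole script currently takes about 3.5 hours to finish. The biggest parts are the SQL queries and writing the result files, but it would be great if I could cut the time down as much as possible!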

(A bit of a side note: it used to take almost 8 hours, then I replaced tons of pieces with data.table syntax, and now I'm a convert!)


You could probably use parLapply to process several rows at once. – Rilcon42
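A minimal sketch of the parLapply suggestion, assuming the timestamps are a POSIXct vector. The helper `classify_daypart()` is hypothetical (not from the thread) and uses base R only, so the worker processes need no extra packages. The data is split into contiguous chunks, one per worker, and the results are stitched back in order. For an already vectorized classification like this, cluster startup and copying data to the workers may eat much of the gain, so it is worth timing against a single-process run.

```r
library(parallel)

# Hypothetical helper (not from the thread): classifies POSIXct timestamps
# with base R only, so the worker processes need no extra packages.
classify_daypart <- function(x) {
  lt <- as.POSIXlt(x)
  out <- as.character(cut(lt$hour,
                          breaks = c(-Inf, 3, 6, 10, 17, 21, Inf),
                          labels = c("LP: Late Prime", "O: Overnight",
                                     "EM: Early Morning", "D: Day",
                                     "F: Fringe", "P: Prime"),
                          include.lowest = TRUE, right = FALSE))
  out[(lt$wday + 1) %in% c(1, 7)] <- "W: Weekend"   # Sunday = 1, Saturday = 7
  out
}

# Toy input: one week of hourly UTC timestamps starting Friday 2016-09-02
daypart <- as.POSIXct("2016-09-02 00:00:00", tz = "UTC") + 3600 * (0:167)

n_workers <- 2
# Split the vector into contiguous chunks, one per worker
chunks <- split(daypart, cut(seq_along(daypart), n_workers, labels = FALSE))

cl <- makeCluster(n_workers)
res <- parLapply(cl, chunks, classify_daypart)
stopCluster(cl)

# Chunks come back in order, so unlisting restores the original row order
result <- unlist(res, use.names = FALSE)
```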


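See '?cut'. It looks like you could use something like 'cut(as.POSIXlt(targets_random$daypart)$hour, c(-Inf, 3, 6, 10, 17, 21, Inf), include.lowest = TRUE, right = FALSE)', but with the 'labels' argument changed to 'c("LP: Late Prime", "O: Overnight", etc...)', and afterwards replace the value with "W: Weekend" anywhere '(as.POSIXlt(targets_random$daypart)$wday + 1) %in% c(1, 7)' –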
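A minimal sketch of the cut() approach from the comment above, assuming the column holds POSIXct timestamps. The helper name `classify_daypart()` is hypothetical. The point is that the hour and weekday are extracted once per call instead of up to a dozen times inside the nested ifelse, and cut() assigns all six time bins in one vectorized pass. The check below exercises each bin boundary on a Friday.

```r
# Hypothetical helper name; the logic follows the cut() comment above.
classify_daypart <- function(x) {
  lt <- as.POSIXlt(x)   # convert once; exposes $hour (0-23) and $wday (0-6)
  # Left-closed bins: [-Inf,3) [3,6) [6,10) [10,17) [17,21) [21,Inf]
  out <- as.character(cut(lt$hour,
                          breaks = c(-Inf, 3, 6, 10, 17, 21, Inf),
                          labels = c("LP: Late Prime", "O: Overnight",
                                     "EM: Early Morning", "D: Day",
                                     "F: Fringe", "P: Prime"),
                          include.lowest = TRUE, right = FALSE))
  # POSIXlt$wday counts from 0 = Sunday, so +1 matches wday()'s 1 = Sunday
  out[(lt$wday + 1) %in% c(1, 7)] <- "W: Weekend"
  out
}

# Bin boundaries on a weekday (Friday 2016-09-02, UTC)
hrs <- c(0L, 2L, 3L, 5L, 6L, 9L, 10L, 16L, 17L, 20L, 21L, 23L)
x <- as.POSIXct(sprintf("2016-09-02 %02d:00:00", hrs), tz = "UTC")
print(data.frame(hour = hrs, daypart = classify_daypart(x)))
```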

Answer


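Consider building a separate, static data frame of every possible weekday/hour combination and its result. In SQL practice this would be called a lookup table. Then do a regular merge with the full data table.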

# Lookup table: 7 weekdays x 24 hours = 168 rows
# hour() returns 0-23, so the hours must be 0:23 (1:24 would drop every midnight row)
daytimes <- expand.grid(wday = 1:7, hour = 0:23)
daytimes$result <-
  ifelse(daytimes$wday == 1 | daytimes$wday == 7, "W: Weekend",
  ifelse(daytimes$hour <= 2, "LP: Late Prime",
  ifelse(daytimes$hour >= 3 & daytimes$hour <= 5, "O: Overnight",
  ifelse(daytimes$hour >= 6 & daytimes$hour <= 9, "EM: Early Morning",
  ifelse(daytimes$hour >= 10 & daytimes$hour <= 16, "D: Day",
  ifelse(daytimes$hour >= 17 & daytimes$hour <= 20, "F: Fringe",
  ifelse(daytimes$hour >= 21, "P: Prime", NA)))))))

# CREATE MERGE FIELDS
targets_random$wday <- wday(targets_random$daypart)
targets_random$hour <- hour(targets_random$daypart)

# MERGE WITH NEW COLUMN: result (all.x = TRUE keeps rows that find no match)
targets_random <- merge(targets_random, daytimes, by = c("wday", "hour"), all.x = TRUE)
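A base-R variant of the same lookup-table idea, offered as a sketch rather than the answerer's method: instead of merge(), which may reorder rows, it builds a 7 x 24 character matrix (rows = weekday 1..7 with 1 = Sunday, columns = hour 0..23) and indexes it with a two-column (row, column) matrix. This keeps the original row order and needs no extra key columns. The timestamps below are copied from the sample data in the question and assumed to be UTC.

```r
# Build the lookup matrix once, using the same rules as the nested ifelse()
hours <- 0:23
hour_labels <- ifelse(hours <= 2, "LP: Late Prime",
               ifelse(hours <= 5, "O: Overnight",
               ifelse(hours <= 9, "EM: Early Morning",
               ifelse(hours <= 16, "D: Day",
               ifelse(hours <= 20, "F: Fringe", "P: Prime")))))
# Column-major fill: each hour's label is repeated down all 7 weekday rows
lookup <- matrix(rep(hour_labels, each = 7), nrow = 7,
                 dimnames = list(1:7, hours))
lookup[c(1, 7), ] <- "W: Weekend"        # Sunday and Saturday rows

# Toy timestamps taken from the sample data in the question
daypart <- as.POSIXct(c("2016-09-09 03:11:35", "2016-09-10 08:00:00",
                        "2016-09-02 18:00:00", "2016-09-02 16:01:26",
                        "2016-09-07 14:44:03", "2016-09-04 19:00:00"),
                      tz = "UTC")
lt <- as.POSIXlt(daypart)
# One (row, col) pair per timestamp; the result keeps the input order
result <- lookup[cbind(lt$wday + 1, lt$hour + 1)]
print(result)
```

Since the table has only 168 cells, the cost of building it is negligible; the per-row work is a single matrix lookup.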

Great, I'm going to try this! – Camille