Looping over the rows of a data frame takes forever to run


elecOrGas<-function(myData) 
{ 
    for (i in 1:(nrow(myData)-1)) 
    { 
    if (myData[i,2]==myData[i+1,2]) 

    { 
     if ((myData$typeGas[i]==myData$typeElec[i+1])|(myData$typeElec[i]==myData$typeGas[i+1])) 
     { 
     myData$typeTest[i]=1 
     } else { myData$typeTest[i]=0} 
    } else { myData$typeTest[i]=0} 
    } 
    return(myData) 
}

The combo4 data frame has 4 columns and ~800K rows, in the following format:

CUSTID typeGas typeElec typeTest 
12456 1  0   0 
12563 1  0   1 
12563 0  1   0 
12455 0  1   0

When I run the function as elecOrGas(combo4), the code takes forever to run. I think I am doing something wrong here. Please help.

Source

2016-03-10 aseem bhartiya

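Can you describe what your loop is trying to do? – agenis

Shouldn't customer ID '12455' get a '1' for 'typeTest', since it is duplicated? –

What are the dimensions of your data? Add some debugging statements (every 10 or 100 rows or so, try 'message') to see whether it is actually computing. –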
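The progress-message suggestion in the last comment could look roughly like this. It is only a sketch: `loop_with_progress` and its `every` argument are hypothetical names made up for illustration, and the per-row work is left as a placeholder.

```r
# Hypothetical helper: walks over n row indices and prints a progress
# message every `every` rows, so a long-running loop shows signs of life.
loop_with_progress <- function(n, every = 100) {
  for (i in seq_len(n)) {
    # ... per-row work would go here ...
    if (i %% every == 0) {
      message("processed ", i, " of ", n, " rows")
    }
  }
  invisible(n)
}

loop_with_progress(1000, every = 250)
```

If the messages appear at a steady pace, the loop is working and is simply slow; if they stop, something is stuck.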

Here is a solution using dplyr, which is well suited to this kind of problem. I created some mock data matching your example:

library(dplyr) 

## fake test data set 
combo.test <- data.frame(
    CUSTID = sample(rep(10000:999999, each=2), 800000, replace = FALSE), 
    typeGas = sample(c(0,1), 800000, replace = TRUE) 
) 
combo.test$typeElec <- ifelse(combo.test$typeGas == 0, 1, 0)

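To assign 1 to typeTest when a customer has a 1 in both typeElec and typeGas (possibly on different rows), use dplyr's `group_by` function to work on each distinct CUSTID in the data.frame, then `mutate` to create a new variable `typeTest`. The `ifelse` tests whether `any` value in that CUSTID's typeGas column is 1 and whether `any` value in its typeElec column is 1.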

# convert to tbl_df object, arrange by CUSTID, assign 1 to variable typeTest 
# if CUSTID has values for 1 in both typeGas and typeElec 
ptm <- proc.time() 
combo.test <- combo.test %>% tbl_df() %>% arrange(CUSTID) %>% 
    group_by(CUSTID) %>% 
    mutate(typeTest = ifelse(any(typeGas == 1) & any(typeElec == 1), 1, 0)) %>% 
    ungroup() 
proc.time() - ptm

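`tbl_df()` converts the data.frame into dplyr's nicer version of it, and the pipe operator `%>%` means that the output of each function is passed on to the next. The code took about 10 seconds to run for me.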

https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
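To see what the grouped test produces on the four sample rows from the question, here is a minimal sketch of the same logic. It uses base R's `ave()` instead of dplyr so that it runs without extra packages; that substitution is mine and is not the method used above. The helper names `has_gas` and `has_elec` are made up for illustration.

```r
# The four sample rows from the question (typeTest is recomputed below).
combo4 <- data.frame(
  CUSTID   = c(12456, 12563, 12563, 12455),
  typeGas  = c(1, 1, 0, 0),
  typeElec = c(0, 0, 1, 1)
)

# Per-customer maximum, repeated on every row of that customer:
# 1 if the customer has at least one row with that type, else 0.
has_gas  <- ave(combo4$typeGas,  combo4$CUSTID, FUN = max)
has_elec <- ave(combo4$typeElec, combo4$CUSTID, FUN = max)

# A customer passes the test only if both types occur somewhere.
combo4$typeTest <- as.numeric(has_gas == 1 & has_elec == 1)

print(combo4$typeTest)  # 0 1 1 0
```

Note that this per-customer test marks both rows of CUSTID 12563, whereas the question's loop only marks the first row of a matching pair.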

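UPDATE: Right, I should have answered your original question instead of only offering an alternative approach. Your function has just one bug: the first `if` condition should index column 1 (CUSTID) rather than column 2. The speed problem comes from how much more efficiently R handles vectors than data frames: assigning to `myData$typeTest[i]` inside the loop modifies the data frame on every iteration, while writing to a preallocated vector is much cheaper. Here is a good discussion: (Speed up the loop operation in R).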

elecOrGas2 <- function(myData) { 
    res <- numeric(nrow(myData)) # preallocate a vector of zeros for 'typeTest' 

    # seq_len() avoids the 1:0 trap when the data has fewer than two rows 
    for (i in seq_len(nrow(myData) - 1)) { 
     # original (buggy): if (myData[i,2]==myData[i+1,2]) 
     if (myData[i,1]==myData[i+1,1]) { # correct index for CUSTID 
      if ((myData$typeGas[i]==myData$typeElec[i+1])| 
        (myData$typeElec[i]==myData$typeGas[i+1])) { 
       res[i] <- 1 # write to the vector instead of myData$typeTest[i] 
      } else { 
       res[i] <- 0 
      } 
     } else { 
      res[i] <- 0 
     } 
    } 
    myData$typeTest <- res # assign the whole column once, after the loop 
    return(myData) 
} 

library(dplyr) 
combo.test <- data.frame(
    CUSTID = sample(rep(10000:999999, each=2), 800000, replace = FALSE), 
    typeGas = sample(c(0,1), 800000, replace = TRUE) 
) 
combo.test$typeElec <- ifelse(combo.test$typeGas == 0, 1, 0)  
combo.test <- arrange(combo.test, CUSTID) %>% tbl_df() 

# test time using 1/10 of the data 
# original function: 29 sec 
system.time(elecOrGas(combo.test[1:80000,]) -> test1) 
# updated function (writes to a preallocated vector): 6 sec 
system.time(elecOrGas2(combo.test[1:80000,]) -> test2)
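The loop can probably be dropped altogether by comparing each row with the next one as whole vectors. The following is a sketch of that idea, not something I timed on the full data: `elecOrGasVec` is a hypothetical name, and it assumes the data is already sorted by CUSTID and has columns named CUSTID, typeGas and typeElec as in the question.

```r
# Hypothetical loop-free variant of the adjacent-row test.
# Assumes myData is sorted by CUSTID.
elecOrGasVec <- function(myData) {
  n <- nrow(myData)
  res <- numeric(n)            # the last row always stays 0, as in the loop
  if (n > 1) {
    i <- seq_len(n - 1)        # rows 1 .. n-1
    j <- i + 1                 # their successors, rows 2 .. n
    same_id <- myData$CUSTID[i] == myData$CUSTID[j]
    crossed <- (myData$typeGas[i] == myData$typeElec[j]) |
               (myData$typeElec[i] == myData$typeGas[j])
    res[i] <- as.numeric(same_id & crossed)
  }
  myData$typeTest <- res
  myData
}

# The question's four rows, sorted by CUSTID.
sample_rows <- data.frame(
  CUSTID   = c(12455, 12456, 12563, 12563),
  typeGas  = c(0, 1, 1, 0),
  typeElec = c(1, 0, 0, 1)
)
out <- elecOrGasVec(sample_rows)
print(out$typeTest)  # 0 0 1 0
```

It keeps the same comparison as the loop, so only the first row of a matching pair gets a 1.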

Source

2016-03-10 21:28:21

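Thank you very much. This is a very efficient solution. However, since I am new to this, I would like to know what was wrong with my code. –

Thank you very much. I really appreciate this solution. A small mistake turned out to be costly. –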
