2012-09-27

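Error when applying a specific function to all rows of a data frame

Apologies in advance if this has been solved before, but I have looked at all the questions I could find on ddply, sapply, and apply, and I can't for the life of me figure this one out.

I have written a function, countMonths, that takes the day, the month, and the total number of days in a billing cycle as arguments and returns the number of calendar months that the billing cycle spans: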

countMonths <- function(day, month, cycle.days) {
    month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
    if (month < 1 | month > 12 | floor(month) != month) {
        cat("Invalid month value, must be an integer from 1 to 12")
    } else if (day < 1 | day > month.days[month]) {
        cat("Invalid day value, must be between 1 and month.days[month]")
    } else if (cycle.days < 0) {
        cat("Invalid cycle.days value, must be >= 0")
    } else {
        nmonths <- 1
        day.ct <- cycle.days - day
        while (day.ct > 0) {
            nmonths <- nmonths + 1
            month <- ifelse(month == 1, 12, month - 1) # sets to previous month
            day.ct <- day.ct - month.days[month] # subtracts days of previous month
        }
        nmonths
    }
}

I want to apply this function to every row of a data.frame containing customer billing records, such as:

> head(cons2[-1],10)
   kwh cycle.days  read.date row.index year month day kwh.per.day
1  381         29 2010-09-02         1 2010     9   2   13.137931
2  280         32 2010-10-04         2 2010    10   4    8.750000
3  282         29 2010-11-02         3 2010    11   2    9.724138
4  330         34 2010-12-06         4 2010    12   6    9.705882
5  371         30 2011-01-05         5 2011     1   5   12.366667
6  405         30 2011-02-04         6 2011     2   4   13.500000
7  441         32 2011-03-08         7 2011     3   8   13.781250
8  290         29 2011-04-06         8 2011     4   6   10.000000
9  296         29 2011-05-05         9 2011     5   5   10.206897
10 378         32 2011-06-06        10 2011     6   6   11.812500

> dput(head(cons2[-1],10)) 
structure(list(kwh = c(381L, 280L, 282L, 330L, 371L, 405L, 441L, 
290L, 296L, 378L), cycle.days = c(29L, 32L, 29L, 34L, 30L, 30L, 
32L, 29L, 29L, 32L), read.date = structure(c(1283385600, 1286150400, 
1288656000, 1291593600, 1294185600, 1296777600, 1299542400, 1302048000, 
1304553600, 1307318400), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    row.index = 1:10, year = c(2010, 2010, 2010, 2010, 2011, 
    2011, 2011, 2011, 2011, 2011), month = c(9, 10, 11, 12, 1, 
    2, 3, 4, 5, 6), day = c(2L, 4L, 2L, 6L, 5L, 4L, 8L, 6L, 5L, 
    6L), kwh.per.day = c(13.1379310344828, 8.75, 9.72413793103448, 
    9.70588235294118, 12.3666666666667, 13.5, 13.78125, 10, 10.2068965517241, 
    11.8125)), .Names = c("kwh", "cycle.days", "read.date", "row.index", 
"year", "month", "day", "kwh.per.day"), row.names = c(NA, 10L 
), class = "data.frame") 

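I have tried several options, but none of them worked. Specifically, I need to pass the value of a given variable as a scalar (or a vector of length 1) for each row of the data frame, but the values always get passed as whole vectors: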

> cons2$tot.months <- countMonths(cons2$day, cons2$month, cons2$cycle.days) 
Warning messages: 
1: In if (month < 1 | month > 12 | floor(month) != month) { : 
    the condition has length > 1 and only the first element will be used 
2: In if (day < 1 | day > month.days[month]) { : 
    the condition has length > 1 and only the first element will be used 
3: In if (cycle.days < 0) { : 
    the condition has length > 1 and only the first element will be used 
4: In while (day.ct > 0) { : 
    the condition has length > 1 and only the first element will be used 
5: In while (day.ct > 0) { : 
    the condition has length > 1 and only the first element will be used 
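
The warnings above come from `if` and `while`, which expect a single TRUE or FALSE, while a comparison on a whole column yields one logical value per row. The snippet below is a minimal sketch of that behaviour; the `month` values are made up for illustration (13 is deliberately invalid). Note that recent R releases treat a condition of length greater than 1 as an error rather than a warning.

```r
# Hypothetical input: three month values, the last one deliberately invalid
month <- c(9, 10, 13)

# The comparison is vectorized: it returns one logical per element
bad <- month < 1 | month > 12 | floor(month) != month
bad

# `if` needs exactly one logical, so reduce the vector first with any()/all()
if (any(bad)) {
  message("At least one month value is invalid")
}

# For an element-wise decision, ifelse() keeps the result as a vector
status <- ifelse(bad, "invalid", "ok")
status
```

So the choice is between reducing the condition to one value (`any`, `all`) or keeping the whole computation element-wise (`ifelse`, or calling the function once per row).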

I was finally able to get the correct result with ddply, treating each row as its own group, but it takes a long time:

cons2 <- ddply(cons2, .(account, year, month, day), transform, 
       tot.months = countMonths(day, month, cycle.days) 
) 

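Is there a better way to apply this function to every row of my data frame? Or, as a related question, how do I pass a variable in a data frame as a scalar argument (the value from a given row) rather than as a vector of all the values of that variable in the data frame? I would especially appreciate it if someone could point out where my thinking goes wrong conceptually.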

It would be very handy if you used `dput(head(cons2[-1],10))`, so we can cut and paste it into our session. – nograpes

@nograpes: Done. Thanks for the suggestion.

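Also, [this question](http://stackoverflow.com/questions/1995933/number-of-months-between-two-dates) does not answer your specific question, but it does address the problem of counting calendar months between dates. I'm not sure whether marking this as a duplicate would help? – nograpes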

Answers

To make the function work, you can use mapply, which applies your function in turn to each element of all the vectors you pass to it. So you could do:

mapply(countMonths,cons2$day,cons2$month,cons2$cycle.days) 
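
As a small self-contained sketch of what that call does: `count_months_scalar` below is a condensed copy of the question's loop with the validation left out, and `bills` is a made-up data frame whose first two rows come from the sample data and whose third row is hypothetical (a 65-day cycle) so that the results differ.

```r
# Condensed copy of the question's scalar logic (validation omitted for brevity)
count_months_scalar <- function(day, month, cycle.days) {
  month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  nmonths <- 1
  day.ct <- cycle.days - day
  while (day.ct > 0) {
    nmonths <- nmonths + 1
    month <- ifelse(month == 1, 12, month - 1)  # step back one month
    day.ct <- day.ct - month.days[month]        # subtract that month's days
  }
  nmonths
}

# Rows 1-2 are taken from the sample data; row 3 is hypothetical
bills <- data.frame(day        = c(2L, 4L, 2L),
                    month      = c(9, 10, 1),
                    cycle.days = c(29L, 32L, 65L))

# mapply calls the function once per row, with length-1 arguments each time
bills$tot.months <- mapply(count_months_scalar,
                           bills$day, bills$month, bills$cycle.days)

# Vectorize() wraps the same mapply pattern into a reusable function
count_months_vec <- Vectorize(count_months_scalar)
same <- count_months_vec(bills$day, bills$month, bills$cycle.days)
```

Here `bills$tot.months` comes out as 2, 2, 4. This is still a loop under the hood, so it mainly fixes the warnings rather than the speed.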

There are simpler ways to do this, as I mentioned in my comment. For example, I think this would work:

cons2$read.date=as.Date(cons2$read.date) 
monnb <- function(d){ lt <- as.POSIXlt(as.Date(d, origin="1900-01-01")); lt$year*12 + lt$mon } 
mondf <- function(d1, d2) monnb(d2) - monnb(d1) 
mondf(cons2$read.date-cons2$cycle.days,cons2$read.date) + 1 
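
To see what these helpers compute, here is a worked example on a single billing record (row 1 of the sample data). `monnb` turns a date into a running month index, counting years from 1900 and months from 0, so the difference of two indices is the number of month boundaries crossed. The variable names `read_date` and `start_date` are mine, not from the question.

```r
# Same helpers as above, written out on separate lines
monnb <- function(d) {
  lt <- as.POSIXlt(as.Date(d, origin = "1900-01-01"))
  lt$year * 12 + lt$mon   # lt$year counts from 1900, lt$mon runs 0..11
}
mondf <- function(d1, d2) monnb(d2) - monnb(d1)

read_date  <- as.Date("2010-09-02")  # row 1 of the sample data
start_date <- read_date - 29         # cycle.days = 29, giving 2010-08-04

monnb(read_date)    # 110 * 12 + 8 = 1328
monnb(start_date)   # 110 * 12 + 7 = 1327

# One month boundary crossed, so the cycle touches 1 + 1 = 2 calendar months
n_months <- mondf(start_date, read_date) + 1
```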

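Also, I noticed that you try to catch all the conditions under which your function would not work, which is great! There is a very handy function called stopifnot that helps with exactly this: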

countMonths <- function(day, month, cycle.days) {
    month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
    stopifnot(month >=1 & month <= 12 & floor(month)==month & cycle.days >=0 & day >= 1 & day <= month.days[month])
    nmonths <- 1
    day.ct <- cycle.days - day
    while (day.ct > 0) {
        nmonths <- nmonths + 1
        month <- ifelse(month == 1, 12, month - 1) # sets to previous month
        day.ct <- day.ct - month.days[month] # subtracts days of previous month
    }
    nmonths
}
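
One possible variation, offered as a sketch rather than a required change: stopifnot accepts several conditions as separate arguments and stops at the first one that fails, so the error message names the specific check. `check_args` is a hypothetical helper introduced only for this illustration.

```r
# Hypothetical helper: the same checks as above, one condition per argument
check_args <- function(day, month, cycle.days) {
  month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  stopifnot(
    month >= 1, month <= 12, floor(month) == month,
    cycle.days >= 0,
    day >= 1, day <= month.days[month]
  )
  invisible(TRUE)
}

ok <- check_args(2, 9, 29)   # valid input passes silently

# Invalid month: capture the error message instead of stopping the script
msg <- tryCatch(check_args(2, 13, 29),
                error = function(e) conditionMessage(e))
msg   # names the failing condition, here the `month <= 12` check
```

Unlike the `cat` calls in the original function, a failed check raises an error, so the function can never quietly return nothing on bad input.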

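As for comments on your function: I think it works, but it doesn't take advantage of R's vectorized operations. The function I took from the other answer is quite flexible, because you can feed it an entire vector of dates at once instead of looping over each element in turn.

That works, thank you! For my own learning, could you describe the problems you see with the function?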
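
To illustrate the point about vectorization, the sketch below runs the date-based helpers on whole vectors in one call and compares that with an explicit element-by-element loop. The three dates and cycle lengths are rows 1, 2, and 5 of the sample data; the names `read_date`, `start_date`, `vectorized`, and `looped` are mine.

```r
monnb <- function(d) {
  lt <- as.POSIXlt(as.Date(d, origin = "1900-01-01"))
  lt$year * 12 + lt$mon
}
mondf <- function(d1, d2) monnb(d2) - monnb(d1)

# Rows 1, 2 and 5 of the sample data
read_date  <- as.Date(c("2010-09-02", "2010-10-04", "2011-01-05"))
cycle_days <- c(29L, 32L, 30L)
start_date <- read_date - cycle_days   # Date arithmetic is element-wise

# Vectorized: one call handles every record at once
vectorized <- mondf(start_date, read_date) + 1

# Equivalent explicit loop, one record at a time
looped <- sapply(seq_along(read_date),
                 function(i) mondf(start_date[i], read_date[i]) + 1)

identical(vectorized, looped)
```

Both approaches give the same numbers; the vectorized call simply avoids the per-row function-call overhead, which is what made the ddply version slow.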