2015-09-12 57 views
5

How to round floats to integers while preserving their sum?具有以僞代碼寫入的下面的answer,該向量將向量舍入爲整數值,以使元素總和保持不變並且舍入誤差最小化。我想在R.有效地實現這一點(即矢量如果可能)將數值向量的整數向量保留爲整數

例如,舍入這些號碼產生不同的總:

set.seed(1) 
(v <- 10 * runif(4)) 
# [1] 2.655087 3.721239 5.728534 9.082078 
(v <- c(v, 25 - sum(v))) 
# [1] 2.655087 3.721239 5.728534 9.082078 3.813063 
sum(v) 
# [1] 25 
sum(round(v)) 
# [1] 26 

複製的僞代碼從answer參考

// Temp array with same length as fn. 
tempArr = Array(fn.length) 

// Calculate the expected sum. 
arraySum = sum(fn) 

lowerSum = 0 
-- Populate temp array. 
for i = 1 to fn.lengthf 
    tempArr[i] = { result: floor(fn[i]),    // Lower bound 
        difference: fn[i] - floor(fn[i]), // Roundoff error 
        index: i }       // Original index 

    // Calculate the lower sum 
    lowerSum = lowerSum + tempArr[i] + lowerBound 
end for 

// Sort the temp array on the roundoff error 
sort(tempArr, "difference") 

// Now arraySum - lowerSum gives us the difference between sums of these 
// arrays. tempArr is ordered in such a way that the numbers closest to the 
// next one are at the top. 
difference = arraySum - lowerSum 

// Add 1 to those most likely to round up to the next number so that 
// the difference is nullified. 
for i = (tempArr.length - difference + 1) to tempArr.length 
    tempArr.result = tempArr.result + 1 
end for 

// Optionally sort the array based on the original index. 
array(sort, "index") 

回答

11

在一個更簡單的形式,我會說這個算法是:

  1. 從一切向下舍入開始
  2. 收小數部分最高的數字,直到達到所需的總和。

    1. floor
    2. 訂單號輪下跌的小數部分(使用order
    3. 使用tail搶指數:

    這可以在矢量化方式,通過實施中的R與k個最大小數部分,其中k是,我們需要增加的總和,達到我們的目標值

  3. 增量產值量元素在這些指數由1

在代碼:

smart.round <- function(x) { 
    y <- floor(x) 
    indices <- tail(order(x-y), round(sum(x)) - sum(y)) 
    y[indices] <- y[indices] + 1 
    y 
} 
v 
# [1] 2.655087 3.721239 5.728534 9.082078 3.813063 
sum(v) 
# [1] 25 
smart.round(v) 
# [1] 2 4 6 9 4 
sum(smart.round(v)) 
# [1] 25 
5

感謝這個實用的功能!我想補充的答案,如果四捨五入至小數點後的指定數量時,功能可以修改:

smart.round <- function(x, digits = 0) { 
    up <- 10^digits 
    x <- x * up 
    y <- floor(x) 
    indices <- tail(order(x-y), round(sum(x)) - sum(y)) 
    y[indices] <- y[indices] + 1 
    y/up 
} 
2

運行總計和差異爲基礎的方法相比smartRound更快的@josliber:

diffRound <- function(x) { 
    diff(c(0, round(cumsum(x)))) 
} 

以下結果如何比較以100萬的記錄(在這裏看到的細節:Running Rounding):

res <- microbenchmark(
    "diff(dww)" = x$diff.rounded <- diffRound(x$numbers) , 
    "smart(josliber)"= x$smart.rounded <- smartRound(x$numbers), 
    times = 100 
) 

benchmark of column rounding methods

Unit: milliseconds 
expr   min  lq  mean  median  uq  max  neval 
diff(dww)  38.79636 59.70858 100.6581 95.4304 128.226 240.3088 100 
smart(josliber) 466.06067 719.22723 966.6007 1106.2781 1177.523 1439.9360 100 
+2

儘管此解決方案比我的解決方案更快,並且保留了所需的總和,但不幸的是,它不會最小化問題中所要求的舍入誤差。例如,與'v < - C(2.655087,3.721239,5.728534,9.082078,3.813063)','的所述smart.round'舍入誤差,'總和(ABS(smart.round(ⅴ)-v))',返回1.474329,而'diffRound','sum(abs(diffRound(v)-v))'的舍入誤差返回1.606633。 – josliber

+0

這是事實,總體舍入誤差不會被diff方法最小化。我不確定這是否是該問題的強烈要求。 – Bulat

+0

原始問題有這個條款:「考慮到滿足這四個條件,使舍入方差(sum((in [i] - fn [i])^ 2))最小化的算法是可取的,但這不是一個大問題「。 – Bulat