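R, bit64: problems calculating row mean and standard deviation in a data.table
I am trying to work with numbers larger than 2^32. Although I am also using data.table and fread, I don't think the problem is related to them: I can turn the symptoms on and off without changing the data.table or the fread call. The symptom is that I get reported means like 4.1e-302 when I expect correct magnitudes between 1e+3 and 1e+17.
The problem appears consistently when I use the bit64 package and the integer64-related functions. Things work for me with "regular-size data and R", but I am not expressing things correctly with this package. See my code and data below.
I am on a MacBook Pro, 16 GB, i7 (recent).
I restarted my R session and cleared the workspace, but the problem persists.
Please offer suggestions; I appreciate any input. I think it must have something to do with the bit64 library.
Links I looked at include the bit64 doc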
An issue that had similar symptoms caused by an fread() memory leak, but I think I eliminated
Here is my input data:
var1,var2,var3,var4,var5,var6,expected_row_mean,expected_row_stddev
1000 ,993 ,987 ,1005 ,986 ,1003 ,996 ,8
100000 ,101040 ,97901 ,100318 ,96914 ,97451 ,98937 ,1722
10000000 ,9972997 ,9602778 ,9160554 ,8843583 ,8688500 ,9378069 ,565637
1000000000 ,1013849241 ,973896894 ,990440721 ,1030267777 ,1032689982 ,1006857436 ,23096234
100000000000 ,103171209097 ,103660949260 ,102360301140 ,103662297222 ,106399064194 ,103208970152 ,2078732545
10000000000000 ,9557954451905 ,9241065464713 ,9357562691674 ,9376495364909 ,9014072235909 ,9424525034852 ,334034298683
1000000000000000 ,985333546044881 ,994067361457872 ,1034392968759970 ,1057553099903410 ,1018695335152490 ,1015007051886440 ,27363415718203
100000000000000000 ,98733768902499600 ,103316759127969000 ,108062824583319000 ,111332326225036000 ,108671041505404000 ,105019453390705000 ,5100048567944390
My code, which runs against the sample data above:
# file: problem_bit64.R
# OBJECTIVE: Using larger numbers, I want to calculate a row mean and row standard deviation
# ERROR: I don't know what I am doing wrong to get such errors, seems bit64 related
# PRIORITY: BLOCKED (do this in Python instead?)
# reported Sat 9/24/2016 by Greg
# sample data:
# each row is 100 times larger on average, for 8 rows, starting with 1,000
# for the vars within a row, there is 10% uniform random variation. B2 = ROUND(A2+A2*0.1*(RAND()-0.5),0)
# Install development version of data.table --> for fwrite()
install.packages("data.table", repos = "https://Rdatatable.github.io/data.table", type = "source")
require(data.table)
require(bit64)
.Machine$integer.max # 2147483647 Is this an issue ?
.Machine$double.xmax # 1.797693e+308 I assume not
# -------------------------------------------------------------------
# ---- read in and basic info that works
csv_in <- "problem_bit64.csv"
dt <- fread(csv_in)
dim(dt) # 8 8
lapply(dt, class) # "integer64" for all 8
names(dt) # "var1" "var2" "var3" "var4" "var5" "var6" "expected_row_mean" "expected_row_stddev"
dtin <- dt[, 1:6, with=FALSE] # just save the 6 input columns
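On the two limit checks near the top of the script (`.Machine$integer.max` and `.Machine$double.xmax`), here is a minimal base-R sketch of where the relevant limits actually sit. `.Machine$integer.max` only bounds R's 32-bit integer type, which is why fread read these columns as integer64 in the first place. Plain doubles represent every integer exactly only up to 2^53 (about 9.0e15), so the last sample row cannot be stored exactly as double, although a mean or standard deviation computed in double is still good to roughly 15 significant digits. Nothing here needs bit64.

```r
# Where the numeric limits in this problem actually sit (base R only).
.Machine$integer.max == 2^31 - 1   # TRUE: R's 32-bit integer type tops out near 2.1e9
2^53 + 1 == 2^53                   # TRUE: above 2^53 a double cannot hold every integer
2^53 - 1 == 2^53                   # FALSE: below 2^53 consecutive integers stay distinct
1e17 > 2^53                        # TRUE: the last sample row is past exact-double range

# Relative spacing between adjacent doubles; ~2.2e-16 means roughly 15-16
# significant digits survive for values of any magnitude.
.Machine$double.eps
```

So `.Machine$double.xmax` is indeed not the issue; the precision boundary that matters for this data is 2^53, and it limits exactness, not the order of magnitude of the result.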
...now the part where the problem shows up:
# -------------------------------------------------------------------
# ---- CALCULATION PROBLEMS START HERE
# ---- for each row, I want to calculate the mean and standard deviation
a <- apply(dtin, 1, mean.integer64); a # get 8 values like 4.9e-321
b <- apply(dtin, 2, mean.integer64); b # get 6 values like 8.0e-308
# ---- try secondary variations that do not work
c <- apply(dtin, 1, mean); c # get 8 values like 4.9e-321
c <- apply(dtin, 1, mean.integer64); c # same result
c <- apply(dtin, 1, function(x) mean(x)); c # same
c <- apply(dtin, 1, function(x) sum(x)/length(x)); c # same results as mean(x)
##### I don't see any sd.integer64 # FEATURE REQUEST, Z-TRANSFORM IS COMMON
c <- apply(dtin, 1, function(x) sd(x)); c # unrealistic values - see expected
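A likely explanation for values like 4.9e-321, sketched under an assumption: `?apply` documents that a two-dimensional, data-frame-like `X` is first coerced with `as.matrix()`, and the magnitudes reported above suggest that this coercion dropped the `integer64` class, leaving only the underlying bits. bit64 stores each 64-bit integer in the bit pattern of a double, so once the class is gone R reads those bits as tiny subnormal doubles. The sketch reproduces that effect directly with `unclass()` rather than through `apply()`.

```r
library(bit64)

# An integer64 vector is a double vector whose 64 bits hold a signed integer,
# tagged with class "integer64" so bit64's methods interpret it correctly.
x <- as.integer64(1000)
typeof(x)   # "double": the storage type base R sees
class(x)    # "integer64"

# Dropping the class leaves the raw bits, which R now reads as an IEEE double.
# The bit pattern of the integer 1000 is the subnormal double 1000 * 2^-1074.
raw_bits <- unclass(x)
raw_bits                      # about 4.94e-321
raw_bits == 1000 * 2^-1074    # TRUE

# Plain double arithmetic on such bits yields values of the same tiny size
# as the row means reported in the question (about 4.9e-321 here).
tiny_mean <- mean(unclass(as.integer64(c(1000, 993, 987))))
tiny_mean
```

The row means of about 4.9e-321 for the first row line up with this: 996 (the expected mean) times 2^-1074 is roughly 4.9e-321.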
Regular-size R on ordinary data, still reading the data with fread() into a data.table - WORKS
# -------------------------------------------------------------------
# ---- delete big numbers, and try regular stuff - WHICH WORKS
dtin2 <- dtin[ 1:3, ] # just up to about 10 million (SAME DATA, SAME FREAD, SAME DATA.TABLE)
dtin2[ , var1 := as.integer(var1) ] # I know there are fancier ways to do this
dtin2[ , var2 := as.integer(var2) ] # but I want things to work before getting fancy.
dtin2[ , var3 := as.integer(var3) ]
dtin2[ , var4 := as.integer(var4) ]
dtin2[ , var5 := as.integer(var5) ]
dtin2[ , var6 := as.integer(var6) ]
lapply(dtin2, class) # validation
c <- apply(dtin2, 1, mean); c # get 3 row values AS EXPECTED (matching expected columns)
c <- apply(dtin2, 1, function(x) mean(x)); c # CORRECT
c <- apply(dtin2, 1, function(x) sum(x)/length(x)); c # same results as mean(x)
c <- apply(dtin2, 1, sd); c # get 3 row values AS EXPECTED (matching expected columns)
c <- apply(dtin2, 1, function(x) sd(x)); c # CORRECT
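One way to make the big-number rows behave like the regular-size ones above, offered as a sketch rather than a definitive fix: convert the integer64 columns to double before any matrix-based code sees them, then use `rowMeans()` and `sd()` exactly as in the working case. This also sidesteps the missing `sd.integer64` noted in the script. The three-row `dtin` below is a hypothetical stand-in built from rows 1, 2 and 8 of the sample CSV. Because doubles lose exactness above 2^53, the 1e17 row is only accurate to about 15-16 significant digits, which is plenty for a mean and standard deviation; if exact row sums matter, they can be accumulated in integer64 first.

```r
library(data.table)
library(bit64)

# Hypothetical stand-in for `dtin`: rows 1, 2 and 8 of the sample CSV,
# built from strings so every digit arrives in integer64 exactly.
dtin <- data.table(
  var1 = as.integer64(c("1000", "100000", "100000000000000000")),
  var2 = as.integer64(c("993",  "101040", "98733768902499600")),
  var3 = as.integer64(c("987",  "97901",  "103316759127969000")),
  var4 = as.integer64(c("1005", "100318", "108062824583319000")),
  var5 = as.integer64(c("986",  "96914",  "111332326225036000")),
  var6 = as.integer64(c("1003", "97451",  "108671041505404000"))
)

# as.double() dispatches to bit64's as.double.integer64 method, so the values
# (not the raw bits) are carried over; above 2^53 they are rounded.
dtd <- dtin[, lapply(.SD, as.double)]
m <- as.matrix(dtd)

row_mean <- rowMeans(m)
row_sd   <- apply(m, 1, sd)   # sample standard deviation (n - 1 denominator)

# Alternative for the mean: keep the row sums exact in integer64, divide at
# the end; bit64's "/" operator returns a double.
row_sum64 <- Reduce(`+`, dtin)
row_mean_from_sum <- row_sum64 / ncol(dtin)
```

With these inputs, `round(row_mean[1])` and `round(row_sd[1])` come out as 996 and 8, matching the expected_row_mean and expected_row_stddev columns, and `row_sum64[3]` holds the exact sum 630116720344227600.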
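Have you tried other big-number alternatives, such as 'Brobdingnag'? They may not play well with data.table, but you aren't really using any data.table-specific features. You could even use fread with 'data.table = FALSE' to get a data frame. – dracodoc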
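Along the lines of that comment, fread can also avoid integer64 altogether. A minimal sketch, assuming a data.table version recent enough to support fread's `text=` argument: `integer64 = "double"` tells fread to read large integer columns as ordinary doubles instead of bit64::integer64, and `data.table = FALSE` returns a plain data.frame, so the regular-size code path applies unchanged. The CSV string is a trimmed excerpt of the sample data (first three columns of rows 1 and 8), not the full file.

```r
library(data.table)

# Trimmed excerpt of problem_bit64.csv, passed inline via `text=`.
csv_text <- "var1,var2,var3
1000,993,987
100000000000000000,98733768902499600,103316759127969000"

# Read large integers as doubles (precision beyond 2^53 is rounded silently)
# and return a data.frame instead of a data.table.
df <- fread(text = csv_text, integer64 = "double", data.table = FALSE)

sapply(df, class)     # plain "numeric" columns, no integer64
rowMeans(df)          # ordinary double row means
apply(df, 1, sd)      # ordinary sample standard deviations
```

The trade-off is the same as converting after the fact: values above 2^53 are stored rounded, which does not matter for means and standard deviations at this scale.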