Reading binary structures in R

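I have a simple binary structure with a few repeating data types that I need to read efficiently in R. For example, an integer icount, followed by a struct {a integer, b real} repeated icount times. Consider this simple file written by Python: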

# Python -- this is not my question, it just makes data for my question 
from struct import pack 
with open('foo.bin', 'wb') as fp: 
    icount = 123456 
    fp.write(pack('i', icount)) 
    for i in range(icount): 
        fp.write(pack('if', i, i * 100.0))

(You can download this <1 MB file if you would rather not generate it.)

To read this file into R, I can use readBin in a for loop, but it is painfully slow (as expected):

# R 
fp <- file("foo.bin", "rb") 
icount <- readBin(fp, "integer", size=4) 
df <- data.frame(a=integer(icount), b=numeric(icount)) 
for (i in seq(icount)) { 
    df$a[i] <- readBin(fp, "integer", size=4) 
    df$b[i] <- readBin(fp, "numeric", size=4) 
} 
close(fp)

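I would like to know a more efficient way to read a non-uniform binary structure into a data.frame (or a similar structure). I know that for-loops should be avoided whenever possible.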

2017-03-01 Mike T

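I haven't used it, but `pack::unpack` claims to be able to unpack raw vectors according to a template. – r2evans

Can you share what the data looks like? – TUSHAr

@Tushar You can generate it or [download](http://filebin.ca/3E2SWO2QJvHu/foo.bin) it.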

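I found a workaround that runs quickly: read the whole block of struct data as "raw", then slice out the parts needed to interpret the structure. Let me demonstrate: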

fp <- file("foo.bin", "rb") 
icount <- readBin(fp, "integer", size=4) 
rec_size = 4 + 4 # int is 4 bytes + float is 4 bytes 
raw <- readBin(fp, "raw", n=icount * rec_size) 
close(fp) 

# Interpret raw bytes using specifically tailored slices for the structure 
# Record indices run from 0 to icount - 1; each contributes 4 byte positions 
raw_sel_a <- rep(0:(icount - 1), each=4) * rec_size + 1:4 
raw_sel_b <- rep(0:(icount - 1), each=4) * rec_size + 1:4 + 4 
df <- data.frame(
    a = readBin(raw[raw_sel_a], "integer", size=4, n=icount), 
    b = readBin(raw[raw_sel_b], "numeric", size=4, n=icount))

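The tricky part is building the raw_sel slices that pick out the relevant bytes of the raw structure. This example is simple because every data member is 4 bytes. However, I can imagine this being harder for more complex data structures.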
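
For records whose fields differ in size, one possible way to avoid hand-building index vectors is to reshape the raw bytes into a matrix with one record per column and select each field's rows. The following is only a sketch under stated assumptions: the layout (a 4-byte integer followed by an 8-byte double, with no padding), the temporary file, and the helper read_field are all hypothetical and are not part of the foo.bin file above.

```r
# ASSUMED layout (hypothetical, not the question's file):
#   a: 4-byte integer, b: 8-byte double, no padding -> 12 bytes per record
n <- 5L
path <- tempfile(fileext = ".bin")

# Write a small test file: the record count, then n records of (a, b)
con <- file(path, "wb")
writeBin(n, con, size = 4)
for (i in seq_len(n) - 1L) {
  writeBin(i, con, size = 4)        # field a
  writeBin(i * 100, con, size = 8)  # field b
}
close(con)

# Hypothetical helper: m is a raw matrix with one record per column;
# offset is the field's 0-based byte offset within the record
read_field <- function(m, offset, size, what) {
  rows <- offset + seq_len(size)
  readBin(as.vector(m[rows, , drop = FALSE]), what, n = ncol(m), size = size)
}

# Read the header, then the whole payload as raw bytes in one call
con <- file(path, "rb")
icount <- readBin(con, "integer", size = 4)
rec_size <- 4L + 8L
payload <- readBin(con, "raw", n = icount * rec_size)
close(con)

# Column-major fill puts the bytes of record k in column k
m <- matrix(payload, nrow = rec_size)
df <- data.frame(
  a = read_field(m, 0L, 4L, "integer"),
  b = read_field(m, 4L, 8L, "numeric")
)
```

Each additional field only needs its offset and size, so the index arithmetic stays in one place. A format written with compiler-style struct padding would need the padded offsets instead.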

2017-03-01 22:40:42

As a note, your loop took (tested only once):

user system elapsed 
174.04 1.55 180.96

I sped up the read by using:

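fp <- file("foo.bin", "rb") 
icount <- readBin(fp, "integer", size=4) 
# Read every remaining 4-byte word as an integer: a, b, a, b, ... 
x <- replicate(icount * 2, readBin(fp, "integer", size=4)) 
x <- x[seq(1, 2 * icount, by=2)]  # odd positions hold the a values 
close(fp) 

# Reopen and read every word, including the icount header, as a 4-byte float 
fp <- file("foo.bin", "rb") 
y <- replicate(icount * 2 + 1, readBin(fp, "numeric", size=4)) 
y <- y[seq(3, 2 * icount + 1, by=2)]  # skip the header; b values sit at 3, 5, 7, ... 
close(fp) 

df <- data.frame(a=x, b=y)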

This was faster than I expected:

user system elapsed 
3.08 0.10 3.18
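
As a possible variation on the same two-pass idea, readBin accepts an n argument, so all the words can be requested in a single call instead of one call per value through replicate. This is a minimal sketch, not a measured result: it writes its own small temporary file (an assumption, standing in for foo.bin) with the same layout of a 4-byte count followed by (4-byte integer, 4-byte float) pairs.

```r
# Build a small stand-in file with the same layout as foo.bin (assumed here)
n <- 1000L
path <- tempfile(fileext = ".bin")
con <- file(path, "wb")
writeBin(n, con, size = 4)
for (i in seq_len(n) - 1L) {
  writeBin(i, con, size = 4)        # a: 4-byte integer
  writeBin(i * 100, con, size = 4)  # b: stored as a 4-byte float
}
close(con)

# Pass 1: read all words after the header as integers in one call
con <- file(path, "rb")
icount <- readBin(con, "integer", size = 4)
words <- readBin(con, "integer", n = 2 * icount, size = 4)
close(con)
a <- words[seq(1, 2 * icount, by = 2)]   # odd positions hold a

# Pass 2: skip the 4-byte header, then read all words as 4-byte floats
con <- file(path, "rb")
invisible(readBin(con, "raw", n = 4))
floats <- readBin(con, "numeric", n = 2 * icount, size = 4)
close(con)
b <- floats[seq(2, 2 * icount, by = 2)]  # even positions hold b

df <- data.frame(a = a, b = b)
```

The values read with the wrong type in each pass are simply discarded, exactly as in the replicate version.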

2017-03-01 09:18:45 Travis
