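Using mmap and Popen

I need to read in and process a bunch of ~40 MB gzipped text files, and I need it done quickly and with minimal I/O overhead (since other people use these volumes too). The fastest way I have found so far for this task looks like this: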

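from subprocess import Popen, PIPE

def gziplines(fname):
    # Decompress with an external zcat process and stream its output line by line.
    f = Popen(['zcat', fname], stdout=PIPE)
    for line in f.stdout:
        yield line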

and then:

for line in gziplines(filename):
    dostuff(line)

But I would like to do something like this instead (IF it is faster?):

def gzipmmap(fname): 
    f = Popen(['zcat', fname], stdout=PIPE) 
    m = mmap.mmap(f.stdout.fileno(), 0, access=mmap.ACCESS_READ) 
    return m

Sadly, when I try this, I get the following error:

>>> m = mmap.mmap(f.stdout.fileno(), 0, access=mmap.ACCESS_READ) 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
mmap.error: [Errno 19] No such device

even though, when I try:

>>> f.stdout.fileno() 
4

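So I feel I have a basic misunderstanding of what is going on. :(

My two questions are:

1) Would mmap be a faster way of getting the whole file into memory for processing?

2) How can I achieve this?

Thank you very much... everyone here has already been incredibly helpful! ~Nik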

2011-06-28 Nik

In any case, your generator solution is cleaner than using mmap. Have you tried Python's standard gzip library instead of calling an external program? http://docs.python.org/library/gzip.html
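A minimal sketch of what this comment suggests, using only the standard `gzip` module. The function name `gzip_lines` and the UTF-8 encoding are assumptions for illustration; whether this is faster than piping through `zcat` would need to be measured on the actual files.

```python
import gzip

def gzip_lines(fname, encoding="utf-8"):
    """Yield decoded lines from a gzip-compressed text file, one at a time."""
    # "rt" opens the file in text mode; data is decompressed incrementally
    # as it is read, so the whole file is never held in memory at once.
    with gzip.open(fname, "rt", encoding=encoding) as f:
        for line in f:
            yield line
```

It is used the same way as the generator in the question: `for line in gzip_lines(filename): dostuff(line)`.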

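From the mmap(2) man page:

ENODEV The underlying file system of the specified file does not
       support memory mapping.

You cannot mmap a stream, only real files or anonymous swap space. You will need to read from the stream into memory yourself.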
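Reading the stream into memory yourself could look roughly like the sketch below. The helper name `read_all` is hypothetical, and the approach assumes the decompressed data fits comfortably in RAM, since the whole output is held in one bytes object.

```python
import subprocess

def read_all(cmd):
    """Run cmd and return everything it writes to stdout as one bytes object."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    # communicate() reads stdout until EOF and then waits for the process to exit.
    data, _ = proc.communicate()
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return data
```

For the files in the question this would be called as `data = read_all(['zcat', filename])`, after which `data.splitlines()` gives the individual lines. It does not reduce the amount of I/O compared with the line-by-line generator; it only changes where the data is buffered.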

2011-06-28 17:48:31

Thanks. Coming back here a year later... with a much better understanding of mmap and everything else! – Nik

Pipes are not mmap-able.

case MAP_PRIVATE: 
     ... 
if (!file->f_op || !file->f_op->mmap) 
     return -ENODEV;

and the file operations for pipes do not contain an mmap hook.
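The failure can be reproduced without `zcat` by trying to map the read end of a plain `os.pipe()`. This is only a sketch: the function name `mmap_pipe_errno` is made up for illustration, and a nonzero length is passed on the assumption that this keeps the failure tied to the pipe rather than to a zero-length request. On Linux the reported error is expected to be ENODEV, but other platforms may report a different errno.

```python
import errno
import mmap
import os

def mmap_pipe_errno():
    """Try to mmap the read end of a pipe; return the failure's errno, or None."""
    r, w = os.pipe()
    try:
        m = mmap.mmap(r, mmap.PAGESIZE, access=mmap.ACCESS_READ)
    except OSError as e:
        # The OS refused the mapping; report which error it gave.
        return e.errno
    else:
        m.close()
        return None
    finally:
        os.close(r)
        os.close(w)

code = mmap_pipe_errno()
print(code, errno.errorcode.get(code))
```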

2011-06-28 17:50:20 adobriyan
