Python - Open and modify a large text file

I have a ~600MB Roblox-type .mesh file, which reads like a text file in any text editor. I have the following code:

mesh = open("file.mesh", "r").read() 
mesh = mesh.replace("[", "{").replace("]", "}").replace("}{", "},{") 
mesh = "{"+mesh+"}" 
f = open("p2t.txt", "w") 
f.write(mesh)

It returns:

Traceback (most recent call last): 
    File "C:\TheDirectoryToMyFile\p2t2.py", line 2, in <module> 
    mesh = mesh.replace("[", "{").replace("]", "}").replace("}{", "},{") 
MemoryError

Here is a sample of my file:

[-0.00599, 0.001466, 0.006][0.16903, 0.84515, 0.50709][0.00000, 0.00000, 0][-0.00598, 0.001472, 0.00599][0.09943, 0.79220, 0.60211][0.00000, 0.00000, 0]

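What can I do?

Edit:

I don't know what the head, follow, and tail commands are in the other thread this was marked as a duplicate of. I tried to use them but couldn't get them to work. The file is also one giant line; it is not split into lines.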


2015-06-22 GShocked

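Try doing the replacements one at a time. Try reading some tutorials. – wwii

That didn't work – GShocked

Possible duplicate of [Read large text files in Python, line by line without loading it in to memory](http://stackoverflow.com/questions/6475328/read-large-text-files-in-python-line-by-line-without-loading-in-to-memory) –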

You need to read one byte per iteration, process it, and then write it to another file or to sys.stdout. Try this code:

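mesh = open("file.mesh", "r")
mesh_out = open("file-1.mesh", "w")

# The first character is consumed and "{" is written in its place
# (the sample file starts with "[")
c = mesh.read(1)

if c:
    mesh_out.write("{")
else:
    exit(0)
while True:
    c = mesh.read(1)
    if c == "":
        break

    if c == "[":
        mesh_out.write(",{")
    elif c == "]":
        mesh_out.write("}")
    else:
        mesh_out.write(c)

# Close both files so the output is flushed to disk
mesh.close()
mesh_out.close()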

UPD:

It works very slowly (thanks jamylak). So I changed it:

import sys
import re


# Per-character helper left over from the first version;
# process_file below does not call it
def process_char(c, stream, is_first=False):
    if c == '':
        return False
    if c == '[':
        stream.write('{' if is_first else ',{')
        return True
    if c == ']':
        stream.write('}')
        return True


def process_file(fname):
    with open(fname, "r") as mesh:
        # Consume the first character and write "{" in its place
        c = mesh.read(1)
        if c == '':
            return
        sys.stdout.write('{')

        while True:
            # Read 8192 characters at a time instead of one
            c = mesh.read(8192)
            if c == '':
                return

            c = re.sub(r'\[', ',{', c)
            c = re.sub(r'\]', '}', c)
            sys.stdout.write(c)


if __name__ == '__main__':
    process_file(sys.argv[1])

Now it processes a 1.4 GB file in about 15 seconds. To run it:

$ python mesh.py file.mesh > file-1.mesh
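The chunked version above is safe because it only replaces single characters, and a single character can never be split across two chunks. As a rough sketch of the same idea, the two `re.sub` calls could be collapsed into one `str.translate` pass per chunk. The names `TABLE` and `translate_stream` are made up for this illustration, and the in-memory `io.StringIO` objects stand in for the real files:

```python
import io

# One translation table handles both brackets in a single pass per chunk.
TABLE = str.maketrans({"[": ",{", "]": "}"})


def translate_stream(src, dst, chunk_size=8192):
    """Copy src to dst chunk by chunk, turning [a][b] into {a},{b}."""
    # As in the answer above, the first character is assumed to be "[".
    if src.read(1) == "":
        return
    dst.write("{")
    # iter() with a sentinel stops when read() returns "" at end of input.
    for chunk in iter(lambda: src.read(chunk_size), ""):
        dst.write(chunk.translate(TABLE))


sample = "[-0.00599, 0.001466, 0.006][0.16903, 0.84515, 0.50709]"
outputs = set()
for size in (1, 7, 8192):
    out = io.StringIO()
    translate_stream(io.StringIO(sample), out, chunk_size=size)
    outputs.add(out.getvalue())

# Every chunk size yields the same text, so the set holds one element.
print(outputs)
```

With real files, `src` and `dst` would be the objects returned by `open(...)`, or `sys.stdout` for the output as in the answer.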


2015-06-22 03:57:40

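Nice. See also this question: http://stackoverflow.com/questions/2872381/how-to-read-a-file-byte-by-byte-in-python-and-how-to-print-a-bytelist-as-a-binar – maxymoo

It might be a good idea to use a 'with' statement to work *within a context manager*. 'mesh_out' should be opened for *appending* – wwii

Reading '1' byte at a time is super slow, though. You should use a buffer size, e.g. the default '8192', and run '.replace()' on each chunk – jamylak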

You can do it line by line:

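with open("file.mesh", "r") as mesh, open("p2t.txt", "w") as f:
    # Each iteration holds one line in memory; a file that is a single
    # line is therefore still read in one piece
    for line in mesh:
        line = line.replace("[", "{").replace("]", "}").replace("}{", "},{")
        line = "{" + line + "}"
        f.write(line)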
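Iterating over a file object splits on newlines, so a file that is one giant line (as the question's edit describes) arrives as a single string and nothing is saved. A minimal sketch of the difference, using an in-memory `io.StringIO` as a stand-in for the file; the `read_chunks` helper and the 16-character chunk size are assumptions made for this illustration:

```python
import io

sample = "[-0.00599, 0.001466, 0.006][0.16903, 0.84515, 0.50709]"

# Line iteration: with no newline in the data, the whole content is one "line".
lines = list(io.StringIO(sample))


def read_chunks(stream, size):
    """Return an iterator of pieces of at most `size` characters."""
    # read() returns "" at end of input, which stops the iteration.
    return iter(lambda: stream.read(size), "")


# Fixed-size reads: each piece is bounded by `size`, whatever the line length.
chunks = list(read_chunks(io.StringIO(sample), 16))

print(len(lines))                    # lines seen by the for-loop approach
print(max(len(c) for c in chunks))   # largest piece held at once
```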


2015-06-22 03:51:20 maxymoo

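Still a memory error; maybe I need more memory? I had 8GB, but one of my sticks failed, so I only have 4GB now – GShocked

Try it now; this should iterate over the lines – maxymoo

Still getting a memory error – GShocked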

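import os

# Binary mode: Python 3 does not allow end-relative seeks on text-mode files
with open("file.mesh", "rb") as mesh, open("p2f.txt", "wb") as f:
    while True:
        c = mesh.read(1)
        if not c:
            # End of input: drop the trailing comma written after the last "]"
            f.seek(-1, os.SEEK_END)
            f.truncate()
            break
        elif c == b"[":
            f.write(b"{")
        elif c == b"]":
            f.write(b"},")
        else:
            f.write(c)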

p2f.txt:

{-0.00599, 0.001466, 0.006},{0.16903, 0.84515, 0.50709},{0.00000, 0.00000, 0},{-0.00598, 0.001472, 0.00599},{0.09943, 0.79220, 0.60211},{0.00000, 0.00000, 0}
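The trailing comma only needs truncating because the separator is written after each group. One possible alternative, sketched here under the assumption that groups are always delimited by `[` and `]`, is to write the comma before every group except the first, which avoids the seek entirely and also works with larger chunks. The function name `brackets_to_braces` is hypothetical, and `io.StringIO` stands in for the real files:

```python
import io


def brackets_to_braces(src, dst, chunk_size=8192):
    """Turn [a][b] into {a},{b} without ever writing a trailing comma."""
    first = True  # True until the first "[" has been seen
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        chunk = chunk.replace("]", "}")
        if first and "[" in chunk:
            # The very first group gets no leading comma.
            chunk = chunk.replace("[", "{", 1)
            first = False
        # Every remaining "[" starts a later group, so prefix it with a comma.
        dst.write(chunk.replace("[", ",{"))


sample = "[-0.00599, 0.001466, 0.006][0.16903, 0.84515, 0.50709]"
out = io.StringIO()
brackets_to_braces(io.StringIO(sample), out, chunk_size=8)
print(out.getvalue())
```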


2015-06-22 03:51:56

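As mentioned, '1' byte at a time is super slow. You should read with a bigger buffer size. If you don't believe me, check out what Linus Torvalds has said – jamylak

@jamylak I agree, but I was trying to avoid the MemoryError :) –

Fair enough, but memory can hold more than 1 byte at a time. – jamylak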

-1

def read(afilename):
    with open(afilename, "r") as file:
        lines = file.readlines()
        # readlines() returns a list, so replace within each line
        lines = [line.replace("[", "{") for line in lines]
        # place the rest of the code here


2015-06-22 04:04:39 AuzPython

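'lines = file.readlines()' already kills the memory – jamylak

That depends on the size of the file being read/written. On a small file, would you even notice? – AuzPython

On a small file you won't notice. But the question is about "opening a large file" and mentions a "600mb file". Also, it's bad practice to use '.readlines()'; I never use it – jamylak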

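BLOCK_SIZE = 1 << 15  # 32 KiB per read
input_file = "file.mesh"  # placeholder paths, taken from the question
output_file = "p2t.txt"
with open(input_file, 'rb') as fin, open(output_file, 'wb') as fout:
    # iter() with a sentinel keeps calling fin.read() until it returns b'' (EOF)
    for block in iter(lambda: fin.read(BLOCK_SIZE), b''):
        # do your replace on block here (it is bytes, so use patterns like b'[')
        fout.write(block)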
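The "do your replace" step needs some care if the question's original `"}{"` to `"},{"` replacement is kept, because that two-byte pattern can straddle two blocks. A minimal sketch of one way to handle it, aiming for the same output as the question's code (including the outer braces); the function name `convert_file` and the `prev_closed` flag are assumptions made for this illustration:

```python
import os
import tempfile

BLOCK_SIZE = 1 << 15


def convert_file(input_file, output_file, block_size=BLOCK_SIZE):
    """Rewrite [a][b] as {{a},{b}} while holding one block in memory."""
    with open(input_file, "rb") as fin, open(output_file, "wb") as fout:
        fout.write(b"{")
        prev_closed = False  # True when the previous block ended with "}"
        for block in iter(lambda: fin.read(block_size), b""):
            block = block.replace(b"[", b"{").replace(b"]", b"}")
            # A "}{" pair split across two blocks is invisible to replace(),
            # so the seam between blocks is checked explicitly.
            if prev_closed and block.startswith(b"{"):
                fout.write(b",")
            fout.write(block.replace(b"}{", b"},{"))
            prev_closed = block.endswith(b"}")
        fout.write(b"}")


# Small demonstration on a temporary file with a deliberately tiny block size.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file.mesh")
    dst = os.path.join(tmp, "p2t.txt")
    with open(src, "wb") as f:
        f.write(b"[1, 2, 3][4, 5, 6]")
    convert_file(src, dst, block_size=4)
    with open(dst, "rb") as f:
        result = f.read()

print(result)
```

Mapping `[` straight to `,{` as in the earlier answer avoids the seam check, at the cost of treating the first group specially.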


2015-06-22 04:23:25 LittleQ
