How can I read a 6 GB log file in Python without first loading the whole file into memory?

-2

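I want to read a large log file (6 GB) in buffered chunks: read 100 MB, sleep for a few seconds, then continue. I want to avoid loading the file's contents into memory all at once; I want to read it the way head -n x does in bash. The file consists of blocks. Each block contains many lines, and consecutive blocks are separated by 3 blank lines, for example: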

[18/05/2015:00:00:00 +0300]%PARSER_ERROR[elapsedTime] 
GET /mobile/ HTTP/1.1 
host: www.my-host.com:8082 
accept: */* 
accept-language: en-gb 
connection: keep-alive 
accept-encoding: gzip, deflate 
user-agent: Mozilla/5.0 (iPhone; CPU iPhone OS 8_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Mobile/12D508 
x-sub-imsi: 418876678 
x-sub-msisdn: 333123654 



[18/05/2015:00:00:00 +0300]%PARSER_ERROR[elapsedTime] 
GET/HTTP/1.1 
content-type: application/x-www-form-urlencoded 
user-agent: Dalvik/1.6.0 (Linux; U; Android 4.4.2; AirPhoneS6 Build/KOT49H) 
host: www.my-host.net 
connection: Keep-Alive 
accept-encoding: gzip 
x-sub-imsi: 418252632 
x-sub-msisdn: 333367627836 



HTTP/1.1 302 Found 
Location: http://www.my-host.net/welcome/main.html 
Set-Cookie: oam.Flash.RENDERMAP.TOKEN=-jdrkoipfe; Path=/ 



[18/05/2015:00:00:00 +0300]%PARSER_ERROR[elapsedTime] 
GET/HTTP/1.1 
content-type: application/x-www-form-urlencoded 
user-agent: Dalvik/1.6.0 (Linux; U; Android 4.4.2; AirPhoneS6 Build/KOT49H) 
host: www.my-host.net 
connection: Keep-Alive 
accept-encoding: gzip 
x-sub-imsi: 41887237832 
x-sub-msisdn: 333878778
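
The block layout shown above can be walked lazily, one line at a time. The following is only a minimal sketch: `iter_blocks` is a hypothetical helper name, and it assumes that blank lines occur only between blocks, never inside one.

```python
import io


def iter_blocks(lines):
    """Yield each block as a list of stripped lines, reading lazily."""
    block = []
    for line in lines:
        line = line.strip()
        if line:
            block.append(line)
        elif block:
            # First blank line after some content: the block is complete.
            yield block
            block = []
    if block:
        # The last block may not be followed by blank lines.
        yield block


# Small in-memory stand-in for the real log file.
sample = io.StringIO(
    "[18/05/2015:00:00:00 +0300]%PARSER_ERROR[elapsedTime] \n"
    "GET /mobile/ HTTP/1.1 \n"
    "x-sub-msisdn: 333123654 \n"
    "\n\n\n"
    "HTTP/1.1 302 Found \n"
    "Location: http://www.my-host.net/welcome/main.html \n"
)
blocks = list(iter_blocks(sample))
print(len(blocks))
```

With a real file, passing the open file object (`with open(path) as f: for block in iter_blocks(f): ...`) should keep only the current block in memory, because a file object yields its lines one at a time.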

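I want to export the user agent, its MSISDN and the platform version to CSV files. So I want to generate 2 files, ios.csv and android.csv, and each file should contain unique MSISDNs only. Each file will look like this: user-agent, version, msisdn. Example: Android, 4.2.2, 333878778

So I have to go through the file block by block, check the user-agent line, and then check its msisdn. I tried to do it in bash, but because bash is not that flexible, I decided to do it in Python.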
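
The export step described above could look roughly like this. It is a sketch under assumptions, not a tested solution for the real log: `parse_block`, `write_unique` and the two regular expressions are hypothetical, and the patterns are guessed from the two user-agent strings in the sample, so other devices may need different ones.

```python
import csv
import io
import re

# Assumed patterns, derived only from the sample user-agent strings.
ANDROID_RE = re.compile(r"Android (\d+(?:\.\d+)*)")
IOS_RE = re.compile(r"OS (\d+(?:_\d+)*) like Mac OS X")


def parse_block(block):
    """Return (platform, version, msisdn) for a request block, or None."""
    headers = {}
    for line in block:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    agent = headers.get("user-agent", "")
    msisdn = headers.get("x-sub-msisdn")
    if not msisdn:
        return None
    match = ANDROID_RE.search(agent)
    if match:
        return ("Android", match.group(1), msisdn)
    match = IOS_RE.search(agent)
    if match:
        return ("iOS", match.group(1).replace("_", "."), msisdn)
    return None


def write_unique(records, outputs):
    """Write each msisdn once per platform; outputs maps platform -> file."""
    writers = {name: csv.writer(handle) for name, handle in outputs.items()}
    seen = {name: set() for name in outputs}
    for platform, version, msisdn in records:
        if msisdn not in seen[platform]:
            seen[platform].add(msisdn)
            writers[platform].writerow([platform, version, msisdn])


android_block = [
    "user-agent: Dalvik/1.6.0 (Linux; U; Android 4.4.2; AirPhoneS6 Build/KOT49H)",
    "x-sub-msisdn: 333878778",
]
ios_block = [
    "user-agent: Mozilla/5.0 (iPhone; CPU iPhone OS 8_2 like Mac OS X) AppleWebKit/600.1.4",
    "x-sub-msisdn: 333123654",
]
blocks = [android_block, ios_block, android_block, ["HTTP/1.1 302 Found"]]

# In-memory stand-ins for ios.csv and android.csv.
ios_out, android_out = io.StringIO(), io.StringIO()
records = (r for r in map(parse_block, blocks) if r is not None)
write_unique(records, {"iOS": ios_out, "Android": android_out})
print(android_out.getvalue(), ios_out.getvalue())
```

For real output files, open them with `open("android.csv", "w", newline="")` as the csv module documentation recommends. Only the sets of MSISDNs already written stay in memory, not the log itself.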

2015-06-16 mosleh

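See http://stackoverflow.com/editing-help for help with formatting your own post. –

OK, formatting help aside, let's see your Python. What isn't working? – SiHa

You can use the fileinput library. It provides an iterator, so I don't think it will load the whole file into memory unless you make it do so.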

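import fileinput
import time

# fileinput.input() returns a lazy iterator: lines are read one at a time
with fileinput.input('my_log_file.txt') as log:
    for line in log:
        # do your computation on `line` here
        if log.lineno() % 100000 == 0:
            # pause every 100000 lines instead of after every single line
            time.sleep(5)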
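
One way to check whether iterating with fileinput really avoids holding the whole file in memory is to measure it with tracemalloc on a throwaway file. This is only a rough sketch: the 5 MB temporary file is a stand-in for the real log, and tracemalloc only counts memory allocated by Python itself.

```python
import fileinput
import os
import tempfile
import tracemalloc

# Build a throwaway log of about 5 MB (50,000 lines of about 100 bytes).
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    for i in range(50_000):
        tmp.write(f"line {i:06d} " + "x" * 88 + "\n")
    path = tmp.name
file_size = os.path.getsize(path)

tracemalloc.start()
line_count = 0
with fileinput.input(path) as log:
    for line in log:
        line_count += 1  # stand-in for the real per-line work
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
os.remove(path)

print(f"file: {file_size} bytes, peak traced memory: {peak} bytes")
```

If the peak stays far below the file size, the file is being streamed rather than loaded in one piece.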

2015-06-16 11:46:20 hspandher

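But this is not an efficient way; with this you will load the whole 6 GB file into memory! What I am looking for is an efficient way that also avoids loading the whole file into memory, something like the following, although this does not work for me: BUFFER = int(10E6)  # 10 megabyte buffer  file = open('somefile.txt', 'r')  text = file.readlines(BUFFER)  while text != []:  for t in text:  # operate on t  text = file.readlines(BUFFER) – mosleh

Sorry, I guessed wrong; I have changed my answer. This one should work. – hspandher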
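
The readlines(BUFFER) loop from the comment above can be written as a generator. The comment does not say what went wrong, so this is only a sketch of the same idea, not a diagnosis; `iter_line_batches` and `pause_seconds` are hypothetical names, and the temporary file stands in for the real log.

```python
import os
import tempfile
import time


def iter_line_batches(path, hint_bytes, pause_seconds=0.0):
    """Yield lists of complete lines, each list roughly hint_bytes in size."""
    with open(path, "r") as handle:
        while True:
            # readlines(hint) stops once the lines read so far exceed hint,
            # so a batch always ends on a line boundary.
            batch = handle.readlines(hint_bytes)
            if not batch:
                break
            yield batch
            time.sleep(pause_seconds)  # optional pause between batches


with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.writelines(f"line {i}\n" for i in range(1000))
    path = tmp.name

batch_sizes = [len(batch) for batch in iter_line_batches(path, hint_bytes=1000)]
os.remove(path)
print(len(batch_sizes), sum(batch_sizes))
```

For the case in the question, the call would be something like `iter_line_batches('somefile.txt', int(100E6), pause_seconds=3)`. A batch can still end in the middle of a block, so block handling has to carry state from one batch to the next.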

-1

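import time

def readFile(inputFile):
    buff = int(100E6)  # read 100 megabytes at a time
    with open(inputFile, 'rb') as file_object:
        while True:
            block = file_object.read(buff)
            if not block:  # an empty bytes object means end of file
                break
            doSomeThing(block)  # process the current chunk (your own function)
            time.sleep(3)  # pause before reading the next chunk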


# time python readfile.py
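
Reading fixed-size byte chunks, as in the answer above, can cut a line or a whole block in half at a chunk boundary. A possible way around this is to keep the unfinished tail of each chunk and prepend it to the next one. This is a sketch under an assumption: the separator is taken to be exactly four newline bytes (the end of the last line plus 3 empty lines), which matches the sample but may not hold for files with \r\n line endings. `iter_blocks_from_chunks` is a hypothetical name.

```python
import io


def iter_blocks_from_chunks(stream, chunk_size, separator=b"\n\n\n\n"):
    """Yield complete blocks from a binary stream read in fixed-size chunks."""
    carry = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Everything before the last separator is complete; the final piece
        # may be a partial block, so carry it over to the next chunk.
        *complete, carry = (carry + chunk).split(separator)
        for block in complete:
            if block.strip():
                yield block.strip()
    if carry.strip():
        yield carry.strip()


data = b"A1 \nA2 \n\n\n\nB1 \nB2 \n\n\n\nC1\n"
for size in (1, 3, 7, 1000):
    # The result should not depend on where the chunk boundaries fall.
    print(size, list(iter_blocks_from_chunks(io.BytesIO(data), size)))
```

With the real file this would be called as `iter_blocks_from_chunks(open(path, 'rb'), int(100E6))`, and the caller can sleep between blocks or between groups of blocks as needed.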

2015-06-22 07:34:39 mosleh
