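How can I read a 6 GB log file in Python without first loading the whole file into memory?
I want to read a large log file (6 GB) through a buffer: read about 100 MB, sleep for a few seconds, then continue, so that the file's contents are never loaded into memory all at once. I want to read it the way head -n x does in bash. The file consists of blocks; each block contains many lines, and consecutive blocks are separated by 3 blank lines, for example: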
[18/05/2015:00:00:00 +0300]%PARSER_ERROR[elapsedTime]
GET /mobile/ HTTP/1.1
host: www.my-host.com:8082
accept: */*
accept-language: en-gb
connection: keep-alive
accept-encoding: gzip, deflate
user-agent: Mozilla/5.0 (iPhone; CPU iPhone OS 8_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Mobile/12D508
x-sub-imsi: 418876678
x-sub-msisdn: 333123654
[18/05/2015:00:00:00 +0300]%PARSER_ERROR[elapsedTime]
GET / HTTP/1.1
content-type: application/x-www-form-urlencoded
user-agent: Dalvik/1.6.0 (Linux; U; Android 4.4.2; AirPhoneS6 Build/KOT49H)
host: www.my-host.net
connection: Keep-Alive
accept-encoding: gzip
x-sub-imsi: 418252632
x-sub-msisdn: 333367627836
HTTP/1.1 302 Found
Location: http://www.my-host.net/welcome/main.html
Set-Cookie: oam.Flash.RENDERMAP.TOKEN=-jdrkoipfe; Path=/
[18/05/2015:00:00:00 +0300]%PARSER_ERROR[elapsedTime]
GET / HTTP/1.1
content-type: application/x-www-form-urlencoded
user-agent: Dalvik/1.6.0 (Linux; U; Android 4.4.2; AirPhoneS6 Build/KOT49H)
host: www.my-host.net
connection: Keep-Alive
accept-encoding: gzip
x-sub-imsi: 41887237832
x-sub-msisdn: 333878778
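To make the reading requirement concrete, here is a minimal sketch of what I have in mind, not a tested solution. It relies on the fact that iterating over a Python file object reads one line at a time, so only the current block is held in memory. The names iter_blocks and iter_blocks_throttled, the timestamp pattern, and the pause_every / pause_seconds parameters are my own assumptions for illustration. Because the blank separator lines may not survive in every copy of the log, the sketch also starts a new block whenever it sees a timestamp line.

```python
import re
import time

# Assumption: every block starts with a timestamp line such as
# "[18/05/2015:00:00:00 +0300]%PARSER_ERROR[elapsedTime]".
BLOCK_START = re.compile(r"^\[\d{2}/\d{2}/\d{4}:")


def iter_blocks(lines):
    """Yield each block as a list of stripped, non-blank lines."""
    block = []
    for raw in lines:
        line = raw.strip()
        # A blank line or a new timestamp header closes the current block.
        if not line or BLOCK_START.match(line):
            if block:
                yield block
                block = []
        if line:
            block.append(line)
    if block:
        # The last block has no separator after it.
        yield block


def iter_blocks_throttled(path, pause_every=100 * 1024 * 1024, pause_seconds=2):
    """Stream blocks from path, sleeping after roughly pause_every characters."""
    consumed = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # The file object is iterated lazily, line by line.
        for block in iter_blocks(handle):
            yield block
            # Approximate size: counts characters, not bytes, and ignores blank lines.
            consumed += sum(len(line) + 1 for line in block)
            if consumed >= pause_every:
                time.sleep(pause_seconds)
                consumed = 0
```

With this, something like `for block in iter_blocks_throttled("access.log"):` would process one block at a time (the file name is a placeholder).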
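I want to export the user agent, its platform version, and the MSISDN to CSV files, so I need to generate 2 files, ios.csv and android.csv, each containing only unique MSISDNs. Each file will look like this: user-agent,version,msisdn. Example: Android,4.2.2,333878778
So I have to go through the file block by block, find the user-agent line in each block, and then find its MSISDN. I tried doing this in bash, but bash is not flexible enough for it, so I decided to do it in Python.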
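The per-block check described above could be sketched roughly as follows, assuming each block is already available as a list of its lines. The function names parse_block and export and both regular expressions are assumptions of mine, based only on the two user-agent strings in the sample; other devices (for example iPads, whose user agent reads "CPU OS" rather than "iPhone OS") would need additional patterns. The set of MSISDNs already written is kept in memory, which should be far smaller than the log itself.

```python
import csv
import re

# Assumed patterns, derived only from the sample user-agent strings.
IOS_RE = re.compile(r"iPhone OS (\d+(?:_\d+)*)")
ANDROID_RE = re.compile(r"Android (\d+(?:\.\d+)*)")


def parse_block(block):
    """Return (platform, version, msisdn) for one block, or None if not usable."""
    headers = {}
    for line in block:
        name, sep, value = line.partition(":")
        if sep:
            # Keep the first occurrence so request headers win over response headers.
            headers.setdefault(name.strip().lower(), value.strip())
    agent = headers.get("user-agent", "")
    msisdn = headers.get("x-sub-msisdn")
    if not msisdn:
        return None
    match = IOS_RE.search(agent)
    if match:
        # iOS versions appear as 8_2 in the user agent; write them as 8.2.
        return "iOS", match.group(1).replace("_", "."), msisdn
    match = ANDROID_RE.search(agent)
    if match:
        return "Android", match.group(1), msisdn
    return None


def export(blocks, ios_path="ios.csv", android_path="android.csv"):
    """Write one row per unique MSISDN to the iOS or Android CSV file."""
    seen = {"iOS": set(), "Android": set()}
    with open(ios_path, "w", newline="") as ios_file, \
            open(android_path, "w", newline="") as android_file:
        writers = {"iOS": csv.writer(ios_file), "Android": csv.writer(android_file)}
        for writer in writers.values():
            writer.writerow(["user-agent", "version", "msisdn"])
        for block in blocks:
            record = parse_block(block)
            if record is None:
                continue
            platform, _version, msisdn = record
            if msisdn not in seen[platform]:
                seen[platform].add(msisdn)
                writers[platform].writerow(record)
```

Since export accepts any iterable of blocks, it can be fed from a generator, so the blocks never have to be collected into a list first.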
See http://stackoverflow.com/editing-help and edit your own post. –
OK, the formatting is fixed; now let's see your Python. What isn't working? – SiHa