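Extracting the date from an Apache/nginx user-identification cookie

Both web servers, Apache and Nginx, can issue a uniqid cookie to visitors through their mod_unique_id/userid modules. Such a cookie looks like four uint32 values encoded as a base64 string. The second of those values is the timestamp of when the cookie was issued.
I want to extract its date and time.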
from base64 import b64decode
from datetime import datetime
import shlex, gzip, glob
from struct import unpack
import pandas as pd
import numpy as np

def get_data() -> pd.DataFrame:
    filenames = glob.glob('data/user_cookie/stat-*.gz')
    for filename in filenames:
        print(filename)
        f = gzip.open(filename, 'rt')
        for row in f.readlines():
            parts = shlex.split(row)
            useragent, raw_cookie = parts[9], parts[16]
            if raw_cookie == '-':
                raw_visit_date = parts[3][1:]
                # this is a first visit
                visit_date = datetime.strptime(raw_visit_date,
                                               '%d/%b/%Y:%H:%M:%S')
            else:
                visit_date = datetime.fromtimestamp(unpack('IIII',
                                                    b64decode(raw_cookie))[1])
            print(useragent, visit_date)

if __name__ == '__main__':
    get_data()
This line looks particularly "artificial" to me. How can I make the whole code more "pythonic" and faster?
datetime.fromtimestamp(unpack('IIII', b64decode(raw_cookie))[1])
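For reference, the expression above can be spelled out as a small named helper. This is only a sketch under stated assumptions: the name `cookie_issue_time` and the constant `COOKIE_LAYOUT` are hypothetical, the cookie is assumed to decode to exactly 16 bytes, and the byte order is assumed to be little-endian (what `'IIII'` gives on a typical x86 machine). If the server writes the values in network byte order, `'>4I'` would be needed instead. The sample cookie is synthetic, built only to exercise the function.

```python
from base64 import b64decode, b64encode
from datetime import datetime, timezone
from struct import Struct

# Assumption: four little-endian unsigned 32-bit integers (16 bytes in total).
# Precompiling the format avoids re-parsing the format string for every row.
COOKIE_LAYOUT = Struct('<4I')


def cookie_issue_time(raw_cookie: str) -> datetime:
    """Return the issue time stored in the second uint32 of the cookie."""
    fields = COOKIE_LAYOUT.unpack(b64decode(raw_cookie))
    issued_at = fields[1]  # second value: Unix timestamp of cookie issue
    # An explicit time zone keeps the result independent of the local machine.
    return datetime.fromtimestamp(issued_at, tz=timezone.utc)


# Synthetic cookie for illustration only, not taken from a real log.
sample = b64encode(COOKIE_LAYOUT.pack(1, 1500000000, 2, 3)).decode('ascii')
print(cookie_issue_time(sample))
```

Giving the step a name and fixing the byte order explicitly makes the intent readable at the call site, and the precompiled `Struct` may save a little work when the function runs once per log line.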