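Using Python, how can we remove the auth_user column from proxy log files?

I have hundreds of proxy log files in a folder and want to remove the auth_user column from all of them, writing the results to another folder. Using Python, how can we remove the auth_user column from the proxy log files?

The auth_user column is enclosed in double quotes. The biggest problem is that I can't use the space character as the delimiter, because some log files have no space between the timestamp and auth_user. I tried using the double quote as the delimiter, but that gives some strange results, because sometimes there is nothing between the double quotes.

What I have so far: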

for src_name in glob.glob(os.path.join(source_dir, '*.log')):
    base = os.path.basename(src_name)
    dest_name = os.path.join(dest_dir, base)
    with open(src_name, 'rb') as infile:
        with open(dest_name, 'w') as outfile:
            reader = csv.reader(infile, delimiter='"')
            writer = csv.writer(outfile, delimiter='"')
            for row in reader:
                row[1] = ''
                writer.writerow(row)

The log files look like this (time_stamp"auth_user"src_ip):

[21/Apr/2013:00:00:00 -0300]"cn=john smith,ou=central,ou=microsoft,o=com" 192.168.2.5 
[21/Apr/2013:00:00:01 -0400]"jsmith" 192.168.4.5 
[21/Apr/2013:00:00:01 -0400]"" 192.168.15.5 
[22/Apr/2013:00:00:01 -0400]"" 192.168.4.5 
[22/Apr/2013:00:00:01 -0400]"jkenndy" 192.168.14.5

I would like to change it to this (time_stamp src_ip):

[21/Apr/2013:00:00:00 -0300] 192.168.2.5 
[21/Apr/2013:00:00:01 -0400] 192.168.4.5 
[21/Apr/2013:00:00:01 -0400] 192.168.15.5 
[22/Apr/2013:00:00:01 -0400] 192.168.4.5 
[22/Apr/2013:00:00:01 -0400] 192.168.14.5


2015-06-09 boy of summer

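Do the 'auth_user' lines have no timestamp? What are the empty quotes on the lines after 'auth_user'? Do you want the data to start only at the 'username' line? If those two lines are at the beginning of the file, you can ignore them by reading the file starting from the third line. – albert

Hi, welcome to StackOverflow. **Please don't write your question in bold, as it feels like you're shouting at us.** :) –

#time_stamp "auth_user" <----- correction for the first line –

Assuming each file has this structure: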

#[some timestamp here]"auth_user" 
#[21/Apr/2013:00:00:00 -0300]"" 
#[21/Apr/2013:00:00:00 -0300]"username" 
#[21/Apr/2013:00:00:00 -0300]"machine$" 
#[21/Apr/2013:00:00:00 -0300]"cn=john smith,ou=central,ou=microsoft,o=com" 
#[21/Apr/2013:00:00:01 -0400]"jsmith" 
#[21/Apr/2013:00:00:01 -0400]"" 
#[21/Apr/2013:00:00:01 -0400]""

Assuming the first two lines need to be skipped:

#!/usr/bin/env python3
# coding: utf-8

with open('file.log') as f:
    for line_number, line in enumerate(f):
        # line_number starts at zero; skip the two lines at the beginning of the file
        if line_number > 1:
            # process the line here; replace the print call with appropriate code
            print(line)
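
As an editorial aside (not part of the original answer), the same skip can be sketched with `itertools.islice`, which avoids the explicit counter. The helper name `skip_header_lines` and its `n_skip` parameter are hypothetical, and the sample lines are taken from the structure assumed above.

```python
import io
from itertools import islice


def skip_header_lines(lines, n_skip=2):
    """Yield each line after the first n_skip lines, without its newline."""
    # islice(lines, n_skip, None) lazily drops the first n_skip items
    for line in islice(lines, n_skip, None):
        yield line.rstrip('\n')


# In-memory stand-in for an open log file (assumed layout)
sample = io.StringIO(
    '#time_stamp"auth_user"\n'
    '[21/Apr/2013:00:00:00 -0300]""\n'
    '[21/Apr/2013:00:00:00 -0300]"username"\n'
    '[21/Apr/2013:00:00:01 -0400]"jsmith"\n'
)
remaining = list(skip_header_lines(sample))
```

With a real file, `skip_header_lines(open_file)` can be passed an open file object in the same way, since files iterate line by line.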


2015-06-09 21:12:21 albert

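Instead of using csv, could you just open the file normally and use a regular expression? The following removes the auth_user column regardless of whether there is a space after the timestamp, and whether or not there is anything inside the quotes:

import re

with open('in.txt', 'r') as fh:
    for line in fh:
        # Remove the quoted field that follows a timestamp's closing "]"
        # or the "#time_stamp" header label
        line = re.sub(r'(?:(?<=\d{4}])|(?<=#time_stamp))\s*".*?"', '', line)
        # line already ends with a newline, so don't add another
        print(line, end='')

Input: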

#time_stamp "auth_user" src_ip 
[21/Apr/2013:00:00:00 -0300]"cn=johnsmith,ou=central,ou=microsoft,o=com" 192.168.2.5 
[21/Apr/2013:00:00:01 -0400]"jsmith" 192.168.4.5 
[21/Apr/2013:00:00:01 -0400]"" 192.168.15.5 
[22/Apr/2013:00:00:01 -0400]"" 192.168.4.5 
[22/Apr/2013:00:00:01 -0400]"jkenndy" 192.168.14.5

Output:

#time_stamp src_ip 
[21/Apr/2013:00:00:00 -0300] 192.168.2.5 
[21/Apr/2013:00:00:01 -0400] 192.168.4.5 
[21/Apr/2013:00:00:01 -0400] 192.168.15.5 
[22/Apr/2013:00:00:01 -0400] 192.168.4.5 
[22/Apr/2013:00:00:01 -0400] 192.168.14.5
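
As an editorial sketch (not part of the original answer), the same regular expression can be applied to every `*.log` file in one folder, writing the results to another, which is what the question asks for. The names `strip_auth_user` and `process_dir` are hypothetical, and the code assumes the files are plain text in the default encoding.

```python
import glob
import os
import re

# Same pattern as in the answer above: a quoted field that directly follows
# either the closing "]" of a timestamp or the "#time_stamp" header label.
AUTH_USER_RE = re.compile(r'(?:(?<=\d{4}])|(?<=#time_stamp))\s*".*?"')


def strip_auth_user(line):
    """Return the line with the quoted auth_user field removed."""
    return AUTH_USER_RE.sub('', line)


def process_dir(source_dir, dest_dir):
    """Copy every *.log file from source_dir to dest_dir without auth_user."""
    os.makedirs(dest_dir, exist_ok=True)
    for src_name in glob.glob(os.path.join(source_dir, '*.log')):
        dest_name = os.path.join(dest_dir, os.path.basename(src_name))
        with open(src_name, 'r') as infile, open(dest_name, 'w') as outfile:
            for line in infile:
                # Each line keeps its own trailing newline
                outfile.write(strip_auth_user(line))
```

Calling `process_dir(source_dir, dest_dir)` with the two folder paths leaves the original files untouched and writes the stripped copies under the same file names.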


2015-06-09 21:20:27 stevieb

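I would use the re regular expression module to split each line of the log file into three groups, then write the first and third groups to the output file:

import glob
import os
import re

# Three groups: [timestamp], "auth_user", and the rest of the line;
# \s* allows for an optional space between the timestamp and auth_user
pattern = re.compile(r'''(\[.+\])\s*(".*")(.+)''')

# source_dir and dest_dir are assumed to be defined elsewhere
for src_name in glob.glob(os.path.join(source_dir, '*.log')):
    base = os.path.basename(src_name)
    dest_name = os.path.join(dest_dir, base)
    with open(src_name, 'rt') as infile, open(dest_name, 'wt') as outfile:
        for line in infile:
            match = pattern.search(line)
            if match is None:
                # e.g. a header line without a [timestamp]; copy it unchanged
                outfile.write(line)
                continue
            groups = match.groups()
            outfile.write(groups[0] + groups[2] + '\n')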
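
To see what the three groups capture, here is a small editorial sketch run on one sample line from the question. It is a variation, not the answer's exact pattern: the named groups, the stricter `[^"]*` quoted field, and the helper name `split_line` are all my assumptions.

```python
import re

# Assumption: same three-part layout as the answer's pattern, but with named
# groups and a quoted field that cannot itself contain a double quote.
LINE_RE = re.compile(
    r'(?P<timestamp>\[[^\]]+\])\s*(?P<auth_user>"[^"]*")(?P<rest>.*)'
)


def split_line(line):
    """Return (timestamp, auth_user, rest), or None if the line doesn't fit."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group('timestamp'), m.group('auth_user'), m.group('rest')


parts = split_line(
    '[21/Apr/2013:00:00:00 -0300]'
    '"cn=john smith,ou=central,ou=microsoft,o=com" 192.168.2.5'
)
# Joining the first and third parts drops the auth_user column
cleaned = parts[0] + parts[2]
```

Returning `None` for lines that do not start with a bracketed timestamp (such as a `#time_stamp` header) lets the caller decide whether to copy or skip them.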


2015-06-09 22:14:48 martineau
