2012-10-05

I am trying to use Python to build a filter for a log file like this one:

Thu Oct 4 23:14:40 2012 [pid 16901] CONNECT: Client "66.249.74.228" 
Thu Oct 4 23:14:40 2012 [pid 16900] [ftp] OK LOGIN: Client "66.249.74.228", anon  password "[email protected]" 
Thu Oct 4 23:17:42 2012 [pid 16902] [ftp] FAIL DOWNLOAD: Client "66.249.74.228", "/pub/10.5524/100001_101000/100039/Assembly-2011/Pa9a_assembly_config4.scafSeq.gz", 14811136 bytes, 79.99Kbyte/sec 
Fri Oct 5 00:04:13 2012 [pid 25809] CONNECT: Client "66.249.74.228" 
Fri Oct 5 00:04:14 2012 [pid 25808] [ftp] OK LOGIN: Client "66.249.74.228", anon password "[email protected]" 
Fri Oct 5 00:07:16 2012 [pid 25810] [ftp] FAIL DOWNLOAD: Client "66.249.74.228", "/pub/10.5524/100001_101000/100027/Raw_data/PHOlcpDABDWABPE/090715_I80_FC427DJAAXX_L8_PHOlcpDABDWABPE_1.fq.gz", 14811136 bytes, 79.99Kbyte/sec 
Fri Oct 5 00:13:19 2012 [pid 27354] CONNECT: Client "1.202.186.53" 
Fri Oct 5 00:13:19 2012 [pid 27353] [ftp] OK LOGIN: Client "1.202.186.53", anon password "[email protected]" 
Fri Oct 5 00:13:33 2012 [pid 27355] [ftp] FAIL DOWNLOAD: Client "1.202.186.53", "/pub", 0.00Kbyte/sec 
Fri Oct 5 00:26:04 2012 [pid 341] [ftp] OK DOWNLOAD: Client "210.72.156.68", "/pub/10.5524/100001_101000/100030/RNA-Seq/Mgo_2.fq.gz", 1985229528 bytes, 85.87Kbyte/sec 
Fri Oct 5 00:55:45 2012 [pid 2766] CONNECT: Client "157.82.250.217" 
Fri Oct 5 00:55:45 2012 [pid 2765] [ftp] OK LOGIN: Client "157.82.250.217", anon password "[email protected]" 
Fri Oct 5 00:56:05 2012 [pid 2767] [ftp] FAIL DOWNLOAD: Client "157.82.250.217", "/pub/10.5524/100001_101000/100036/Gene_catalogue/Gene_catalogue.pep", 1638400 bytes, 81.81Kbyte/sec 
Fri Oct 5 00:57:27 2012 [pid 3056] CONNECT: Client "157.82.250.217" 
Fri Oct 5 00:57:27 2012 [pid 3055] [ftp] OK LOGIN: Client "157.82.250.217", anon password "[email protected]" 
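
Each line above has the same shape: a timestamp, [pid N], an optional [ftp] user tag, an event such as CONNECT or OK LOGIN, then Client "IP" and the rest of the entry. A minimal sketch that splits a line into those fields with a regular expression, assuming every line follows that shape. The pattern is inferred from these sample lines only, not from vsftpd documentation, and LINE_RE and parse_line are hypothetical names:

```python
import re

# Pattern inferred from the sample lines (an assumption, not a vsftpd spec):
# timestamp, [pid N], optional [user], EVENT: Client "IP", then the rest.
LINE_RE = re.compile(
    r'^(?P<stamp>\w{3} \w{3} +\d{1,2} \d\d:\d\d:\d\d \d{4}) '
    r'\[pid (?P<pid>\d+)\] '
    r'(?:\[(?P<user>[^\]]+)\] )?'
    r'(?P<event>[A-Z ]+): Client "(?P<ip>[^"]+)"'
    r'(?P<rest>.*)$'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    return m.groupdict() if m else None
```

For the second sample line this yields event "OK LOGIN", user "ftp" and ip "66.249.74.228"; for a CONNECT line the user field is None.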

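The log file contains access records from some robots. How can I use Python to filter those out and keep only the records of real people's visits? I have already built a filter that gets the records from the past week, so could you help me add this to it?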

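import time

ONE_WEEK = 60 * 60 * 24 * 7  # seconds

def OnlyRecent(line):
    # The timestamp is everything before the first "[", e.g. "Thu Oct 4 23:14:40 2012"
    stamp = time.strptime(line.split("[")[0].strip(), "%a %b %d %H:%M:%S %Y")
    # Keep the line if it is newer than one week ago (cutoff computed in UTC via gmtime)
    return stamp > time.gmtime(time.time() - ONE_WEEK)

# Write the matching lines to a file named after today's date, e.g. 20121005.log
filename = time.strftime('%Y%m%d') + '.log'
with open("/opt/CLiMB/Storage1/log/vsftp.log") as f, open(filename, 'w') as f1:
    for line in f:
        if OnlyRecent(line):
            print(line, end='')  # each line already ends with "\n"
            f1.write(line)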
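
A variant of the weekly filter, sketched with datetime instead of comparing struct_time values. It takes the current time as a parameter so it can be checked against fixed data, and it skips blank lines that would otherwise make strptime fail. The names only_recent and recent_lines are hypothetical, and the sketch assumes the log timestamps and now are on the same clock (both UTC or both local time):

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"

def only_recent(line, now, days=7):
    """True if the line's timestamp is within `days` of `now` (both naive, same clock)."""
    stamp = datetime.strptime(line.split("[")[0].strip(), STAMP_FORMAT)
    return stamp > now - timedelta(days=days)

def recent_lines(lines, now, days=7):
    # Skip blank lines so strptime never sees an empty string
    return [ln for ln in lines if ln.strip() and only_recent(ln, now, days)]
```

For the real log, pass the current time in whatever clock the log uses as now.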

Answers

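If, by looking at the password the client uses to access your system, you determine that the client is actually a robot ([email protected] looks like an actual robot), then you can split the line and check whether the second part contains a robot e-mail: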

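# Add additional robot e-mails here
robot_emails = ["[email protected]"]

def isRobotRecord(line):
    # The e-mail (anonymous password) appears after "Client" on the line
    parts = line.split("Client", 1)
    if len(parts) < 2:
        return False
    for email in robot_emails:
        if email in parts[1]:
            return True
    # Return False only after every robot e-mail has been checked
    return False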
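
Matching the e-mail anywhere after "Client" could also hit a file path that happens to contain it. A slightly stricter sketch pulls out only the quoted anonymous password, assuming login lines contain anon password "<password>" as in the sample. ROBOT_PASSWORDS, PASSWORD_RE, login_password and is_robot_login are hypothetical names, and since the robot address appears only as "[email protected]" in this copy of the log, that string is a placeholder:

```python
import re

# Placeholder: the real robot address is obscured as "[email protected]" here
ROBOT_PASSWORDS = {"[email protected]"}

# Assumed layout of login lines: anon password "<password>" (one or more spaces)
PASSWORD_RE = re.compile(r'anon\s+password\s+"([^"]*)"')

def login_password(line):
    """Return the anonymous password on a login line, or None if there is none."""
    m = PASSWORD_RE.search(line)
    return m.group(1) if m else None

def is_robot_login(line):
    # None (no password on the line) is never in the set, so non-login lines are not robots
    return login_password(line) in ROBOT_PASSWORDS
```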

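How about grouping the events, then for each group checking whether the "login" in that group was made by something like a robot, and keeping what is left? – AntiGMO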

How do I add this function to the script? Inside 'for line in f:', add 'if OnlyRecent(line): if isRobotRecord(line):' and then 'print line' and 'f1.write(line)'? – AntiGMO

@JesseSiu, no: 'if OnlyRecent(line): if not isRobotRecord(line): print(line)', or, a little shorter, 'if OnlyRecent(line) and not isRobotRecord(line):' (both conditions on one line). – aga
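
The one-line condition from the comment, applied to a list of lines. A sketch with hypothetical snake_case stand-ins for OnlyRecent and isRobotRecord, with the current time passed in as now and "[email protected]" as a placeholder robot address:

```python
from datetime import datetime, timedelta

ROBOT_EMAILS = ["[email protected]"]  # placeholder robot address

def only_recent(line, now, days=7):
    stamp = datetime.strptime(line.split("[")[0].strip(), "%a %b %d %H:%M:%S %Y")
    return stamp > now - timedelta(days=days)

def is_robot_record(line):
    # Look only at the part after "Client", as in the answer
    parts = line.split("Client", 1)
    return len(parts) == 2 and any(e in parts[1] for e in ROBOT_EMAILS)

def human_recent(lines, now):
    # Both conditions on one line, as suggested in the comment
    return [ln for ln in lines if only_recent(ln, now) and not is_robot_record(ln)]
```

Because the robot e-mail appears only on the OK LOGIN line, this drops just that line; the same client's CONNECT and DOWNLOAD lines still pass, which is why the thread also discusses grouping lines into sessions.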

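You could group the events by some identifier. I thought about the pid, but it looks like every line in the log has a different pid. You could use the IP address for each group and start a new group whenever you find CONNECT: Client "[IP]", but this will fail if a client at some IP address has more than one session at a time. Without a session identifier, it is hard to decide which lines make up one session (group). Within a group, the line to check is the login with anon password "[email protected]".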
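
A minimal sketch of the IP-based grouping described above: each CONNECT line starts a new session for its client IP, a session counts as a robot session if any of its lines contains a robot e-mail, and only lines from the other sessions are kept. It inherits the limitation mentioned above (overlapping sessions from one IP get merged). CLIENT_RE, ROBOT_EMAILS, group_sessions and human_lines are hypothetical names, and "[email protected]" is a placeholder:

```python
import re

# Pulls the IP out of: Client "66.249.74.228"
CLIENT_RE = re.compile(r'Client "([^"]+)"')
# Placeholder: the robot address is obscured as "[email protected]" in this log copy
ROBOT_EMAILS = ["[email protected]"]

def group_sessions(lines):
    """Split lines into (ip, [lines]) sessions; a CONNECT line opens a new one.

    Assumes at most one open session per IP at a time.
    """
    sessions = []   # (ip, lines) pairs, in order of each session's first line
    open_idx = {}   # ip -> index in `sessions` of that IP's current session
    for line in lines:
        m = CLIENT_RE.search(line)
        if not m:
            continue  # not a client line; ignore it
        ip = m.group(1)
        if "CONNECT:" in line or ip not in open_idx:
            open_idx[ip] = len(sessions)
            sessions.append((ip, []))
        sessions[open_idx[ip]][1].append(line)
    return sessions

def human_lines(lines):
    """Keep only lines from sessions where no line contains a robot e-mail."""
    kept = []
    for _ip, session in group_sessions(lines):
        if not any(email in ln for ln in session for email in ROBOT_EMAILS):
            kept.extend(session)
    return kept
```

A DOWNLOAD line that arrives after another client's lines is still attached to its own IP's open session, so interleaved clients are handled as long as each IP has only one session at a time.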