Reading multiple substrings from lines in a file

So basically what I'm doing is using a Python script to build a report from an Apache error_log file. An example of what I'm processing is:

[Wed Apr 13 18:33:42.521106 2016] [core:notice] [pid 11690] SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0
[Wed Apr 13 18:33:42.543989 2016] [suexec:notice] [pid 11690] AH: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

which I'm trying to make look like this as the end result:

core:notice - SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0
suexec:notice - AH: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

That is the error type, followed by the trailing text. I then need to write this formatted text to a new file.

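I've been trying to do this with regex, but it's been years since I last used Python, and I've never used regex. The most I've managed so far is isolating the first (date) part, but I can't figure out how to get the subsequent bracket-enclosed substrings and the trailing text. Any and all help would be greatly appreciated!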

Source

2016-04-14 zimty

Can you post a few raw sample lines from the error log? – TheLazyScripter

I did, that's the first block. The [Wed Apr] etc. lines are from the log. – zimty

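Since your data consists of exactly four fields, and every field except the last is neatly wrapped in square brackets, you can take advantage of that to do the task without regex, like this: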

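# Two sample lines from the error log
texts = ['[Wed Apr 13 18:33:42.521106 2016] [core:notice] [pid 11690] SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0',
         '[Wed Apr 13 18:33:42.543989 2016] [suexec:notice] [pid 11690] AH: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)']
for text in texts:
    # Remove every '[' and split on ']' to get the four fields
    words = text.replace('[', '').split(']')
    # Fields 1 and 3 are the error type and the message
    newWords = words[1] + ' -' + words[3]
    # strip() drops the leading space left in front of the error type
    print(newWords.strip())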

which results in:

core:notice - SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0 
suexec:notice - AH: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

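The idea is to first replace each opening square bracket with an empty string, then split the text using the closing square bracket as the separator (so it gets removed as well):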

words = text.replace('[','').split(']')
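To make that step concrete, here is a small sketch that prints the intermediate list for the first sample line from the question. It is only an illustration; `repr()` is used so that the space left in front of each field after the first one is visible.

```python
# First sample line from the question's log excerpt.
text = ('[Wed Apr 13 18:33:42.521106 2016] [core:notice] [pid 11690] '
        'SELinux policy enabled; httpd running as context '
        'system_u:system_r:httpd_t:s0')

# Drop every '[' and split on ']' to get the four fields.
words = text.replace('[', '').split(']')

for index, word in enumerate(words):
    # repr() shows the leading space that remains on fields 1 to 3.
    print(index, repr(word))
```

Index 0 is the timestamp, 1 the error type, 2 the pid and 3 the message.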

Then you just need to form the new string by combining the fields:

newWords = words[1] + ' -' + words[3]

And you're done.
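The question also asks for the result to be written to a new file. A minimal sketch of that could look like the following. The function names `format_line` and `write_report` and the file names in the usage note are hypothetical, not part of the original answer. It also uses a slight variation of the split above: it splits on `'] '` at most three times, so any brackets inside the message itself are left alone, and it assumes every log line has the same three bracketed fields as the samples.

```python
def format_line(line):
    """Return 'type - message' for one log line, or None if it has too few fields."""
    # Split on '] ' at most three times: timestamp, type, pid, message.
    words = line.strip().split('] ', 3)
    if len(words) != 4:
        return None
    # The bracketed fields still carry their opening '['; remove it from the type.
    return words[1].lstrip('[') + ' - ' + words[3]


def write_report(log_path, report_path):
    """Read log_path line by line and write the formatted lines to report_path."""
    with open(log_path) as log_file, open(report_path, 'w') as report_file:
        for line in log_file:
            formatted = format_line(line)
            # Lines that do not have the expected four fields are skipped.
            if formatted is not None:
                report_file.write(formatted + '\n')
```

Calling something like write_report('error_log', 'report.txt') would then produce one formatted line per matching log line.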

Source

2016-04-14 03:45:57 Ian

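Ah, okay, that's much easier than what I was trying to do! I appreciate the explanation too. – zimty

@zimty Yes, try to make use of the features of 'string' first! :) For simple cases, regex would be overkill and might even reduce performance – Ian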
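For comparison, the regex route the question started with might look roughly like this. It is a sketch only and not from the answer above: the pattern and the group names (`timestamp`, `kind`, `pid`, `message`) are assumptions based on the two sample lines, and the helper name `format_line_re` is hypothetical.

```python
import re

# Three bracketed fields separated by single spaces, then the free-form message.
LOG_LINE = re.compile(
    r'\[(?P<timestamp>[^\]]*)\] '
    r'\[(?P<kind>[^\]]*)\] '
    r'\[(?P<pid>[^\]]*)\] '
    r'(?P<message>.*)'
)


def format_line_re(line):
    """Return 'type - message', or None if the line does not match the pattern."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return match.group('kind') + ' - ' + match.group('message')


sample = ('[Wed Apr 13 18:33:42.543989 2016] [suexec:notice] [pid 11690] '
          'AH: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)')
print(format_line_re(sample))
```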
