2015-09-13

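A legacy system that I can't change spits out 5 GB a day of mostly useless XML logs and blows through my ingest license. Two classes of verbose error occur more than 1000 times a minute, but every few minutes there is a genuinely interesting entry. I'd like to drastically shorten the repetitive entries with sed and keep the interesting ones unchanged.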

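So what I need is:
1. A regex that matches each of the 2 classes of annoying log entries (e.g. ...'decimal'... and ...'DBNull'...) but not the occasional interesting one.
One regex per annoying error class is fine; I can do 2 sed passes.
2. A capture group with the timestamp, so I can replace the long XML entry with a concise version that still carries the correct timestamp, so no fidelity is lost.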

I've gotten as far as this, which matches and captures the created date:

(?:<Log).*?(createdDate="\d{2}\/\d{2}\/\d{4}.\d{2}:\d{2}:\d{2}").*?(?:decimal).*?(<\/Log>) 

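This is close, but because of a kind of reverse greediness, the match runs from 'decimal' back to an opening Log tag several entries earlier. Playing around with negative lookbehind just gave me a serious headache.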

Sample data

<Log type="ERROR" createdDate="11/09/2015 08:13:14" > 
<![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format. 
    ]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:13" > 
<![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format. 
    ]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:12" > 
<![CDATA[ [129] Services.DService.D.FailedToAddRQ(Exceptionex, RQEntityrQ, RHeaderEntityrHeader, StringPRef,): FailedToAddRQ()...with parameters [pRef:=123,0,1], [rQ.AffinityCode:=],[Q.thing=thing][rQ.AffinityRQDT:=123],[rHeader.RHeaderIDPK:=123],[rQ.UWriteIDFK:=] 
    Data.DataAccessLayerException: Conversion from type 'DBNull' to type 'Long' is not valid. 
Parameters: 
[RETURN_VALUE][ReturnValue] Value: [0] 
---> System.InvalidCastException: Conversion from type 'DBNull' to type 'Long' is not valid. 
]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:11" > 
<![CDATA[ [129] Services.DService.D.FailedToAddRQ(Exceptionex, RQEntityrQ, RHeaderEntityrHeader, StringPRef,): FailedToAddRQ()...with parameters [pRef:=123,0,1], [rQ.AffinityCode:=],[Q.thing=thing][rQ.AffinityRQDT:=123],[rHeader.RHeaderIDPK:=123],[rQ.UWriteIDFK:=] 
    Data.DataAccessLayerException: Conversion from type 'DBNull' to type 'Long' is not valid. 
    ]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:10" > 
<![CDATA[ [231] An actual interesting log entry with a real error message ]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:09" > 
<![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format. 
    ]]></Log> 

Answer


I'm not sure exactly what you are looking for, but here is an example of how to isolate the <Log...</Log> blocks and then do the replacement:

/^<Log /{ # condition: a line that starts with "<Log "
    :a; # define the label "a" 
    /<\/Log>/! { # condition: if the line doesn't contain "</Log>" 
     N;  # append the next line to the pattern space 
     ba;  # go to the label "a" 
    }; 
    s/>.*\(decimal\|DBNull\).*</>\1</ # replace the block 
} 

(I assume that <Log is always at the start of a line, unlike in the 10 and 11 records, which is probably a typo.)

As a one-liner:

sed '/^<Log /{:a;/<\/Log>/!{N;ba;};s/>.*\(decimal\|DBNull\).*</>\1</}' file.log
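If the shortened entry should be anchored a little more tightly, the same accumulate-then-substitute idea can be sketched as below. This is only a sketch under assumptions: GNU sed, every entry starting with "<Log " at the beginning of a line, and the bare words decimal and DBNull being enough to identify the two noisy classes (an interesting entry that happens to mention either word would be collapsed too). The function name shorten_logs is hypothetical.

```shell
#!/bin/sh
# Sketch: collapse noisy <Log> entries to a single short line while keeping
# the opening tag, and therefore the createdDate timestamp, untouched.
# Assumes GNU sed and that each entry begins with "<Log " at line start.
# shorten_logs is a hypothetical helper name, not part of the original answer.

shorten_logs() {
    sed -E '
/^<Log /{
    :a
    /<\/Log>/!{
        N
        ba
    }
    # Noisy entry: keep the opening tag (with its timestamp) and the keyword.
    s/^(<Log [^>]*>).*(decimal|DBNull).*<\/Log>[[:space:]]*$/\1\2<\/Log>/
}' "$@"
}
```

Called as `shorten_logs file.log`, or at the end of a pipe, this handles both error classes in one pass; entries that mention neither keyword are printed unchanged. Capturing the opening tag with `[^>]*` keeps the match from depending on where later `>` characters (such as the `--->` arrows in the stack traces) fall.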


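Perfect, thanks Casimir. You are right that <Log is at the start of the line in the log file. A sed-based solution rather than a pure regex wasn't quite what I was expecting, but it is very insightful and definitely the way to go.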