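Hadoop multi-line mixed records

I'd like to parse log files produced by the FidoNet mailer binkd. They are multi-line and, much worse, mixed: several instances can write to the same log file. Example: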
27 Dec 16:52:40 [2484] BEGIN, binkd/1.0a-545/Linux -iq /tmp/binkd.conf
+ 27 Dec 16:52:40 [2484] session with 123.45.78.9 (123.45.78.9)
- 27 Dec 16:52:41 [2484] SYS BBSName
- 27 Dec 16:52:41 [2484] ZYZ First LastName
- 27 Dec 16:52:41 [2484] LOC City, Country
- 27 Dec 16:52:41 [2484] NDL 115200,TCP,BINKP
- 27 Dec 16:52:41 [2484] TIME Thu, 27 Dec 2012 21:53:22 +0600
- 27 Dec 16:52:41 [2484] VER binkd/0.9.6a-173/Win32 binkp/1.1
+ 27 Dec 16:52:43 [2484] addr: 2:1234/[email protected]
- 27 Dec 16:52:43 [2484] OPT NDA CRYPT
+ 27 Dec 16:52:43 [2484] Remote supports asymmetric ND mode
+ 27 Dec 16:52:43 [2484] Remote requests CRYPT mode
- 27 Dec 16:52:43 [2484] TRF 0 0
*+ 27 Dec 16:52:43 [1520] done (from 2:456/[email protected], OK, S/R: 0/0 (0/0 bytes))*
+ 27 Dec 16:52:43 [2484] Remote has 0b of mail and 0b of files for us
+ 27 Dec 16:52:43 [2484] pwd protected session (MD5)
- 27 Dec 16:52:43 [2484] session in CRYPT mode
+ 27 Dec 16:52:43 [2484] done (from 2:1234/[email protected], OK, S/R: 0/0 (0/0 bytes))
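So the log file is not only multi-line, with an unpredictable number of lines per session, but several records can also be interleaved, as when session 1520 finished in the middle of session 2484. What is the right direction for parsing such a file in Hadoop? Or should I just parse it line by line, then somehow merge the lines into records, and then write those records into an SQL database using another set of jobs?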
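
To make the second option concrete, here is a rough sketch in plain Python (not Hadoop code) of the line-by-line idea: parse each line, use the number in square brackets as the session key, and close a record when a `done (` message appears. The names `LINE_RE`, `parse_line` and `group_sessions` are made up for illustration, and the line layout and the "a record ends at `done (`" rule are assumptions read off the sample above, not something taken from binkd's documentation.

```python
import re
from collections import defaultdict

# Assumed layout (from the sample only): optional one-character level marker,
# "DD Mon HH:MM:SS", "[pid]", then the message text.
LINE_RE = re.compile(
    r"^\s*(?P<level>[^\s\d])?\s*"
    r"(?P<ts>\d{1,2} \w{3} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<pid>\d+)\] (?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict with level, ts, pid and msg, or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def group_sessions(lines):
    """Yield (pid, entries) pairs, one per session, from interleaved lines.

    Assumption: a session's last line is the one whose message starts with "done (".
    """
    open_sessions = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that do not look like log entries
        open_sessions[entry["pid"]].append(entry)
        if entry["msg"].startswith("done ("):
            # Session finished: emit it and forget the pid, in case it is reused later.
            yield entry["pid"], open_sessions.pop(entry["pid"])
    # Sessions that never logged "done (" are emitted as-is at end of input.
    for pid, entries in list(open_sessions.items()):
        yield pid, entries
```

In a MapReduce job the same key (the number in brackets) would presumably become the map output key, with the grouping done on the reduce side, but I am not sure how well that copes with reused process IDs or with sessions that span input splits.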
Thanks.