2016-07-26 43 views

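Importing text with a specific delimiter into R

I want to load log events into a data.table. Each log event is identified by its timestamp, and some events span several lines.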

I have the following .txt file:

2016-07-19 00:00:01,421 WARNING Exception happened while transfering for command 
           at java.lang.NumberFormatException 
           at java.lang.Integer.parseInt 
           at java.util.concurrent.Task 

2016-07-19 00:01:01,525 DEBUG Upload all environments 
2016-07-19 00:01:01,720 DEBUG Upload all environments 
2016-07-19 00:02:00,520 WARNING Excpetion happened while transfering for command 
           at java.lang.NumberFormatException 

I would like to get the following data.table:

 log 
1 2016-07-19 00:00:01,421 WARNING Exception happened while transfering for command at java.lang.NumberFormatException at java.lang.Integer.parseInt at java.util.concurrent.Task 
2 2016-07-19 00:01:01,525 DEBUG Upload all environments 
3 2016-07-19 00:01:01,720 DEBUG Upload all environments 
4 2016-07-19 00:02:00,520 WARNING Excpetion happened while transfering for command at java.lang.NumberFormatException 

I want each log event to end up on a single line. I tried using \n as the separator:

docs <- read.table("log2.txt",header=FALSE,sep="\n",col.names="log",nrows=1000) 

Try using 'readLines' instead. – lmo

Answer


Use readLines, then combine the lines inside data.table:

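library(data.table)

# Read the file, one physical line per row
raw = data.table(s = readLines('log.txt'))
# Strip the indentation of continuation lines, then drop blank lines
raw[, s := stringr::str_trim(s)]
raw = raw[s != '']
# A line starting with a four-digit year begins a new log event
raw[, idx := cumsum(s %like% '^[0-9]{4}')]
# Collapse the lines of each event into a single string
raw[, list(s = paste(s, collapse = ' ')), by = idx]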

Edit: changed the regular expression for the year, thanks for the comment.
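For anyone who wants to try the idea without the file, here is a minimal sketch that runs on the sample lines from the question. It is only an illustration, not the original answer's code: the `lines` vector stands in for `readLines()`, `trimws` and `grepl` are base R substitutes for `stringr::str_trim` and `%like%`, and `ts_pattern` is a hypothetical stricter pattern that assumes every event starts with a full `YYYY-MM-DD HH:MM:SS,mmm` timestamp.

```r
library(data.table)

# Sample lines copied from the question; in practice these would come from readLines().
lines <- c(
  "2016-07-19 00:00:01,421 WARNING Exception happened while transfering for command ",
  "           at java.lang.NumberFormatException ",
  "           at java.lang.Integer.parseInt ",
  "           at java.util.concurrent.Task ",
  "",
  "2016-07-19 00:01:01,525 DEBUG Upload all environments ",
  "2016-07-19 00:01:01,720 DEBUG Upload all environments ",
  "2016-07-19 00:02:00,520 WARNING Excpetion happened while transfering for command ",
  "           at java.lang.NumberFormatException "
)

# Assumption: every event begins with a full timestamp at the start of the line.
ts_pattern <- "^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}"

# Trim whitespace first so that whitespace-only lines are dropped as well.
raw <- data.table(s = trimws(lines))
raw <- raw[s != ""]

# The running count of timestamp lines gives every line the id of its event.
raw[, idx := cumsum(grepl(ts_pattern, s))]

# One row per event: paste the lines of each group together.
logs <- raw[, .(log = paste(s, collapse = " ")), by = idx]
logs[, idx := NULL]
print(logs)
```

Any lines that appear before the first timestamp would get `idx` 0 and be collapsed into a row of their own, so they may need to be filtered out separately.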


This only works for 2016. Something like '^[0-9]{4}' would be better, I guess. – Rentrop


Yes, of course, but you get the idea :) – sbstn


I already have every line as a row of my 'data.table'; should I group by the 'idx' value? –
