2013-07-25 94 views
2

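Split a string in R, different elements of the split argument

I imported some data without column names, so now I have over a million rows and one column (instead of 5 columns).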

Each row is formatted like this:

x <- "2012-10-19T16:59:01-07:00 192.101.136.140 <190>Oct 19 2012 23:59:01: %FWSM-6-305011: Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874" 

I tried

strsplit(x , split = c(" ", " ", "%", " ")) 

and got

[[1]] 
[1] "2012-10-19T16:59:01-07:00" "192.101.136.140"    
[3] "<190>Oct"      "19"       
[5] "2012"       "23:59:01:"     
[7] "%FWSM-6-305011:"    "Built"      
[9] "dynamic"      "tcp"       
[11] "translation"     "from"       
[13] "Inside:10.2.45.62/56455"  "to"       
[15] "outside:192.101.136.224/9874" 

I know it has to do with the way the split argument is recycled, but I can't seem to figure out how to get the result I want:

[[1]] 
[1] "2012-10-19T16:59:01-07:00" "192.101.136.140"    
[3] "<190>Oct 19 2012 23:59:01" "%FWSM-6-305011" 
[5] "Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874" 

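Every row has a different message as its fifth element, but after the fourth element I just want to keep the rest of the string together.

Any help would be appreciated.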

+0

You seem to think (mistakenly) that the items of the split vector are applied in sequence. –

+0

That's true. Thanks for clearing that up – camelarms

Answers

2

You can use paste with the collapse argument to combine every element from the fifth onward.

A <- strsplit(x = "2012-10-19T16:59:01-07:00 192.101.136.140 <190>Oct 19 2012 23:59:01: %FWSM-6-305011: Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874", split = c(" ", " ", "%", " ")) 

c(A[[1]][1:4], paste(A[[1]][5:length(A[[1]])], collapse=" ")) 

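As @DWin pointed out, the elements of split = c(" ", " ", "%", " ") are not applied in order. strsplit recycles split along x, so for this single string only the first element, " ", is actually used; in other words, the result here is the same as with split = c(" ", "%").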
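To make the recycling behaviour concrete, here is a minimal sketch on a made-up string (x1 is an illustrative value, not data from the question). It assumes the documented rule that a split vector longer than one element is recycled along x rather than applied delimiter by delimiter.

```r
# Illustrative input, not taken from the question
x1 <- "a b%c d"

# One string: only the first element of split (" ") is used, "%" is ignored
r1 <- strsplit(x1, split = c(" ", "%"))
print(r1[[1]])

# Two strings: split is recycled along x, so the first string is split
# on " " and the second string is split on "%"
r2 <- strsplit(c(x1, x1), split = c(" ", "%"))
print(r2)
```

With a single input string, the second and later elements of split never come into play, which is why "%FWSM-6-305011:" keeps its "%" in the output shown in the question.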

+0

Thanks @Senor O, that almost worked. The last part came out as `[3] "<190>Oct" [4] "19" [5] "2012 23:59:01: %FWSM-6-305011: Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874"` – camelarms

+1

That's because the first 7 items of the split string come before the message (not 4, as you originally indicated) –

+0

Keep in mind that '<190>Oct 19 2012 23:59:01:' gets split into 4 items –
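Following the two comments above, the paste approach could be adjusted roughly as below. This is only a sketch: it assumes every row has the same layout as the sample, with the date part always occupying exactly four space-separated tokens (a row with a double space, for example before a single-digit day, would break that assumption). The names parts, n and fields are illustrative.

```r
x <- "2012-10-19T16:59:01-07:00 192.101.136.140 <190>Oct 19 2012 23:59:01: %FWSM-6-305011: Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874"

# Split on single spaces only; fixed = TRUE treats " " literally
parts <- strsplit(x, split = " ", fixed = TRUE)[[1]]
n <- length(parts)

fields <- c(
  parts[1:2],                         # timestamp and IP address
  paste(parts[3:6], collapse = " "),  # "<190>Oct 19 2012 23:59:01:"
  parts[7],                           # "%FWSM-6-305011:"
  paste(parts[8:n], collapse = " ")   # the message, whatever its length
)
print(fields)
```

Because the message starts at the eighth token rather than the fifth, pasting from 8 to n keeps the variable-length message together.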

0

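I don't think you need strsplit here. I use read.table with the text argument to read the line, then paste to combine the columns. Since you have a lot of rows, it is better to do the column aggregation inside a data.table.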

dt <- read.table(text=x) 
library(data.table) 
DT <- as.data.table(dt) 
DT[ , c('V3','V8') := list(paste(V3,V4,V5), 
     V8=paste(V8,V9,V10,V11,V12,V13,V14,V15))] 
DT[,paste0('V',c(1:3,6:7,8)),with=FALSE] 

         V1    V2    V3  V6    V7 
1: 2012-10-19T16:59:01-07:00 192.101.136.140 <190>Oct 19 2012 23:59:01: %FWSM-6-305011: 
                          V8 
1: Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874 
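The code above hard-codes V8 through V15, which only fits messages of exactly eight words. A base-R sketch that pastes however many message columns were read might look like the following. The column name message and the arguments quote = "", comment.char = "" and colClasses = "character" are choices made for this sketch (so that quotes, "#" characters and numeric-looking tokens in log lines are read literally), not part of the answer. Rows with differing numbers of tokens would still need extra care with read.table, for example fill = TRUE.

```r
x <- "2012-10-19T16:59:01-07:00 192.101.136.140 <190>Oct 19 2012 23:59:01: %FWSM-6-305011: Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874"

# Read every whitespace-separated token as text, with no quote or comment handling
dt <- read.table(text = x, quote = "", comment.char = "",
                 colClasses = "character", stringsAsFactors = FALSE)

# Columns 3 to 6 hold the date part; columns 8 onward hold the message
logged_at <- paste(dt$V3, dt$V4, dt$V5, dt$V6)
message   <- do.call(paste, unname(as.list(dt[8:ncol(dt)])))

out <- data.frame(V1 = dt$V1, V2 = dt$V2, logged_at = logged_at,
                  V7 = dt$V7, message = message,
                  stringsAsFactors = FALSE)
print(out)
```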
0

Here is a function that works the way I think you expected strsplit to work:

split.seq<-function(x,delimiters) { 
    # position of the first match of the current delimiter in each string 
    break.point<-regexpr(delimiters[1], x) 
    # text before the delimiter, and everything after it 
    first<-mapply(substring,x,1,break.point-1,USE.NAMES=FALSE) 
    second<-mapply(substring,x,break.point+1,nchar(x),USE.NAMES=FALSE) 
    # last delimiter: return both pieces; otherwise recurse on the remainder 
    if (length(delimiters)==1) return(lapply(1:length(first),function(x) c(first[x],second[x]))) 
    else mapply(function(x,y) c(x,y),first, split.seq(second, delimiters[-1]) ,USE.NAMES=FALSE, SIMPLIFY=FALSE) 
} 

split.seq(x,delimiters) 

Test:

x<-rep(x,2)    
delimiters=c(" ", " ", "%", " ") 
split.seq(x,delimiters) 

[[1]] 
[1] "2012-10-19T16:59:01-07:00"                 
[2] "192.101.136.140"                   
[3] "<190>Oct 19 2012 23:59:01: "                
[4] "FWSM-6-305011:"                    
[5] "Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874" 

[[2]] 
[1] "2012-10-19T16:59:01-07:00"                 
[2] "192.101.136.140"                   
[3] "<190>Oct 19 2012 23:59:01: "                
[4] "FWSM-6-305011:"                    
[5] "Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874"
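For the full column of over a million rows, one alternative that is not taken from the answers above is a single regular expression with capture groups, applied to the whole vector with regexec and regmatches. This is a sketch under the assumption that every row follows the sample's layout (two space-free fields, a date part containing no "%", a "%"-prefixed code, then the message). The pattern and the column names are hypothetical, and rows that do not match would need separate handling.

```r
x <- "2012-10-19T16:59:01-07:00 192.101.136.140 <190>Oct 19 2012 23:59:01: %FWSM-6-305011: Built dynamic tcp translation from Inside:10.2.45.62/56455 to outside:192.101.136.224/9874"
rows <- rep(x, 2)  # stand-in for the imported one-column data

# Five capture groups: timestamp, host, date part, %code, message
pattern <- "^([^ ]+) ([^ ]+) ([^%]*) (%[^ ]+) (.*)$"

# Each list element holds the full match followed by the five groups
m <- regmatches(rows, regexec(pattern, rows))

# Drop the full match and bind the groups into one row per input string
fields <- do.call(rbind, lapply(m, function(v) v[-1]))
df <- as.data.frame(fields, stringsAsFactors = FALSE)
names(df) <- c("timestamp", "host", "logged_at", "code", "message")
print(df)
```

Unlike split.seq, this keeps the "%" on the code field, which matches the output the question asked for.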