2015-08-17

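Data cleaning in R

I have a CSV file, and I only want to extract the timestamps of the sentences that contain "toward", plus the name of the fruit in each of those sentences. How can I do this in R (or, if there is a faster way to do it, what is it)?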

rosbagTimestamp,data 
1438293900729698553,robot is in motion toward [strawberry] 
1438293900730571638,Found a plan for avocado in 1.36400008202 seconds 
1438293900731434815,current probability is greater than EXECUTION_THRESHOLD 
1438293900731554567,ready to execute am original plan of len = 33 
1438293900731586463,len of sub plan 1 = 24 
1438293900731633713,len of sub plan 2 = 9 
1438293900732910799,put in an execution request; now updating the dict 
1438293900732949576,current_prediciton_item = avocado 
1438293900733070339,current_item_probability = 0.880086981207 
1438293901677787230,current probability is greater than PLANNING_THRESHOLD 
1438293901681590725,robot is in motion toward [avocado] 
1438293902689233770,we have received verbal request [avocado] 
1438293902689314002,we already have a plan for the verbal request 
1438293902689377800,debug 
1438293902690529516,put in the final motion request 
1438293902691076051,Found a plan for avocado in 1.95595788956 seconds 
1438293902691084147,current predicted item != motion target; calc a new plan 
1438293902691110642,current probability is greater than EXECUTION_THRESHOLD 
1438293902691885974,have existing requests 
1438293904496769068,robot is in motion toward [avocado] 
1438293907737142498,ready to pick up the item 

Ideally, the output I want looks like this:

1438293900729698553, strawberry 
1438293901681590725, avocado 
1438293904496769068, avocado 

So clearly I need to use subset and grep in R, but I'm not quite sure how!

Do you have a vector or some other object containing the names of the fruits you want your code to recognize? – ulfelder

@ulfelder I don't have a vector, but I know there are only 15 of them, so the set is limited. How would that help? –

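If you want your code to find something in those strings, you need to tell it what to look for. The exception is if the fruit names always appear right after the word 'toward', or are the only words that appear in square brackets; in that case you should be able to get around the problem. Is at least one of those conditions true? – ulfelder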
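
If the fruit names are known in advance, one way to act on this comment is to build the search pattern from a vector of names. This is a rough sketch only: fruit_names is a hypothetical vector (it lists just the two names visible in the sample, where the real list would hold all 15), and msgs holds a few messages copied from the question.

```r
# Hypothetical vector of known fruit names; extend it to the full list.
fruit_names <- c("strawberry", "avocado")

# A few sample messages copied from the question.
msgs <- c(
  "robot is in motion toward [strawberry]",
  "Found a plan for avocado in 1.36400008202 seconds",
  "ready to pick up the item",
  "robot is in motion toward [avocado]"
)

# Alternation of the names: "strawberry|avocado".
alt <- paste(fruit_names, collapse = "|")

# Requiring "toward [" before the name skips lines such as
# "Found a plan for avocado", which mention a fruit in another context.
pattern <- paste0("toward \\[(", alt, ")\\]")
hits <- grepl(pattern, msgs)

# Extract the fruit name from the matching messages only.
found <- sub(paste0(".*", pattern, ".*"), "\\1", msgs[hits])
```

Matching on the fruit name alone would also pick up the "Found a plan for avocado" lines, which is why the sketch keeps "toward [" in the pattern.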

Answer

# Rows whose message contains "toward ["
hits <- grep("toward \\[", df$data)
stamps <- df$rosbagTimestamp[hits]
# Keep only the word inside the brackets
fruits <- gsub(".*\\[(\\w+)\\].*", "\\1", df$data[hits])
data.frame(stamps, fruits)
       stamps  fruits 
1 1438293900729698560 strawberry 
2 1438293901681590784 avocado 
3 1438293904496769024 avocado 

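I used the pattern "toward \\[" to locate the fruits. It can be extended if the wording varies. The stamps variable is created by looking up the timestamps of the rows whose data column matches the pattern. The fruits variable isolates the fruit name inside the brackets.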
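
One thing to note about the printed output: the stamps differ from the input in their last digits (for example, ...553 became ...560). A 19-digit integer cannot be stored exactly in R's double type, so reading the column as numeric rounds it. Below is a minimal sketch that avoids this by reading both columns as text. The inline csv_text, holding a few rows copied from the question, stands in for the real file, and the variable names are illustrative.

```r
# A few rows copied from the question; a real run would pass the file path
# to read.csv() instead of using text =.
csv_text <- "rosbagTimestamp,data
1438293900729698553,robot is in motion toward [strawberry]
1438293900730571638,Found a plan for avocado in 1.36400008202 seconds
1438293901681590725,robot is in motion toward [avocado]
1438293902689233770,we have received verbal request [avocado]
1438293904496769068,robot is in motion toward [avocado]"

# colClasses = "character" keeps the 19-digit timestamps as exact strings.
df <- read.csv(text = csv_text, colClasses = "character")

# Rows whose message contains "toward [".
hits <- grep("toward \\[", df$data)

# Keep the timestamp and pull out the word between the brackets.
result <- data.frame(
  stamps = df$rosbagTimestamp[hits],
  fruits = sub(".*\\[(\\w+)\\].*", "\\1", df$data[hits]),
  stringsAsFactors = FALSE
)
print(result)
```

Keeping the timestamps as character strings is enough when they only need to be reported. If arithmetic on them is required, a 64-bit integer type from an add-on package would be needed instead.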