2015-08-17

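Data cleaning in R

I have a CSV file, and I want to extract only the timestamps of the lines that contain "toward", together with the fruit name on each of those lines. How can I do this in R? (Or, if there is a faster way to do it, what is it?)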

1438293900729698553,robot is in motion toward [strawberry] 
1438293900730571638,Found a plan for avocado in 1.36400008202 seconds 
1438293900731434815,current probability is greater than EXECUTION_THRESHOLD 
1438293900731554567,ready to execute am original plan of len = 33 
1438293900731586463,len of sub plan 1 = 24 
1438293900731633713,len of sub plan 2 = 9 
1438293900732910799,put in an execution request; now updating the dict 
1438293900732949576,current_prediciton_item = avocado 
1438293900733070339,current_item_probability = 0.880086981207 
1438293901677787230,current probability is greater than PLANNING_THRESHOLD 
1438293901681590725,robot is in motion toward [avocado] 
1438293902689233770,we have received verbal request [avocado] 
1438293902689314002,we already have a plan for the verbal request 
1438293902689377800,debug 
1438293902690529516,put in the final motion request 
1438293902691076051,Found a plan for avocado in 1.95595788956 seconds 
1438293902691084147,current predicted item != motion target; calc a new plan 
1438293902691110642,current probability is greater than EXECUTION_THRESHOLD 
1438293902691885974,have existing requests 
1438293904496769068,robot is in motion toward [avocado] 
1438293907737142498,ready to pick up the item 

Ideally, the output I want looks like this:

1438293900729698553, strawberry 
1438293901681590725, avocado 
1438293904496769068, avocado 

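You can do this with the grep function, but this is purely a coding question and has nothing to do with statistics. Search the web for "subsetting using grep" to find answers. – Aksakal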


What is your operating system? –


Do you mean the OS? It is Mac OS X Yosemite. –

Answer


Give this a try, where filename is the name of your file.

g <- grep("toward", readLines(filename), fixed = TRUE, value = TRUE) 
gsub("((?<=,).*\\[)|\\]", "", g, perl = TRUE) 
# [1] "1438293900729698553,strawberry" "1438293901681590725,avocado" 
# [3] "1438293904496769068,avocado" 

If you only want the timestamps, without the fruit names, you can 'strsplit' the output of the 'gsub' line on the comma. – drammock
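
The strsplit suggestion in the comment above could look roughly like the sketch below. It is only an illustration: the vector sample_lines is a hypothetical stand-in for readLines(filename), filled with a few lines copied from the question, and the names cleaned, parts, timestamps and fruits are made up for this example. The timestamps are kept as character strings on the assumption that 19-digit values would lose precision if converted to R's numeric type.

```r
# Hypothetical stand-in for readLines(filename): a few lines from the question
sample_lines <- c(
  "1438293900729698553,robot is in motion toward [strawberry]",
  "1438293900730571638,Found a plan for avocado in 1.36400008202 seconds",
  "1438293901681590725,robot is in motion toward [avocado]",
  "1438293904496769068,robot is in motion toward [avocado]"
)

# Same two steps as in the answer: keep lines containing "toward",
# then strip everything between the comma and "[" as well as the closing "]"
g <- grep("toward", sample_lines, fixed = TRUE, value = TRUE)
cleaned <- gsub("((?<=,).*\\[)|\\]", "", g, perl = TRUE)

# Split each "timestamp,fruit" string on the literal comma
parts <- strsplit(cleaned, ",", fixed = TRUE)

# Take the first and second piece of every split; both stay character vectors
timestamps <- vapply(parts, `[`, character(1), 1)
fruits <- vapply(parts, `[`, character(1), 2)

print(timestamps)
print(fruits)
```

If both columns are needed together, data.frame(timestamp = timestamps, fruit = fruits) would collect them into one table.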