2016-03-03

I use `police_officer <- str_extract_all(txtparts, "ID:.*\n")` to extract, from a text file, the names of all police officers involved in each 911 call. For example:
    2237 DISTURBANCE Report taken
    Call Taker: Telephone Operators Sharon L Moran
    Location/Address: [BRO 6949] 61 WILSON ST
        ID: Patrolman Darvin Anderson
          Disp-22:43:39     Arvd-22:48:57 Clrd-23:49:45
        ID: Patrolman Stephen T Pina
          Disp-22:43:48        Clrd-22:46:10
        ID: Sergeant Michael V Damiano
          Disp-22:46:33     Arvd-22:47:14 Clrd-22:55:22
Cleaning up the automatic concatenation left by stringr (str_replace_all) when there are multiple matches

In some places, when it matches more than one `ID:`, I get: `"c(\" Patrolman Darvin Anderson\\n\", \" Patrolman Stephen T Pina\\n\", \" Sergeant Michael V Damiano\\n\")"`. This is what I have tried so far to clean the data:
    police_officer <- str_replace_all(police_officer, "c\\(.", "")
    police_officer <- str_replace_all(police_officer, "\\)", "")
    police_officer <- str_replace_all(police_officer, "ID:", "")
    police_officer <- str_replace_all(police_officer, "\\n\", "") # I can't get rid of \\n\.

This is what I ended up with:
" Patrolman Darvin Anderson\\n\", \" Patrolman Stephen T Pina\\n\", \" Sergeant Michael V Damiano\\n\""

I need help getting rid of the `\\n\` part.
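The `c(...)` text is what R produces when a list element holding several matches is coerced to character (for example by `as.character()`); `str_extract_all()` returns a list with one character vector per input string. A minimal base-R sketch of the cause and of one fix, assuming you want all matches of one record joined into a single string (the `", "` separator is an arbitrary choice):

```r
# str_extract_all() returns a list like this: one character vector per input string
matches <- list(
  c(" Patrolman Darvin Anderson", " Patrolman Stephen T Pina"),
  " Sergeant Michael V Damiano"
)

# Coercing the list to character deparses multi-match elements into "c(...)" text
print(as.character(matches))

# Instead, trim and collapse each element into one string before further cleaning
officers <- vapply(matches, function(x) paste(trimws(x), collapse = ", "), character(1))
print(officers)
```

Because the collapsing happens on the list itself, there is no `c(` or escaped `\n` left to strip with `str_replace_all()` afterwards.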

Answer


You can use the following regex with `str_match_all`:

\bID:\s*(\w+(?:\h+\w+)*) 


> txt <- "Call Taker: Telephone Operators Sharon L Moran\n Location/Address: [BRO 6949] 61 WILSON ST\n    ID: Patrolman Darvin Anderson\n      Disp-22:43:39     Arvd-22:48:57 Clrd-23:49:45\n    ID: Patrolman Stephen T Pina\n      Disp-22:43:48        Clrd-22:46:10\n    ID: Sergeant Michael V Damiano\n      Disp-22:46:33     Arvd-22:47:14 Clrd-22:55:22" 
> str_match_all(txt, "\\bID:\\s*(\\w+(?:\\h+\\w+)*)") 
[[1]] 
    [,1]        [,2]       
[1,] "ID: Patrolman Darvin Anderson" "Patrolman Darvin Anderson" 
[2,] "ID: Patrolman Stephen T Pina" "Patrolman Stephen T Pina" 
[3,] "ID: Sergeant Michael V Damiano" "Sergeant Michael V Damiano" 

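The regex matches `ID:` as a whole word, then zero or more whitespace characters (`\s*`), and then captures a sequence of word characters optionally separated by horizontal whitespace (`\h+`), so the capture stops at the end of the line. `str_match_all` returns that captured group as its own column (column 2 above), which is why you cannot use `str_extract_all` with this regex: it only returns the full match, including `ID:`.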
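If you would rather not depend on stringr, base R's PCRE engine can perform the same extraction. This is a swapped-in variant, not the method used in this answer: `\K` discards everything matched so far, so the returned match is only the name and no capture-group column is needed.

```r
txt <- "Call Taker: Telephone Operators Sharon L Moran\n Location/Address: [BRO 6949] 61 WILSON ST\n    ID: Patrolman Darvin Anderson\n      Disp-22:43:39     Arvd-22:48:57 Clrd-23:49:45\n    ID: Patrolman Stephen T Pina\n      Disp-22:43:48        Clrd-22:46:10\n    ID: Sergeant Michael V Damiano\n      Disp-22:46:33     Arvd-22:47:14 Clrd-22:55:22"

# \K resets the start of the reported match, so only the name after "ID:" is kept;
# \h+ (horizontal whitespace) never matches "\n", so each match ends at its line
pattern <- "\\bID:\\s*\\K\\w+(?:\\h+\\w+)*"
officers <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))[[1]]
print(officers)
```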

Update:

> time <- str_trim(str_extract(txt, " [[:digit:]]{4}")) 
> Call_taker <- str_replace_all(str_extract(txt, "Call Taker:.*\n"),"Call Taker:","") %>% str_replace_all("\n","") 
> address <- str_extract(txt, "Location/Address:.*\n") 
> Police_officer <- str_match_all(txt, "\\bID:\\s*(\\w+(?:\\h+\\w+)*)") 
> BPD_log <- cbind(time,Call_taker,address,list(Police_officer[[1]][,2])) 
> BPD_log <- as.data.frame(BPD_log) 
> colnames(BPD_log) <- c("time", "Call_taker", "address", "Police_officer") 
> BPD_log 
    time        Call_taker          address 
1 6949  Telephone Operators Sharon L Moran Location/Address: [BRO 6949] 61 WILSON ST\n 
                    Police_officer 
1 Patrolman Darvin Anderson, Patrolman Stephen T Pina, Sergeant Michael V Damiano 
> 
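The `Call Taker:` and `Location/Address:` fields can be cleaned the same way: capture only what follows the label, so neither the label nor the trailing `\n` (visible in the `address` column above) ends up in the result. A base-R sketch; `extract_field()` is a hypothetical helper, not part of stringr or base R, and `label` is treated as a regex, so special characters in it would need escaping:

```r
txt <- "Call Taker: Telephone Operators Sharon L Moran\n Location/Address: [BRO 6949] 61 WILSON ST\n    ID: Patrolman Darvin Anderson"

# Hypothetical helper: return the text after `label` up to the end of that line,
# or NA if the label does not occur in `txt`
extract_field <- function(txt, label) {
  pattern <- paste0(label, "\\s*([^\\n]*)")
  m <- regmatches(txt, regexec(pattern, txt, perl = TRUE))[[1]]
  # m[1] is the whole match, m[2] is the capture group
  if (length(m) < 2) NA_character_ else trimws(m[2])
}

print(extract_field(txt, "Call Taker:"))
print(extract_field(txt, "Location/Address:"))
```

Compared with chaining `str_extract()` and several `str_replace_all()` calls, the capture group drops the label and the line break in a single step.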

Thanks! I guess the real problem is when I bring everything into a data frame with `Call_taker`, `time`, `address` and `Police_officer`:

    time <- str_trim(str_extract(txt, " [[:digit:]]{4}"))
    Call_taker <- str_replace_all(str_extract(txt, "Call Taker:.*\n"), "Call Taker:", "") %>% str_replace_all("\n", "")
    address <- str_extract(txt, "Location/Address:.*\n")
    Police_officer <- str_match_all(txt, "\\bID:\\s*(\\w+(?:\\h+\\w+)*)")
    BPD_log <- cbind(time, Call_taker, address, Police_officer)
    BPD_log <- as.data.frame(BPD_log)

We still get `c(` when we include Police_officer. – Jomisilfe


I'm not sure what your final data frame should look like, but you only need the `[,2]` column of the `str_match_all` output rather than the whole result: `BPD_log <- cbind(time, Call_taker, address, Police_officer[[1]][,2])`. –


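Just saw your update, but I want the data presented in a single row, meaning all the officers should be in one cell. It would be great if you could do that. – Jomisilfe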
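To keep one call per row with every officer in a single cell, collapse the capture-group column into one string before building the data frame. A minimal sketch, assuming `officers` holds `Police_officer[[1]][, 2]` from the answer, that the other fields are already single strings (values copied from the output above), and an arbitrary `"; "` separator:

```r
# Assumed input: the capture-group column, i.e. Police_officer[[1]][, 2]
officers <- c("Patrolman Darvin Anderson", "Patrolman Stephen T Pina",
              "Sergeant Michael V Damiano")

BPD_log <- data.frame(
  Call_taker     = "Telephone Operators Sharon L Moran",
  address        = "[BRO 6949] 61 WILSON ST",
  # paste(collapse = ) turns the vector into one string, so it fits in one cell
  Police_officer = paste(officers, collapse = "; "),
  stringsAsFactors = FALSE
)
print(BPD_log)
```

Using `data.frame()` directly, instead of `cbind()` followed by `as.data.frame()`, keeps each column a plain character vector rather than a list column.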