Cleaning up the automatic concatenation from 'stringr str_replace_all' when there are multiple matches

I am using police_officer <- str_extract_all(txtparts, "ID:.*\n") to extract the names of all police officers involved in a 911 call from a text file. For example:
2237 DISTURBANCE Report taken Call Taker: Telephone Operators Sharon L Moran Location/Address: [BRO 6949] 61 WILSON ST ID: Patrolman Darvin Anderson Disp-22:43:39 Arvd-22:48:57 Clrd-23:49:45 ID: Patrolman Stephen T Pina Disp-22:43:48 Clrd-22:46:10 ID: Sergeant Michael V Damiano Disp-22:46:33 Arvd-22:47:14 Clrd-22:55:22

In some places, where it matches more than one ID:, I get: "c(\" Patrolman Darvin Anderson\\n\", \" Patrolman Stephen T Pina\\n\", \" Sergeant Michael V Damiano\\n\")". This is what I have tried so far to clean the data:
police_officer <- str_replace_all(police_officer,"c\\(.","")
police_officer <- str_replace_all(police_officer,"\\)","")
police_officer <- str_replace_all(police_officer,"ID:","")
police_officer <- str_replace_all(police_officer,"\\n\","") # I can't get rid of \\n\.

This is what I end up with:
" Patrolman Darvin Anderson\\n\", \" Patrolman Stephen T Pina\\n\", \" Sergeant Michael V Damiano\\n\""

I need help cleaning up the \\n\.


2016-03-03 Jomisilfe

You can use the following regular expression with str_match_all:

\bID:\s*(\w+(?:\h+\w+)*)

See the regex demo.

> library(stringr)
> txt <- "Call Taker: Telephone Operators Sharon L Moran\n Location/Address: [BRO 6949] 61 WILSON ST\n    ID: Patrolman Darvin Anderson\n      Disp-22:43:39     Arvd-22:48:57 Clrd-23:49:45\n    ID: Patrolman Stephen T Pina\n      Disp-22:43:48        Clrd-22:46:10\n    ID: Sergeant Michael V Damiano\n      Disp-22:46:33     Arvd-22:47:14 Clrd-22:55:22"
> str_match_all(txt, "\\bID:\\s*(\\w+(?:\\h+\\w+)*)")
[[1]]
     [,1]                             [,2]
[1,] "ID: Patrolman Darvin Anderson"  "Patrolman Darvin Anderson"
[2,] "ID: Patrolman Stephen T Pina"   "Patrolman Stephen T Pina"
[3,] "ID: Sergeant Michael V Damiano" "Sergeant Michael V Damiano"

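The regex matches ID: as a whole word, then zero or more whitespace characters (with \s*), and then captures one or more sequences of alphanumeric characters, optionally separated by horizontal whitespace. Because \h does not match a newline, the capture stops at the end of the line. str_match_all is what gives you access to the captured part; str_extract_all returns only the full match, so you cannot use it with this regex.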
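To make the difference concrete, here is a minimal sketch using a shortened version of the sample log (the shortened txt and the variable names ids and officers are only for illustration). It shows that str_extract_all returns a list, that coercing such a list to character is what produces the literal "c(...)" text from the question, and that the second column of the str_match_all result holds just the captured names.

```r
library(stringr)

# Shortened sample in the same layout as the log above (illustrative only)
txt <- "    ID: Patrolman Darvin Anderson\n      Disp-22:43:39\n    ID: Patrolman Stephen T Pina\n      Disp-22:43:48"

# str_extract_all() returns a list with one character vector per input string
ids <- str_extract_all(txt, "ID:.*\n")
is.list(ids)

# Coercing that list to character deparses each element, which is where
# the literal "c(...)" text and the escaped \\n come from
as.character(ids)

# str_match_all() returns a list of matrices; column 2 is capture group 1,
# so the "ID:" prefix and the trailing newline never enter the result
officers <- str_match_all(txt, "\\bID:\\s*(\\w+(?:\\h+\\w+)*)")[[1]][, 2]
officers
```

Working on the character vector directly, rather than on the coerced list, avoids the need to strip "c(", ")" and "\n" afterwards.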

Update:

> time <- str_trim(str_extract(txt, " [[:digit:]]{4}")) 
> Call_taker <- str_replace_all(str_extract(txt, "Call Taker:.*\n"),"Call Taker:","") %>% str_replace_all("\n","") 
> address <- str_extract(txt, "Location/Address:.*\n") 
> Police_officer <- str_match_all(txt, "\\bID:\\s*(\\w+(?:\\h+\\w+)*)") 
> BPD_log <- cbind(time,Call_taker,address,list(Police_officer[[1]][,2])) 
> BPD_log <- as.data.frame(BPD_log) 
> colnames(BPD_log) <- c("time", "Call_taker", "address", "Police_officer") 
> BPD_log 
  time                          Call_taker                                        address
1 6949  Telephone Operators Sharon L Moran Location/Address: [BRO 6949] 61 WILSON ST\n
                                                                    Police_officer
1 Patrolman Darvin Anderson, Patrolman Stephen T Pina, Sergeant Michael V Damiano


2016-03-03 15:28:06

Thanks! I guess the real problem shows up when I bring everything into a data frame with 'Call_taker', 'time', 'address' and 'Police_officer':
time <- str_trim(str_extract(txt, " [[:digit:]]{4}"))
Call_taker <- str_replace_all(str_extract(txt, "Call Taker:.*\n"), "Call Taker:", "") %>% str_replace_all("\n", "")
address <- str_extract(txt, "Location/Address:.*\n")
Police_officer <- str_match_all(txt, "\\bID:\\s*(\\w+(?:\\h+\\w+)*)")
BPD_log <- cbind(time, Call_taker, address, Police_officer)
BPD_log <- as.data.frame(BPD_log)
We still get 'c(' when we bring in Police_officer. – Jomisilfe

I don't know what your final data frame should look like, but you only need the '[,2]' column from the 'str_match_all' output, not the whole result: 'BPD_log <- cbind(time, Call_taker, address, Police_officer[[1]][,2])'. –

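Just saw your update, but I would like the data presented in a single row, meaning all the police officers should be in one cell. If you can do that, it would be great. – Jomisilfe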
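One possible way to get all officers into a single cell, sketched here as an assumption rather than as the answerer's method, is to collapse the captured names into one comma-separated string with paste(..., collapse = ", ") before building the data frame. The txt below is a shortened copy of the sample, the time column is left out, and the use of str_match with a capture group for Call_taker and address is a substitution for the str_replace_all calls above.

```r
library(stringr)

# Shortened copy of the sample log (illustrative only)
txt <- "Call Taker: Telephone Operators Sharon L Moran\n Location/Address: [BRO 6949] 61 WILSON ST\n    ID: Patrolman Darvin Anderson\n      Disp-22:43:39\n    ID: Patrolman Stephen T Pina\n      Disp-22:43:48\n    ID: Sergeant Michael V Damiano\n      Disp-22:46:33"

# Character vector of captured officer names (column 2 = capture group 1)
officers <- str_match_all(txt, "\\bID:\\s*(\\w+(?:\\h+\\w+)*)")[[1]][, 2]

# Collapse the vector into one string so it occupies a single cell
BPD_log <- data.frame(
  Call_taker = str_trim(str_match(txt, "Call Taker:(.*)\n")[, 2]),
  address = str_trim(str_match(txt, "Location/Address:(.*)\n")[, 2]),
  Police_officer = paste(officers, collapse = ", "),
  stringsAsFactors = FALSE
)
BPD_log
```

This yields a one-row data frame in which Police_officer is an ordinary character column, so no list column is involved and nothing gets deparsed into 'c(...)'.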
