2017-02-21 83 views

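Extracting specific patient IDs from email subject lines

I am looking to extract patient IDs from email subject lines. I am working with two data frames: one from a SQL database (containing the email subject lines) and one with patient information (hospital name and patient ID).

I want to use the patient IDs to search the subject lines in the first data frame and return the hospital associated with each patient. Unfortunately, I cannot provide access to the data.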

## Example Data 

Data frame 1 example row: 

Column 1 (from_Email): [email protected] 

Column 2 (Time_IN): 1/11/2000 12:00:00 

Column 3 (from_Subject): Patient H2445JFLD presented into ER with .... symptoms 

Data frame 2 example row: 

Column 1 (Hospital Name): Hospital ABC 

Column 2 (Patient ID): H2445JFLD 

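"Unfortunately, I cannot provide access to the data." No, but you can provide a few rows of **example** data that realistically reflect the kind of data you are working with, without actually being part of the dataset. For example, if the data tracked college students' grades (also legally protected), you could provide records describing the academic records of John Q. Taxpayer and Jane Doe. You could also provide a [mcve] showing what you have already tried and why it did not work.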

Answer


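Since you shared only one row of data, I cannot be sure what pattern the email subject line from_Subject follows. If the emails are sent by an automated system, the subject line will have a fixed pattern. Here are three ways to extract Patient_ID from from_Subject; each one overwrites the same column, so keep whichever fits your data: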

library(dplyr)
library(stringr)  # provides word() and str_extract()

# tibble() replaces the deprecated data_frame()
df1 <- tibble(from_Email = "[email protected]",
              Time_IN = "1/11/2000 12:00:00",
              from_Subject = "Patient H2445JFLD presented into ER with .... symptoms")

df2 <- tibble(Hospital_Name = "Hospital ABC",
              Patient_ID = "H2445JFLD")

# Option 1: extract the 2nd word from the subject line
df1 <- df1 %>% mutate(Patient_ID = word(from_Subject, 2))
# Option 2: extract the word after "Patient" from the subject line
df1 <- df1 %>% mutate(Patient_ID = str_extract(from_Subject, "(?<=Patient\\s)\\w+"))
# Option 3: extract a 9-character word made up of A-Z and 0-9 from the subject line
df1 <- df1 %>% mutate(Patient_ID = str_extract(from_Subject, "\\b[A-Z0-9]{9}\\b"))
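If the subject lines turn out not to follow a fixed pattern, another option is to search each subject for the IDs that are already known from the second data frame. The following is only a sketch in base R, assuming the IDs contain just letters and digits (so they need no regex escaping). The second and third subject lines and "Hospital XYZ" / "K9913ZQRT" are made up for illustration.

```r
# Hypothetical sample data; only the first subject and first ID come from the question
df1 <- data.frame(
  from_Subject = c(
    "Patient H2445JFLD presented into ER with .... symptoms",
    "Follow-up needed for K9913ZQRT after discharge",
    "Weekly staffing update"
  ),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Hospital_Name = c("Hospital ABC", "Hospital XYZ"),
  Patient_ID = c("H2445JFLD", "K9913ZQRT"),
  stringsAsFactors = FALSE
)

# Build one alternation pattern from the known IDs: \b(H2445JFLD|K9913ZQRT)\b
id_pattern <- paste0("\\b(", paste(df2$Patient_ID, collapse = "|"), ")\\b")

# regexpr() finds the first match per subject; -1 means no known ID was found
hits <- regexpr(id_pattern, df1$from_Subject, perl = TRUE)
found <- as.vector(hits) != -1

# regmatches() returns only the matched substrings, so assign them to the matched rows
df1$Patient_ID <- NA_character_
df1$Patient_ID[found] <- regmatches(df1$from_Subject, hits)
```

Subjects that mention no known ID keep NA in Patient_ID. With a very large patient table the combined pattern can become long, in which case extracting a candidate ID first and joining afterwards may be the more practical route.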

Once you have extracted Patient_ID, a simple left join is all you need.

# dplyr joins take the key through `by`, not `on`
left_join(df1, df2, by = "Patient_ID")
# A tibble: 1 × 5 
#  from_Email        Time_IN            from_Subject                                           Patient_ID Hospital_Name
#  <chr>             <chr>              <chr>                                                  <chr>      <chr>
#1 [email protected] 1/11/2000 12:00:00 Patient H2445JFLD presented into ER with .... symptoms H2445JFLD  Hospital ABC
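
Because a left join keeps every email, any row whose extracted ID has no counterpart in the patient table ends up with a missing Hospital_Name. A small sketch for listing those rows follows. It uses base R's merge() instead of dplyr so that it runs without extra packages, and the second email row (ID "X0000XXXX") is invented for illustration.

```r
# Hypothetical emails with Patient_ID already extracted; the second ID is not in df2
df1 <- data.frame(
  from_Subject = c(
    "Patient H2445JFLD presented into ER with .... symptoms",
    "Patient X0000XXXX presented into ER with .... symptoms"
  ),
  Patient_ID = c("H2445JFLD", "X0000XXXX"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Hospital_Name = "Hospital ABC",
  Patient_ID = "H2445JFLD",
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of df1, which makes merge() behave like a left join
joined <- merge(df1, df2, by = "Patient_ID", all.x = TRUE)

# Rows without a hospital are IDs that were not found in df2
unmatched <- joined[is.na(joined$Hospital_Name), ]
```

Inspecting `unmatched` is a quick way to spot subject lines where the extraction picked up the wrong token or where the patient table is incomplete.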