Extracting a sentence from a paragraph with regex in R

2016-04-12

Hi, I am trying to use regular expressions in R on text like the following:

"[report_beginning] 

101962493|2011-06-09|final|Omary, Lea, M.D.|43654754|Major Academic Center 

_Ms.Wattley is a 88 year-old patient who comes in today with a chief complaint of PREG/SPOTTING. 

ALLERGIES:  none 

SOCIAL HISTORY:  The patient Ms.Wattley is a past smoker who has a visiting nurse. Patient is bed-bound. 

PHYSICAL EXAMINATION:  Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air.  General:  This is a patient in severe distress.   

EMERGENCY DEPARTMENT COURSE:  I confirm that I have seen and evaluated the patient, reviewed the resident's documentation on the patient's chart. The following procedures were performed: Medication:medication given. Procedure:no procedures performed. Testing:testing conducted . Please review the chart for more details. 

DISPOSITION:  The patient was admitted to the hospital with a primary diagnosis of Threatened abortion, antepartum condition or complication. 

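I want to extract a single sentence from this paragraph, which is all stored in one cell. I have a whole column of data like this, and the line I want to extract is: "PHYSICAL EXAMINATION: Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air."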

How can I do this with a regular expression in R?

I have been using the following code, but it doesn't work; it gives me an empty dataset:

x=grep("Blood pressure .+ air. ", ed_dia, value = TRUE) 
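Note that `grep(..., value = TRUE)` returns entire matching elements (whole cells), never just the matched part of the text, so even with a working pattern it would hand back the full report. A minimal sketch of pulling out only the matched sentence with base R's `regexpr()` and `regmatches()`, assuming the column is a character vector with one report per element; `ed_dia_sample` is a hypothetical, shortened stand-in for `ed_dia`:

```r
# Hypothetical, shortened stand-in for the ed_dia column: one report per element
ed_dia_sample <- c(
  "ALLERGIES:  none\n\nPHYSICAL EXAMINATION:  Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air.  General:  This is a patient in severe distress.",
  "ALLERGIES:  none\n\nNo vitals recorded."
)

# Lazy .+? stops at the first "on room air." instead of the last one;
# the final dot is escaped so it matches a literal period
pattern <- "Blood pressure .+? on room air\\."

# regexpr() locates the first match in each element (-1 where there is none);
# regmatches() returns only the matched substrings and skips non-matching elements
vitals <- regmatches(ed_dia_sample, regexpr(pattern, ed_dia_sample, perl = TRUE))
print(vitals)
```

Because non-matching elements are dropped, `vitals` can be shorter than the input; that is fine for a quick look but not for adding a new column.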

If you already know the full sentence in advance, why do you need to extract it? – nrussell


Because this sentence is repeated throughout the whole dataset. @nrussell –


Please include a sample of your dataset and the desired output. Your question is not clear. – nrussell

Answer


I assume "[report_beginning] is not actually in the data file, so opening a text connection to read the text should succeed:

After entering the data:
txt <- "101962493|2011-06-09|final|Omary, Lea, M.D.|43654754|Major Academic Center 

_Ms.Wattley is a 88 year-old patient who comes in today with a chief complaint of PREG/SPOTTING. 

ALLERGIES: Â none 

SOCIAL HISTORY: Â The patient Ms.Wattley is a past smoker who has a visiting nurse. Patient is bed-bound. 

PHYSICAL EXAMINATION: Â Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air. Â General: Â This is a patient in severe distress. Â 

EMERGENCY DEPARTMENT COURSE: Â I confirm that I have seen and evaluated the patient, reviewed the resident's documentation on the patient's chart. The following procedures were performed: Medication:medication given. Procedure:no procedures performed. Testing:testing conducted . Please review the chart for more details. 

DISPOSITION: Â The patient was admitted to the hospital with a primary diagnosis of Threatened abortion, antepartum condition or complication. " 

inp <- readLines(textConnection(txt)) 
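`readLines(textConnection(txt))` turns one string into a vector with one element per line. When the reports already sit in a column rather than a file, base R's `strsplit()` gives the same split without a connection. A minimal sketch, assuming lines are separated by a plain `"\n"` (Windows `"\r\n"` endings would need extra handling); `cell` is a hypothetical short report:

```r
# Hypothetical single cell holding a multi-line report
cell <- "101962493|2011-06-09|final\n\nALLERGIES:  none\n\nDISPOSITION:  The patient was admitted."

# Split on newlines; fixed = TRUE treats "\n" literally rather than as a regex
cell_lines <- strsplit(cell, "\n", fixed = TRUE)[[1]]

# Drop the empty strings produced by the blank separator lines
cell_lines <- cell_lines[nzchar(cell_lines)]
print(cell_lines)
```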

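Then it is only a matter of using grep to locate the line containing "PHYSICAL EXAMINATION" (I'm not sure whether that space needs special regex handling) and using "[" to pick that element out of the vector of lines: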

inp[ grep("PHYSICAL[ ]EXAMINATION", inp)] 
#[1] "PHYSICAL EXAMINATION: Â Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air. Â General: Â This is a patient in severe distress. Â "
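The result above is the whole line, so it still carries the trailing "General: ..." text, and it handles only one report. One possible extension, sketched under the assumption that each report is a single string and that the wanted sentence always ends with "on room air.", trims each report down to the requested sentence for a whole column; `reports` and `extract_exam()` are hypothetical names:

```r
# Hypothetical column: one multi-line report per element
reports <- c(
  "ALLERGIES:  none\nPHYSICAL EXAMINATION:  Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air.  General:  This is a patient in severe distress.",
  "ALLERGIES:  none\nDISPOSITION:  Discharged."
)

extract_exam <- function(x) {
  # (?s) lets "." cross newlines, so the leading .*? and trailing .* swallow the
  # rest of the report and sub() keeps only the captured sentence
  pattern <- "(?s).*?(PHYSICAL EXAMINATION:.*?on room air\\.).*"
  hit <- grepl(pattern, x, perl = TRUE)
  # Reports without a match become NA instead of being dropped, so rows stay aligned
  out <- rep(NA_character_, length(x))
  out[hit] <- sub(pattern, "\\1", x[hit], perl = TRUE)
  out
}

exam <- extract_exam(reports)
print(exam)
```

Returning `NA` for reports without a match keeps the output the same length as the input, so it could be assigned straight back as a new column (for example `df$exam <- extract_exam(df$ed_dia)`, where `df` is hypothetical).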