2016-07-13

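R: Extract data based on a string detected in another column

I have a large dataset that I would like to summarize. The data are health records in which each individual has a number of organs/tissues examined, and the diagnoses are entered in narrative form. I have several key diagnostic terms that I want to find, and then I want to know which organs are associated with each diagnosis.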

For example (all entries converted to strings):

dataframe1

Organ          Diagnosis
lungs          interstitial pneumonia
liver          hepatic congestion ; diffuse
cerebrum       traumatic disruption and hemorrhage
adrenal gland  focal hemorrhage

dataframe2

Keywords 
congestion 
hemorrhage 
trauma 
pneumonia 

I want to search dataframe1$Diagnosis for strings that match dataframe2$Keywords and, for each match, return the Organ entry from the corresponding row.

Data structure

dataframe1 <- structure(list(Organ = c("lungs", "liver", "cerebrum", "adrenal gland" 
), Diagnosis = c("interstitial pneumonia", "hepatic congestion ; diffuse", 
"traumatic disruption and hemorrhage", "focal hemorrhage")), .Names = c("Organ", 
"Diagnosis"), class = "data.frame", row.names = c(NA, -4L)) 

dataframe2 <- data.frame(Keywords=c("congestion","hemorrhage","trauma","pneumonia"),stringsAsFactors=FALSE) 

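dataframe1$Organ <- c(lungs, liver, cerebrum, adrenal gland) dataframe1$Diagnosis <- c(interstitial pneumonia, hepatic congestion, traumatic disruption and hemorrhage, focal hemorrhage) dataframe2$Keywords <- c(congestion, hemorrhage, trauma, pneumonia) – SJR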

Sorry for the horrible formatting! – SJR

Answers


We can use grep:

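# For each keyword, find the rows whose Diagnosis (column 2) matches it and
# join the corresponding Organ values (column 1) into one comma-separated string.
sapply(dataframe2$Keywords, function(x) 
     toString(trimws(dataframe1[,1][grep(x, dataframe1[,2])]))) 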
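
For the sample data, this approach should give a named character vector with one entry per keyword. The sketch below is a self-contained variant and not the answerer's exact code. It refers to the columns by name, and it assumes the keywords are literal substrings rather than regular expressions, which is why it passes `fixed = TRUE`. The name `organs_by_keyword` is only illustrative.

```r
# Sample data from the question, rebuilt so the snippet runs on its own.
dataframe1 <- data.frame(
  Organ = c("lungs", "liver", "cerebrum", "adrenal gland"),
  Diagnosis = c("interstitial pneumonia", "hepatic congestion ; diffuse",
                "traumatic disruption and hemorrhage", "focal hemorrhage"),
  stringsAsFactors = FALSE
)
dataframe2 <- data.frame(
  Keywords = c("congestion", "hemorrhage", "trauma", "pneumonia"),
  stringsAsFactors = FALSE
)

# For each keyword, collect the organs whose diagnosis contains it.
# fixed = TRUE is an assumption: keywords are treated as plain substrings.
organs_by_keyword <- sapply(dataframe2$Keywords, function(k) {
  hits <- grepl(k, dataframe1$Diagnosis, fixed = TRUE)
  toString(dataframe1$Organ[hits])
})

print(organs_by_keyword)
```

Because the match is a substring match, the keyword "trauma" also matches "traumatic". A keyword with no match yields an empty string.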

I think it may be more useful to return what matched what, as a stacked list:

stack(
    sapply(dataframe2$Keywords, 
     function(x) dataframe1$Organ[grepl(x, dataframe1$Diagnosis)]) 
) 

#         values        ind
#1         liver congestion
#2      cerebrum hemorrhage
#3 adrenal gland hemorrhage
#4      cerebrum     trauma
#5         lungs  pneumonia
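
One possible caveat with the stacked approach: `sapply` simplifies its result to a vector or matrix when every keyword happens to match the same number of rows, and `stack` may then not receive the named list it expects. The sketch below guards against that with `simplify = FALSE`. That argument, `fixed = TRUE`, and the renamed output columns are my additions, not part of the original answer.

```r
# Sample data from the question, rebuilt so the snippet runs on its own.
dataframe1 <- data.frame(
  Organ = c("lungs", "liver", "cerebrum", "adrenal gland"),
  Diagnosis = c("interstitial pneumonia", "hepatic congestion ; diffuse",
                "traumatic disruption and hemorrhage", "focal hemorrhage"),
  stringsAsFactors = FALSE
)
dataframe2 <- data.frame(
  Keywords = c("congestion", "hemorrhage", "trauma", "pneumonia"),
  stringsAsFactors = FALSE
)

# simplify = FALSE always returns a list named by keyword,
# regardless of how many rows each keyword matches.
matches <- sapply(dataframe2$Keywords, function(k) {
  dataframe1$Organ[grepl(k, dataframe1$Diagnosis, fixed = TRUE)]
}, simplify = FALSE)

# One row per (organ, keyword) pair; keywords with no match contribute no rows.
long <- stack(matches)
names(long) <- c("Organ", "Keyword")  # illustrative column names

print(long)
```

The long format is convenient for later summaries, for example `table(long$Keyword)` to count how many organs each keyword was found in.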