一個更加自動化的又少顯而易見的方式將比使用字符串距離的方式更有趣。
Sign_desc <- c("Resp rate:","Respiratory rate","Blood pressure panel",
"Systolic blood pressure", "Systolic blood pressure:",
"Diastolic blood pressure","Diastolic blood pressure:","resp rate")
ad <- adist(Sign_desc)
rownames(ad) <- Sign_desc
hc <- hclust(as.dist(ad))
plot(hc)
rect.hclust(hc, 3)
根據情節上面,3組可能是合適的,這樣你就可以再使用cutree
看到這串會落入這組
(ct <- cutree(hc, 3))
# Resp rate: Respiratory rate Blood pressure panel
# 1 1 2
# Systolic blood pressure Systolic blood pressure: Diastolic blood pressure
# 3 3 3
# Diastolic blood pressure: resp rate
# 3 1
你也可以使用這些小組按順序給你新的名字。從上面看,我想RR對應於1秒,BP的2S和3S等
## new names corresponding to the groups above
nn <- c('RR', 'BP', 'BP')
cbind(old = Sign_desc, new = nn[ct])
# old new
# [1,] "Resp rate:" "RR"
# [2,] "Respiratory rate" "RR"
# [3,] "Blood pressure panel" "BP"
# [4,] "Systolic blood pressure" "BP"
# [5,] "Systolic blood pressure:" "BP"
# [6,] "Diastolic blood pressure" "BP"
# [7,] "Diastolic blood pressure:" "BP"
# [8,] "resp rate" "RR"
這裏是所有的代碼中使用
Sign_desc <- c("Resp rate:","Respiratory rate","Blood pressure panel","Systolic blood pressure", "Systolic blood pressure:","Diastolic blood pressure","Diastolic blood pressure:","resp rate")
ad <- adist(Sign_desc)
rownames(ad) <- Sign_desc
hc <- hclust(as.dist(ad))
plot(hc)
rect.hclust(hc, 3)
(ct <- cutree(hc, 3))
nn <- c('RR', 'BP', 'BP')
cbind(old = Sign_desc, new = nn[ct])
沒有一個神奇的功能,請讓你的例如可重現的 – rawr
'grep' /'grepl',可能。直接分配到因子水平而不是價值可能更快,但要小心你的訂單,否則你會搞亂你的數據。 – alistaire
@rawr讓示例可重現。 – user3897