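There are a few things to note about using UTL_MATCH:
- EDIT_DISTANCE_SIMILARITY: returns an integer between 0 and 100, where 0 indicates no similarity at all and 100 indicates a perfect match.
- JARO_WINKLER_SIMILARITY: returns an integer between 0 and 100, where 0 indicates no similarity at all and 100 indicates a perfect match, but it also tries to take possible data entry errors into account.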
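As a quick way to see the two scores side by side, here is a minimal sketch that calls both functions on one pair of strings. The pair is the one used in the Oracle documentation for UTL_MATCH, which lists 82 for the edit distance similarity and 98 for the Jaro-Winkler similarity; the column aliases are only illustrative.

```sql
-- Compare the two UTL_MATCH similarity scores for a transposition typo.
-- Both functions return an integer from 0 (no similarity) to 100 (perfect match).
SELECT UTL_MATCH.edit_distance_similarity('shackleford', 'shackelford') AS eds,
       UTL_MATCH.jaro_winkler_similarity('shackleford', 'shackelford')  AS jws
FROM dual;
```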
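ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation knee right%', diagnosisname) DESC
This will not give you the correct result, because you are only considering the raw similarity and not possible data entry errors. So you should use JARO_WINKLER_SIMILARITY.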
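operation Knee right
You also need to keep in mind the case of the input and of the column values being compared. They must be in the same case to match correctly. You are passing the input in lowercase, but your column values are in INITCAP. It is better to convert both the column values and the input to the same case.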
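To illustrate the point about case, here is a small sketch comparing the same two strings with and without case normalization. The literals and aliases are made up for illustration. Once both sides are upper-cased the strings are identical, so the second column is 100; the first column will be lower because the capital letters count as mismatches.

```sql
-- UTL_MATCH compares characters as given, so 'K' and 'k' are different.
SELECT UTL_MATCH.jaro_winkler_similarity('Knee Operation', 'knee operation') AS jws_mixed_case,
       -- Normalizing both sides removes the spurious mismatches.
       UTL_MATCH.jaro_winkler_similarity(UPPER('Knee Operation'),
                                         UPPER('knee operation'))          AS jws_same_case
FROM dual;
```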
Let's look at the demonstration below:
SQL> WITH DATA AS (
       SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
       SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
       SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
       SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
       SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
     )
     SELECT t.*,
            utl_match.edit_distance_similarity(upper(diagnosis_name), upper('operation Knee right')) eds,
            utl_match.jaro_winkler_similarity(upper(diagnosis_name), upper('operation Knee right')) jws
     FROM DATA t
     ORDER BY jws DESC
     /
DIAGNOSIS_NAME       ICD_CO        EDS        JWS
-------------------- ------ ---------- ----------
Right Knee Operation IKR123         20         72
Knee Operation       IK123          20         70
Heart Operation      IH123          25         68
Left Knee Operation  IKL123         25         64
Fever                IF123          15         47
SQL>
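So you can see how the two scores differ. jaro_winkler_similarity does a better job of accounting for data entry errors and giving the closest match. Building on that, simply select the first row after sorting in descending order: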
SQL> WITH DATA AS (
       SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
       SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
       SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
       SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
       SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
     )
     SELECT diagnosis_name
     FROM
       (SELECT t.*,
               utl_match.edit_distance_similarity(upper(diagnosis_name), upper('operation Knee right')) eds,
               utl_match.jaro_winkler_similarity(upper(diagnosis_name), upper('operation Knee right')) jws
        FROM DATA t
        ORDER BY jws DESC
       )
     WHERE rownum = 1
     /
DIAGNOSIS_NAME
--------------------
Right Knee Operation
SQL>
Hi @Lalit, your query works fine, but the user can also enter **operation knee rt** instead of **operation knee right** (left = lt, right = rt, etc.).
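@shary.sharath You could customize a few things using `DECODE`. The function gives you the closest match, but some things are beyond its scope. You need to write the query explicitly to handle cases where there could be ambiguity. For example, 'knee rt' would yield two rows, because both are close. But if you decode 'rt' as 'right' and 'lt' as 'left', you will get the desired output. Please mark it as answered.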
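One possible way to expand the abbreviations before matching is sketched below. It uses nested `REGEXP_REPLACE` calls rather than `DECODE`, because `DECODE` compares whole values and would need the input split into words first. The CTE name `search_term`, the column `term`, the two-row sample data, and the abbreviation list (rt, lt) are all assumptions made for illustration, not part of the original query.

```sql
WITH DATA AS (
  SELECT 'Left Knee Operation' diagnosis_name FROM dual UNION ALL
  SELECT 'Right Knee Operation' diagnosis_name FROM dual
),
search_term AS (
  -- Upper-case the input, then expand RT/LT only when they stand alone as words.
  SELECT REGEXP_REPLACE(
           REGEXP_REPLACE(UPPER('operation knee rt'),
                          '(^|\s)RT(\s|$)', '\1RIGHT\2'),
           '(^|\s)LT(\s|$)', '\1LEFT\2') AS term
  FROM dual
)
SELECT diagnosis_name
FROM (SELECT d.diagnosis_name,
             UTL_MATCH.jaro_winkler_similarity(UPPER(d.diagnosis_name), s.term) AS jws
      FROM DATA d CROSS JOIN search_term s
      ORDER BY jws DESC)
WHERE rownum = 1;
```

Anchoring the pattern on whitespace or the string boundaries keeps words that merely contain "rt" or "lt" from being rewritten.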
Hello @Lalit, I used your query `SELECT diagnosisname FROM (SELECT diagnosisname, utl_match.edit_distance_similarity(UPPER(diagnosisname), UPPER('knee operation Left')) eds, UTL_MATCH.jaro_winkler_similarity(UPPER(diagnosisname), UPPER('knee operation Left')) jws FROM icd_code ORDER BY jws DESC) WHERE ROWNUM = 1` but it returns **Knee Operation** instead of **Left Knee Operation**. What should I do now?