2015-04-29 68 views
4

How do I get the closest-matching string from an Oracle table?

I have a table in Oracle with four columns. (Image: Table Data in Oracle.)

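The user can pass an input string such as "operation Knee right" (for example) to my query, and the query should return the ICD code (IKR123) of the row whose DiagnosisName column matches the most words.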

Below is my current query (it does not give the proper output).

SELECT diagnosisname 
FROM 
    (SELECT diagnosisname, 
    UTL_MATCH.jaro_winkler_similarity('%operation Knee right%',diagnosisname) 
    FROM icd_code 
    ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname) DESC 
) 
WHERE ROWNUM<2; 

This query returns "Left Knee Operation", but I expect "Right Knee Operation".

Answers

3

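There are a few things to note about using UTL_MATCH:

  • EDIT_DISTANCE_SIMILARITY: returns an integer between 0 and 100, where 0 indicates no similarity at all and 100 indicates a perfect match.
  • JARO_WINKLER_SIMILARITY: returns an integer between 0 and 100, where 0 indicates no similarity at all and 100 indicates a perfect match, but it also tries to take possible data entry errors into account.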

ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname) DESC

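This will not give you the correct result, because it ranks rows by edit-distance similarity only and does not allow for data entry errors. You should order by JARO_WINKLER_SIMILARITY instead.

operation Knee right

You also need to keep in mind the letter case of the input and of the column values being compared. They must be in the same case to match properly. You are passing the input in lowercase, but your column values are in INITCAP. It is better to convert both the column values and the input to the same case.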
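As a quick check of the point about case, the following sketch compares one diagnosis name against the same words in a different case, with and without normalizing. It also includes the `%` characters from the question's literal: UTL_MATCH compares them as ordinary characters, not as LIKE wildcards, so they prevent an exact match. The sample strings are illustrative only.

```sql
-- Illustrative strings only; runs against DUAL, no table needed.
SELECT
  -- Raw comparison: differences in letter case count against the score
  UTL_MATCH.jaro_winkler_similarity('Right Knee Operation',
                                    'right knee operation') AS jws_raw,
  -- Both sides upper-cased: the strings are identical, so the score is 100
  UTL_MATCH.jaro_winkler_similarity(UPPER('Right Knee Operation'),
                                    UPPER('right knee operation')) AS jws_upper,
  -- '%' is not a wildcard here; it is compared as a literal character
  UTL_MATCH.jaro_winkler_similarity(UPPER('Right Knee Operation'),
                                    UPPER('%right knee operation%')) AS jws_with_percent
FROM dual;
```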

The following demonstration shows the difference:

WITH data AS (
    SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
    SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
    SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
    SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
    SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
)
-- Upper-case both sides so that letter case does not affect the scores
SELECT t.*,
       UTL_MATCH.edit_distance_similarity(UPPER(diagnosis_name), UPPER('operation Knee right')) eds,
       UTL_MATCH.jaro_winkler_similarity(UPPER(diagnosis_name), UPPER('operation Knee right')) jws
FROM data t
ORDER BY jws DESC;

DIAGNOSIS_NAME       ICD_CO        EDS        JWS
-------------------- ------ ---------- ----------
Right Knee Operation IKR123         20         72
Knee Operation       IK123          20         70
Heart Operation      IH123          25         68
Left Knee Operation  IKL123         25         64
Fever                IF123          15         47

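So you can see how the two scores differ. jaro_winkler_similarity does a better job of allowing for data entry errors and giving the closest match. Building on that, simply select the first row after sorting in descending order: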

WITH data AS (
    SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
    SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
    SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
    SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
    SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
)
SELECT diagnosis_name
FROM (
    -- Rank every row by Jaro-Winkler similarity, best match first
    SELECT t.*,
           UTL_MATCH.edit_distance_similarity(UPPER(diagnosis_name), UPPER('operation Knee right')) eds,
           UTL_MATCH.jaro_winkler_similarity(UPPER(diagnosis_name), UPPER('operation Knee right')) jws
    FROM data t
    ORDER BY jws DESC
)
-- Keep only the top-ranked row
WHERE ROWNUM = 1;

DIAGNOSIS_NAME 
-------------------- 
Right Knee Operation 
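
On Oracle 12c and later, the same top-row selection can also be written with the row-limiting clause instead of a ROWNUM filter. This is only a sketch, assuming the icd_code table and diagnosisname column from the question.

```sql
-- Assumes the icd_code table and diagnosisname column from the question.
SELECT diagnosisname
FROM icd_code
ORDER BY UTL_MATCH.jaro_winkler_similarity(UPPER(diagnosisname),
                                           UPPER('operation Knee right')) DESC
-- Replace ONLY with WITH TIES to return every row that shares the top score
FETCH FIRST 1 ROW ONLY;
```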


Hi @Lalit, your query works fine, but the user may also enter **operation knee rt** instead of **operation knee right**. Left = lt, right = rt, etc.


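@shary.sharath You can customize this with `DECODE`. The function gives you the closest match, but some things are beyond its scope, and you need to write the query explicitly to handle cases where ambiguity is possible. For example, 'knee rt' would produce two rows, because both are equally close. If you decode 'rt' to 'right' and 'lt' to 'left', however, you will get the desired output. Please mark the question as answered.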
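
One way to sketch the abbreviation handling suggested above is to expand the abbreviations in the input before scoring it. This version uses `REGEXP_REPLACE` on whole words rather than `DECODE`. The rt/lt mapping comes from the comments, and the sample rows and the `search_text` name are illustrative assumptions.

```sql
WITH data AS (
    SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
    SELECT 'Right Knee Operation', 'IKR123' FROM dual
),
input AS (
    -- Expand whole-word abbreviations (case-insensitive): rt -> right, lt -> left.
    -- 'operation knee rt' becomes 'operation knee right'.
    SELECT REGEXP_REPLACE(
             REGEXP_REPLACE('operation knee rt',
                            '(^|\s)rt(\s|$)', '\1right\2', 1, 0, 'i'),
             '(^|\s)lt(\s|$)', '\1left\2', 1, 0, 'i') AS search_text
    FROM dual
)
SELECT diagnosis_name, icd_code
FROM (
    -- Score the expanded input against each diagnosis name, best match first
    SELECT d.diagnosis_name,
           d.icd_code,
           UTL_MATCH.jaro_winkler_similarity(UPPER(d.diagnosis_name),
                                             UPPER(i.search_text)) AS jws
    FROM data d
    CROSS JOIN input i
    ORDER BY jws DESC
)
WHERE ROWNUM = 1;
```

A longer abbreviation list would probably be easier to maintain in a lookup table than in nested function calls.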


Hello @Lalit, I used your query `SELECT diagnosisname FROM (SELECT diagnosisname, utl_match.edit_distance_similarity(UPPER(diagnosisname), UPPER('knee operation Left')) eds, UTL_MATCH.jaro_winkler_similarity(UPPER(diagnosisname), UPPER('knee operation Left')) jws FROM icd_code ORDER BY jws DESC) WHERE ROWNUM = 1`, but it returns **Knee Operation** instead of **Left Knee Operation**. What should I do now?
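
The thread does not resolve this case. One possible approach, not proposed by any of the answers, is to rank first by how many input words appear in the diagnosis name and to use the Jaro-Winkler score only as a tie-breaker. This fits the question's goal of matching the most words. The sketch below assumes space-separated words; the `input`, `words`, and `matched_words` names are hypothetical.

```sql
WITH data AS (
    SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
    SELECT 'Knee Operation', 'IK123' FROM dual UNION ALL
    SELECT 'Left Knee Operation', 'IKL123' FROM dual UNION ALL
    SELECT 'Right Knee Operation', 'IKR123' FROM dual UNION ALL
    SELECT 'Fever', 'IF123' FROM dual
),
input AS (
    SELECT 'knee operation Left' AS search_text FROM dual
),
words AS (
    -- Split the input on spaces into distinct upper-case words
    SELECT DISTINCT UPPER(REGEXP_SUBSTR(search_text, '[^ ]+', 1, LEVEL)) AS word
    FROM input
    CONNECT BY LEVEL <= REGEXP_COUNT(search_text, '[^ ]+')
)
SELECT diagnosis_name, icd_code
FROM (
    SELECT d.diagnosis_name,
           d.icd_code,
           -- Number of input words that occur as whole words in the name
           (SELECT COUNT(*)
              FROM words w
             WHERE INSTR(' ' || UPPER(d.diagnosis_name) || ' ',
                         ' ' || w.word || ' ') > 0) AS matched_words,
           UTL_MATCH.jaro_winkler_similarity(UPPER(d.diagnosis_name),
                                             UPPER(i.search_text)) AS jws
    FROM data d
    CROSS JOIN input i
    -- Most matched words first; Jaro-Winkler breaks ties
    ORDER BY matched_words DESC, jws DESC
)
WHERE ROWNUM = 1;
```

With these sample rows, 'Left Knee Operation' is the only name that contains all three input words, so it sorts first whatever the order of the words in the input.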

0

Please try this query. It may help solve your problem.

SELECT diagnosisname 
    FROM (SELECT diagnosisname, UTL_MATCH.jaro_winkler_similarity('%operation Knee right%',diagnosisname) 
    FROM icd_code 
    WHERE UTL_MATCH.jaro_winkler_similarity('%operation Knee right%',diagnosisname) = 100 
    ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname) DESC) 
WHERE ROWNUM<2 

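Hello @Pankaj, you are comparing **UTL_MATCH.jaro_winkler_similarity('%operation Knee right%',diagnosisname) = 100**, which means the first and second strings must match exactly. My requirement is that a partial (but close) match should also work.
Your query returns nothing.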
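
If the aim is to reject weak matches while still accepting close ones, a minimum score is a looser alternative to `= 100`. The following is a sketch under assumptions: the threshold of 60 is a hypothetical value that would need tuning against real data, and the `%` characters are dropped because UTL_MATCH treats them as literal characters.

```sql
-- Assumes the icd_code table and diagnosisname column from the question.
SELECT diagnosisname
FROM (
    SELECT diagnosisname,
           UTL_MATCH.jaro_winkler_similarity(UPPER(diagnosisname),
                                             UPPER('operation Knee right')) AS jws
    FROM icd_code
    ORDER BY jws DESC
)
WHERE jws >= 60      -- hypothetical threshold; returns no row if even the best match is weaker
  AND ROWNUM = 1;    -- best remaining match
```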
