2011-06-27 13 views
6

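SQL Server: best way to match word phrases and order by relevance

What is the best way to sort a SQL varchar column by the number (count) of words that match a parameter, given four distinct criteria? This may not be a trivial question, but I am challenged to order the rows by "best match" according to my criteria.

Column: description VARCHAR(100)
Parameter: @MyParameter VARCHAR(100)

Given this order of preference for the output:

  • Exact match (the entire string matches) - always first
  • Begins with (descending by the length of the parameter that matches)
  • Word count, with contiguous words ranking higher than the same number of matched words spread apart
  • Word(s) matched anywhere (not contiguous)

Words may not match exactly; a partial match of a word is allowed and likely. A lesser value should be applied to partial words when ranking, but this is not critical (for example, pot would match each of: potter, potholder, depot, depotting). A begins-with match followed by other word matches should rank higher than one with no subsequent matches, but this is not a deal killer or super important.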
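One way to read the four criteria above is as a tier number to sort on. The sketch below is in Python rather than T-SQL, purely to illustrate the logic. The name `match_tier`, the sample rows, and the simplifications (case-insensitive comparison, plain substring tests for partial words) are assumptions made for illustration, not part of the question.

```python
def match_tier(description: str, phrase: str) -> int:
    """Return a sort tier: lower numbers are better matches."""
    d = description.strip().lower()
    p = phrase.strip().lower()
    if d == p:
        return 0  # exact match of the whole string
    if d.startswith(p):
        return 1  # description begins with the phrase
    if p in d:
        return 2  # phrase occurs contiguously, but not at the start
    if any(word in d for word in p.split()):
        return 3  # at least one word (or partial word) occurs somewhere
    return 4      # no match at all


# Hypothetical sample rows, ranked against a hypothetical search phrase.
rows = ["Intubation, gastric", "Repeat Gastric Intubation",
        "Gastric Intubation & Aspiration", "Gastric Intubation"]
ranked = sorted(rows, key=lambda r: match_tier(r, "Gastric Intubation"))
```

This only separates the broad tiers; ordering within a tier (by matched length or word count) would need additional sort keys.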

I would like a method to rank rows where the column "begins with" the value in the parameter. Say I have the following string:

'This is my value string as a test template to rank on.' 

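In the first case, I would like to rank the rows by the greatest number of words present.

In the second case, I would like to rank by how much of the beginning matches (best match first):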

'This is my string as a test template to rank on.' - first 
'This is my string as a test template to rank on even though not exact.'-second 
'This is my string as a test template to rank' - third 
'This is my string as a test template to' - next 
'This is my string as a test template' - next etc. 

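Secondarily (possibly as a second set/group of data after the first, begins-with, group), this is desired:

I want to rank (sort) the rows by the count of words from @MyParameter that occur in the row, where contiguous words rank higher than the same count of words spread apart.

Thus, for the example string above, 'is my string as shown' would rank higher than 'is not my other string as', because the contiguous string (words together) is a "better match" for the same word count. Rows with a higher match count (number of words that occur) rank first, in descending order of best match.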
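The "same count, but contiguous ranks higher" rule can be made concrete with two numbers per row. This is a minimal Python sketch of that idea, not a T-SQL solution: `word_scores` is a hypothetical helper, and it assumes that "contiguous" means adjacent words in the row that all occur in the search phrase, with whitespace-only splitting and case-insensitive whole-word comparison.

```python
def word_scores(description: str, phrase: str):
    """Return (matched_word_count, longest_contiguous_run) for a row."""
    phrase_words = set(phrase.lower().split())
    matched = longest = run = 0
    for word in description.lower().split():
        if word in phrase_words:
            matched += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0  # a non-matching word breaks the contiguous run
    return matched, longest


phrase = "This is my value string as a test template to rank on."
rows = ["is not my other string as", "is my string as shown"]
# Sort by match count first, then by longest contiguous run, both descending.
ranked = sorted(rows, key=lambda r: word_scores(r, phrase), reverse=True)
```

Both sample rows match four words, so the longer contiguous run decides the order.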

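If possible, I would like to do this in a single query.

No row should occur twice in the result.

For performance considerations, there will be no more than 10,000 rows in the table.

The values in the table are fairly static, with little change, though not entirely so.

I cannot change the structure at this time, but would consider it later (such as a word/phrase table).

To make this slightly more complicated, the word list is spread across two tables, though I could create a view for that. Results from one table (the smaller list) should occur before results from the second, larger dataset given the same match. There will be duplicates both between these tables and within them, and I only want distinct values. SELECT DISTINCT is not easy here because I want to return one column (sourceTable) that may well make otherwise identical rows distinct. In that case, select only the row from the first (smaller) table; DISTINCT across all the other columns is what is desired (that column should not be considered in the "distinct" evaluation).

Pseudo-columns in the table: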

procedureCode VARCHAR(50), 
description VARCHAR(100), -- this is the sort/evaluation column 
category VARCHAR(50), 
relvu  VARCHAR(50), 
charge VARCHAR(15), 
active bit, 
sourceTable VARCHAR(50) -- just shows which of the two tables the row comes from 

No unique index, such as an ID column, exists.
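The "distinct on everything except sourceTable, preferring the smaller table" requirement can be sketched as follows. This is illustrative Python, not T-SQL; `distinct_preferring_first` is a hypothetical helper, the sample codes are made up, and it assumes the preferred table's label sorts lower (as with labels such as "0CPTMore" and "2MasterCPT").

```python
def distinct_preferring_first(rows, source_column="sourceTable"):
    """Drop duplicates, comparing every column except source_column."""
    kept = {}
    # Sorting by the source label visits the preferred table first,
    # assuming its label sorts lower than the other table's label.
    for row in sorted(rows, key=lambda r: r[source_column]):
        identity = tuple(sorted(
            (col, val) for col, val in row.items() if col != source_column))
        kept.setdefault(identity, row)  # keep only the first row seen
    return list(kept.values())


# Hypothetical rows: the first two differ only in sourceTable.
rows = [
    {"procedureCode": "A1", "description": "Gastric", "sourceTable": "2MasterCPT"},
    {"procedureCode": "A1", "description": "Gastric", "sourceTable": "0CPTMore"},
    {"procedureCode": "B2", "description": "Lavage", "sourceTable": "2MasterCPT"},
]
result = distinct_preferring_first(rows)
```

In T-SQL the same effect is commonly achieved with ROW_NUMBER() partitioned by the compared columns and ordered by the source label, keeping only row number 1.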

Rows whose code matches one in a third table are to be excluded:

SELECT * FROM tableone WHERE procedureCode NOT IN (SELECT procedureCode FROM tablethree)
UNION ALL
SELECT * FROM tabletwo WHERE procedureCode NOT IN (SELECT procedureCode FROM tablethree)

EDIT: In attempting to solve this, I have created a table-valued parameter like so:

0  Gastric Intubation & Aspiration/Lavage, Treatmen 
1  Gastric%Intubation%Aspiration%Lavage%Treatmen 
2  Gastric%Intubation%Aspiration%Lavage 
3  Gastric%Intubation%Aspiration 
4  Gastric%Intubation 
5  Gastric 
6  Intubation%Aspiration%Lavage%Treatmen 
7  Intubation%Aspiration%Lavage 
8  Intubation%Aspiration 
9  Intubation 
10  Aspiration%Lavage%Treatmen 
11  Aspiration%Lavage 
12  Aspiration 
13  Lavage%Treatmen 
14  Lavage 
15  Treatmen 

where the actual phrase is in row 0.
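The rows of that table-valued parameter follow a regular pattern: every contiguous run of words, ordered by starting word and then by decreasing length, joined with the LIKE wildcard `%`. A minimal Python sketch of how the calling program might build them is shown here; `search_patterns` is a hypothetical name, and treating any run of letters or digits as a word is an assumption.

```python
import re


def search_patterns(phrase: str):
    """Return (row_number, pattern) pairs like the table shown above."""
    words = re.findall(r"[A-Za-z0-9]+", phrase)
    patterns = [phrase]  # row 0 keeps the phrase exactly as typed
    for start in range(len(words)):
        # Longest run first, then progressively shorter runs.
        for end in range(len(words), start, -1):
            patterns.append("%".join(words[start:end]))
    return list(enumerate(patterns))


patterns = search_patterns("Gastric Intubation & Aspiration/Lavage, Treatmen")
```

For a five-word phrase this yields 16 rows, numbered 0 through 15.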

Here is my current attempt at this:

CREATE PROCEDURE [GetProcedureByDescription] 
( 
     @IncludeMaster BIT, 
     @ProcedureSearchPhrases CPTFavorite READONLY 

) 
AS 

    DECLARE @myIncludeMaster BIT; 

    SET @myIncludeMaster = @IncludeMaster; 

    CREATE TABLE #DistinctMatchingCpts 
    (
    procedureCode VARCHAR(50), 
    description  VARCHAR(100), 
    category  VARCHAR(50), 
    rvu  VARCHAR(50), 
    charge  VARCHAR(15), 
    active  VARCHAR(15), 
    sourceTable VARCHAR(50), 
    sequenceSet VARCHAR(2) 
    ) 

    IF @myIncludeMaster = 0 
     BEGIN -- Excluding master from search 
      INSERT INTO #DistinctMatchingCpts (sourceTable, procedureCode, description , category ,charge, active, rvu, sequenceSet 
) 
     SELECT DISTINCT sourceTable, procedureCode, description, category ,charge, active, rvu, sequenceSet 
      FROM (
        SELECT TOP 1 
         LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, 
         LTRIM(RTRIM(CPT.[LEVEL])) AS description, 
         LTRIM(RTRIM(CPT.[COMBO])) AS category, 
         LTRIM(RTRIM(CPT.[CHARGE])) AS charge, 
         ''True'' AS active, 
         LTRIM(RTRIM([RVU])) AS rvu, 
         ''0CPTMore'' AS sourceTable, 
         ''01'' AS sequenceSet 
        FROM 
        @ProcedureSearchPhrases PP 
        INNER JOIN [CPTMORE] AS CPT 
         ON CPT.[LEVEL] = PP.[LEVEL] 
        WHERE 
         (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) 
         AND CPT.[CODE] IS NOT NULL 
         AND CPT.[CODE] NOT IN (''0'', '''') 
        AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) 
        ORDER BY PP.CODE 

      UNION ALL 

        SELECT 
         LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, 
         LTRIM(RTRIM(CPT.[LEVEL])) AS description, 
         LTRIM(RTRIM(CPT.[COMBO])) AS category, 
         LTRIM(RTRIM([CHARGE])) AS charge, 
         ''True'' AS active, 
         LTRIM(RTRIM([RVU])) AS rvu, 
         ''0CPTMore'' AS sourceTable, 
         ''02'' AS sequenceSet 
        FROM 
        @ProcedureSearchPhrases PP 
        INNER JOIN [CPTMORE] AS CPT 
         ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%'' 
        WHERE 
         (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) 
         AND CPT.[CODE] IS NOT NULL 
         AND CPT.[CODE] NOT IN (''0'', '''') 
        AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) 

      UNION ALL 

      SELECT 
         LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, 
         LTRIM(RTRIM(CPT.[LEVEL])) AS description, 
         LTRIM(RTRIM(CPT.[COMBO])) AS category, 
         LTRIM(RTRIM(CPT.[CHARGE])) AS charge, 
         ''True'' AS active, 
         LTRIM(RTRIM([RVU])) AS rvu, 
         ''0CPTMore'' AS sourceTable, 
         ''03'' AS sequenceSet 
        FROM 
        @ProcedureSearchPhrases PP 
        INNER JOIN [CPTMORE] AS CPT 
         ON CPT.[LEVEL] LIKE ''%'' + PP.[LEVEL] + ''%'' 
        WHERE 
         (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) 
         AND CPT.[CODE] IS NOT NULL 
         AND CPT.[CODE] NOT IN (''0'', '''') 
        AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) 

      ) AS CPTS 
      ORDER BY 
       procedureCode, sourceTable, [description] 
     END -- Excluded master from search 
    ELSE 
     BEGIN -- Including master in search, but present favorites before master for each code 
      -- Get matching procedures, ordered by code, source (favorites first), and description. 
      -- There probably will be procedures with duplicated code+description, so we will filter 
      -- duplicates shortly. 
     INSERT INTO #DistinctMatchingCpts (sourceTable, procedureCode, description , category ,charge, active, rvu, sequenceSet) 
     SELECT DISTINCT sourceTable, procedureCode, description, category ,charge, active, rvu, sequenceSet 
      FROM (
        SELECT TOP 1 
         LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, 
         LTRIM(RTRIM(CPT.[LEVEL])) AS description, 
         LTRIM(RTRIM(CPT.[COMBO])) AS category, 
         LTRIM(RTRIM(CPT.[CHARGE])) AS charge, 
         ''True'' AS active, 
         LTRIM(RTRIM([RVU])) AS rvu, 
         ''0CPTMore'' AS sourceTable, 
         ''00'' AS sequenceSet 
       FROM 
        @ProcedureSearchPhrases PP 
        INNER JOIN [CPTMORE] AS CPT 
         ON CPT.[LEVEL] = PP.[LEVEL] 
        WHERE 
         (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) 
         AND CPT.[CODE] IS NOT NULL 
         AND CPT.[CODE] NOT IN (''0'', '''') 
        AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) 
        ORDER BY PP.CODE 

        UNION ALL 

        SELECT TOP 1 
         LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, 
         LTRIM(RTRIM(CPT.[LEVEL])) AS description, 
         LTRIM(RTRIM(CPT.[CATEGORY])) AS category, 
         LTRIM(RTRIM(CPT.[CHARGE])) AS charge, 
         COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active, 
         LTRIM(RTRIM([RVU])) AS rvu, 
         ''2MasterCPT'' AS sourceTable, 
         ''00'' AS sequenceSet 
        FROM 
        @ProcedureSearchPhrases PP 
        INNER JOIN [MASTERCPT] AS CPT 
         ON CPT.[LEVEL] = PP.[LEVEL] 
        WHERE 
         CPT.[CODE] IS NOT NULL 
         AND CPT.[CODE] NOT IN (''0'', '''') 
        AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) 
        ORDER BY PP.CODE 

        UNION ALL 

        SELECT 
         LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, 
         LTRIM(RTRIM(CPT.[LEVEL])) AS description, 
         LTRIM(RTRIM(CPT.[COMBO])) AS category, 
         LTRIM(RTRIM(CPT.[CHARGE])) AS charge, 
         ''True'' AS active, 
         LTRIM(RTRIM([RVU])) AS rvu, 
         ''0CPTMore'' AS sourceTable, 
         ''01'' AS sequenceSet 
       FROM 
        @ProcedureSearchPhrases PP 
        INNER JOIN [CPTMORE] AS CPT 
         ON CPT.[LEVEL] = PP.[LEVEL] 
        WHERE 
         (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) 
         AND CPT.[CODE] IS NOT NULL 
         AND CPT.[CODE] NOT IN (''0'', '''') 
        AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) 

        UNION ALL 

        SELECT 
         LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, 
         LTRIM(RTRIM(CPT.[LEVEL])) AS description, 
         LTRIM(RTRIM(CPT.[CATEGORY])) AS category, 
         LTRIM(RTRIM(CPT.[CHARGE])) AS charge, 
         COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active, 
         LTRIM(RTRIM([RVU])) AS rvu, 
         ''2MasterCPT'' AS sourceTable, 
         ''01'' AS sequenceSet 
        FROM 
        @ProcedureSearchPhrases PP 
        INNER JOIN [MASTERCPT] AS CPT 
         ON CPT.[LEVEL] = PP.[LEVEL] 
        WHERE 
         CPT.[CODE] IS NOT NULL 
         AND CPT.[CODE] NOT IN (''0'', '''') 
        AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) 

        UNION ALL 

        SELECT TOP 1 
         LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, 
         LTRIM(RTRIM(CPT.[LEVEL])) AS description, 
         LTRIM(RTRIM(CPT.[COMBO])) AS category, 
         LTRIM(RTRIM(CPT.[CHARGE])) AS charge, 
         ''True'' AS active, 
         LTRIM(RTRIM([RVU])) AS rvu, 
         ''0CPTMore'' AS sourceTable, 
         ''02'' AS sequenceSet 
       FROM 
        @ProcedureSearchPhrases PP 
        INNER JOIN [CPTMORE] AS CPT 
         ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%'' 
        WHERE 
         (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) 
         AND CPT.[CODE] IS NOT NULL 
         AND CPT.[CODE] NOT IN (''0'', '''') 
        AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) 
        ORDER BY PP.CODE 

        UNION ALL 

        SELECT TOP 1 
         LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, 
         LTRIM(RTRIM(CPT.[LEVEL])) AS description, 
         LTRIM(RTRIM(CPT.[CATEGORY])) AS category, 
         LTRIM(RTRIM(CPT.[CHARGE])) AS charge, 
         COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active, 
         LTRIM(RTRIM([RVU])) AS rvu, 
         ''2MasterCPT'' AS sourceTable, 
         ''02'' AS sequenceSet 
        FROM 
        @ProcedureSearchPhrases PP 
        INNER JOIN [MASTERCPT] AS CPT 
         ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%'' 
        WHERE 
         CPT.[CODE] IS NOT NULL 
         AND CPT.[CODE] NOT IN (''0'', '''') 
        AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) 
        ORDER BY PP.CODE 

        UNION ALL 

        SELECT 
         LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, 
         LTRIM(RTRIM(CPT.[LEVEL])) AS description, 
         LTRIM(RTRIM(CPT.[COMBO])) AS category, 
         LTRIM(RTRIM(CPT.[CHARGE])) AS charge, 
         ''True'' AS active, 
         LTRIM(RTRIM([RVU])) AS rvu, 
         ''0CPTMore'' AS sourceTable, 
         ''03'' AS sequenceSet 
       FROM 
        @ProcedureSearchPhrases PP 
        INNER JOIN [CPTMORE] AS CPT 
         ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%'' 
        WHERE 
         (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) 
         AND CPT.[CODE] IS NOT NULL 
         AND CPT.[CODE] NOT IN (''0'', '''') 
        AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) 

        UNION ALL 

        SELECT 
         LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, 
         LTRIM(RTRIM(CPT.[LEVEL])) AS description, 
         LTRIM(RTRIM(CPT.[CATEGORY])) AS category, 
         LTRIM(RTRIM(CPT.[CHARGE])) AS charge, 
         COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active, 
         LTRIM(RTRIM([RVU])) AS rvu, 
         ''2MasterCPT'' AS sourceTable, 
         ''03'' AS sequenceSet 
        FROM 
        @ProcedureSearchPhrases PP 
        INNER JOIN [MASTERCPT] AS CPT 
         ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%'' 
        WHERE 
         CPT.[CODE] IS NOT NULL 
         AND CPT.[CODE] NOT IN (''0'', '''') 
        AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) 

        UNION ALL 

        SELECT 
         LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, 
         LTRIM(RTRIM(CPT.[LEVEL])) AS description, 
         LTRIM(RTRIM(CPT.[COMBO])) AS category, 
         LTRIM(RTRIM(CPT.[CHARGE])) AS charge, 
         ''True'' AS active, 
         LTRIM(RTRIM([RVU])) AS rvu, 
         ''0CPTMore'' AS sourceTable, 
         ''04'' AS sequenceSet 
       FROM 
        @ProcedureSearchPhrases PP 
        INNER JOIN [CPTMORE] AS CPT 
         ON CPT.[LEVEL] LIKE ''%'' + PP.[LEVEL] + ''%'' 
        WHERE 
         (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) 
         AND CPT.[CODE] IS NOT NULL 
         AND CPT.[CODE] NOT IN (''0'', '''') 
        AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) 

        UNION ALL 

        SELECT 
         LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, 
         LTRIM(RTRIM(CPT.[LEVEL])) AS description, 
         LTRIM(RTRIM(CPT.[CATEGORY])) AS category, 
         LTRIM(RTRIM(CPT.[CHARGE])) AS charge, 
         COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active, 
         LTRIM(RTRIM([RVU])) AS rvu, 
         ''2MasterCPT'' AS sourceTable, 
         ''04'' AS sequenceSet 
        FROM 
        @ProcedureSearchPhrases PP 
        INNER JOIN [MASTERCPT] AS CPT 
         ON CPT.[LEVEL] LIKE ''%'' + PP.[LEVEL] + ''%'' 
        WHERE 
         CPT.[CODE] IS NOT NULL 
         AND CPT.[CODE] NOT IN (''0'', '''') 
        AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) 

      ) AS CPTS 

      ORDER BY 
       sequenceSet, sourceTable, [description] 

     END 

     /* Final select - uses artificial ordering from the insertion ORDER BY */ 
     SELECT procedureCode, description, category, rvu, charge, active FROM 
     ( 
     SELECT TOP 500 *-- procedureCode, description, category, rvu, charge, active 
     FROM #DistinctMatchingCpts 
     ORDER BY sequenceSet, sourceTable, description 

     ) AS CPTROWS 

     DROP TABLE #DistinctMatchingCpts 

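However, this does not meet the criterion of best match on the number of words (as in the row 1 value in the sample); it should match on the best (greatest) number of words found from that row.

I have complete control over the form/format of the table-valued parameter, should that make a difference.

I am returning this result to a C# program, if that is useful.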

+0

Did any of these answers solve your problem?

+0

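A few answers and some ideas, but none quite complete enough to produce the full result set that meets the listed criteria. I am currently prototyping an algorithm that seems to be doing what I want; once it has been reviewed, I will determine whether it is a viable solution that meets these goals.

Answers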

0

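It sounds like you are looking for a matching algorithm, which may be difficult to create without using a stored procedure. From past experience, there are edit distance algorithms (such as Levenshtein) that are very useful for determining similarity. These return a number, and sometimes the differences between the strings, from which you can build your own weighting formula to produce a score. You can then rank on the score or apply a threshold to it to reduce false positives and false negatives.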

+0

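A bit like that, but the terms are very specific and drawn from a finite set of words, so an "exact match" is sufficient for my purposes; I just need several sets of exact matches in the priority described. Good suggestion, though.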

+0

I can also use a stored procedure, or whatever else I need, to get the proper result set.

+0

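Good suggestion with the Levenshtein distance, but it is meant for comparing words letter by letter. Now, here is the fun part: it is still the right answer. What you are trying to achieve is a Levenshtein distance, not over letters but over words. I suggest writing a CLR assembly to calculate this distance, if possible. A word-level Levenshtein distance of 1 means that one word (not letter) is out of place (it should not be there, or it is missing). So you can easily order by this distance. – AlexanderMP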
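The word-level distance described in that comment is the ordinary Levenshtein recurrence applied to lists of words instead of characters. A minimal Python sketch follows (the comment suggests a CLR assembly; Python is used here only to show the algorithm). The name `word_levenshtein` is hypothetical, and splitting on whitespace with case-insensitive comparison is an assumption.

```python
def word_levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, counted in whole words."""
    x, y = a.lower().split(), b.lower().split()
    prev = list(range(len(y) + 1))  # distance from an empty word list
    for i, word_x in enumerate(x, start=1):
        curr = [i]
        for j, word_y in enumerate(y, start=1):
            cost = 0 if word_x == word_y else 1
            curr.append(min(prev[j] + 1,          # delete a word
                            curr[j - 1] + 1,      # insert a word
                            prev[j - 1] + cost))  # substitute a word
        prev = curr
    return prev[-1]


distance = word_levenshtein("This is my value string", "This is my string")
```

Sorting rows by this distance in ascending order puts the closest word sequences first.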

4

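You need to be able to split strings to solve this problem. I prefer the number table approach to split a string in TSQL.

For my code below to work (as well as my split function), you need to do this one-time table setup: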

SELECT TOP 10000 IDENTITY(int,1,1) AS Number 
    INTO Numbers 
    FROM sys.objects s1 
    CROSS JOIN sys.objects s2 
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number) 

Once the Numbers table is set up, create this split function:

CREATE FUNCTION [dbo].[FN_ListToTable] 
(
    @SplitOn char(1)  --REQUIRED, the character to split the @List string on 
    ,@List  varchar(8000)--REQUIRED, the list to split apart 
) 
RETURNS TABLE 
AS 
RETURN 
(

    ---------------- 
    --SINGLE QUERY-- --this will not return empty rows 
    ---------------- 
    SELECT 
     ListValue 
     FROM (SELECT 
        LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(@SplitOn, List2, number+1)-number - 1))) AS ListValue 
        FROM (
          SELECT @SplitOn + @List + @SplitOn AS List2 
         ) AS dt 
         INNER JOIN Numbers n ON n.Number < LEN(dt.List2) 
        WHERE SUBSTRING(List2, number, 1) = @SplitOn 
      ) dt2 
     WHERE ListValue IS NOT NULL AND ListValue!='' 

); 
GO 

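Feel free to create your own split function, but you will still need the Numbers table for my solution to work.

You can now easily split a CSV string into a table and join on it: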

select * from dbo.FN_ListToTable(',','1,2,3,,,4,5,6777,,,') 

OUTPUT:

ListValue 
----------------------- 
1 
2 
3 
4 
5 
6777 

(6 row(s) affected) 
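For readers checking the behavior outside SQL Server, the output above (values trimmed of spaces, empty items dropped) corresponds roughly to this Python sketch. `list_to_table` is a hypothetical stand-in and not a translation of the Numbers-table technique itself.

```python
def list_to_table(split_on: str, text: str):
    """Split text on a delimiter, trim spaces, and drop empty items."""
    items = (item.strip(" ") for item in text.split(split_on))
    return [item for item in items if item != ""]


values = list_to_table(",", "1,2,3,,,4,5,6777,,,")
```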

Now try this:

DECLARE @BaseTable table (RowID int primary key, RowValue varchar(100)) 
set nocount on 
INSERT @BaseTable VALUES (1,'The cows came home empty handed') 
INSERT @BaseTable VALUES (2,'This is my string as a test template to rank')       -- third 
INSERT @BaseTable VALUES (3,'pencil pen paperclip eraser') 
INSERT @BaseTable VALUES (4,'wow') 
INSERT @BaseTable VALUES (5,'no dice here') 
INSERT @BaseTable VALUES (6,'This is my string as a test template to rank on even though not exact.') -- second 
INSERT @BaseTable VALUES (7,'apple banana pear grape lemon orange kiwi strawberry peach watermellon') 
INSERT @BaseTable VALUES (8,'This is my string as a test template')         -- 5th 
INSERT @BaseTable VALUES (9,'rat cat bat mat sat fat hat pat ') 
INSERT @BaseTable VALUES (10,'house home pool roll') 
INSERT @BaseTable VALUES (11,'This is my string as a test template to')        -- 4th 
INSERT @BaseTable VALUES (12,'talk wisper yell scream sing hum') 
INSERT @BaseTable VALUES (13,'This is my string as a test template to rank on.')      -- first 
INSERT @BaseTable VALUES (14,'aaa bbb ccc ddd eee fff ggg hhh') 
INSERT @BaseTable VALUES (15,'three twice three once twice three') 
set nocount off 

DECLARE @SearchValue varchar(100) 
SET @SearchValue='This is my value string as a test template to rank on.' 

;WITH SplitBaseTable AS --expand each @BaseTable row into one row per word 
(SELECT 
    b.RowID, b.RowValue, s.ListValue 
    FROM @BaseTable b 
     CROSS APPLY dbo.FN_ListToTable(' ',b.RowValue) AS s 
) 
, WordMatchCount AS --for each @BaseTable row that has a word in common with the search string, get the count of matching words 
(SELECT 
    s.RowID,COUNT(*) AS CountOfWordMatch 
    FROM dbo.FN_ListToTable(' ',@SearchValue) v 
     INNER JOIN SplitBaseTable    s ON v.ListValue=s.ListValue 
    GROUP BY s.RowID 
    HAVING COUNT(*)>0 
) 
, SearchLen AS --get one row for each possible length of the search string 
(
SELECT 
    n.Number,SUBSTRING(@SearchValue,1,n.Number) AS PartialSearchValue 
    FROM Numbers n 
    WHERE n.Number<=LEN(@SearchValue) 
) 
, MatchLen AS --for each @BaseTable row, get the max starting length that matches the search string 
(
SELECT 
    b.RowID,MAX(l.Number) MatchStartLen 
    FROM @BaseTable     b 
     LEFT OUTER JOIN SearchLen l ON LEFT(b.RowValue,l.Number)=l.PartialSearchValue 
    GROUP BY b.RowID 
) 
SELECT --return the final search results 
    b.RowValue,w.CountOfWordMatch,m.MatchStartLen 
    FROM @BaseTable      b 
     LEFT OUTER JOIN WordMatchCount w ON b.RowID=w.RowID 
     LEFT OUTER JOIN MatchLen  m ON b.RowID=m.RowID 
    WHERE w.CountOfWordMatch>0 
    ORDER BY w.CountOfWordMatch DESC,m.MatchStartLen DESC,LEN(b.RowValue) DESC,b.RowValue ASC 

OUTPUT:

RowValue                                                               CountOfWordMatch MatchStartLen
---------------------------------------------------------------------- ---------------- -------------
This is my string as a test template to rank on.                       11               11
This is my string as a test template to rank on even though not exact. 10               11
This is my string as a test template to rank                           10               11
This is my string as a test template to                                9                11
This is my string as a test template                                   8                11

(5 row(s) affected) 

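It handles the match at the start of the string a little differently, as it looks at the number of characters that match at the start of the string rather than the number of words.

Once you get this working, you could try to optimize it by creating a static indexed table for SplitBaseTable, possibly maintained with a trigger on @BaseTable.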
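The two numbers in that output can be checked by hand. The Python sketch below mirrors the query's logic under stated assumptions: whitespace splitting, case-insensitive comparison (as with a typical case-insensitive SQL Server collation), and hypothetical helper names.

```python
def word_match_count(row: str, search: str) -> int:
    """Count (search word, row word) pairs that are equal, like the join."""
    row_words = row.lower().split()
    return sum(row_words.count(word) for word in search.lower().split())


def match_start_len(row: str, search: str) -> int:
    """Number of leading characters shared by the row and the search."""
    length = 0
    for a, b in zip(row.lower(), search.lower()):
        if a != b:
            break
        length += 1
    return length


search = "This is my value string as a test template to rank on."
rows = [
    "This is my string as a test template to rank",
    "This is my string as a test template to rank on even though not exact.",
    "This is my string as a test template to rank on.",
]
# Same ordering as the query: count, prefix length, row length, then text.
ranked = sorted(rows, key=lambda r: (-word_match_count(r, search),
                                     -match_start_len(r, search),
                                     -len(r), r))
```

The shared prefix stops at 11 characters ("This is my ") for every row because the search string continues with "value" where the rows continue with "string".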

+0

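This is an interesting idea. The challenge is that there is no single word separator: ) ( , / < ; & space < > = % - ] [ all occur, and phrases in double quotes are somewhat significant as well: "Level > 9.0%" or "LDL-C <100mg/dl", "LDL-C 100-129 mg/dl", "LDL-C = 130 mg/dl". So I will have to figure out a way around that, perhaps with multiple separators, in order to create the static index. The "word/phrase" list is further complicated by the fact that there is no unique key on the base tables used.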

+0

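Regarding the "crazy" word/phrase splitting rules, you have three options: 1) write a CLR split routine to handle all of the necessary logic; 2) insert a single character, such as CHAR(182) (try "PRINT CHAR(182)"), into the strings to clearly identify the splits; 3) redesign the tables so that each "phrase" is already split into its own row, and they can be reconstructed based on an ID and sequence number. As for the primary key, add an identity column as shown here: http://blog.sqlauthority.com/2009/05/03/sql-server-add-or-remove-identity-property-on-column/ and make it the PK.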
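As a rough illustration of what a custom split routine (option 1) would need to do, here is a Python sketch that keeps double-quoted phrases whole and splits everything else on the separators listed in the earlier comment. The regular expression and the name `tokenize` are assumptions; a CLR routine would implement equivalent logic in .NET.

```python
import re

# Either a double-quoted phrase, or a run of characters that are not
# whitespace and not one of: ( ) [ ] , / < > ; & = % -
TOKEN = re.compile(r'"([^"]*)"|([^\s()\[\],/<>;&=%-]+)')


def tokenize(text: str):
    """Return quoted phrases intact and all other text split into words."""
    tokens = []
    for quoted, bare in TOKEN.findall(text):
        token = quoted.strip() if quoted else bare
        if token:
            tokens.append(token)
    return tokens


words = tokenize("Gastric Intubation & Aspiration/Lavage, Treatmen")
mixed = tokenize('"LDL-C <100mg/dl" or LDL-C 100-129')
```

Note that an unquoted "LDL-C" is split at the hyphen under these rules, which may or may not be desirable for this data.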

+0

+1 for having the potential to be the best solution at this point.

0

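I once had a similar problem. The question I was trying to answer was how many words matched between two different columns, ranked by the highest percentage of words matched. It was way over my head, but I got a fantastic answer from Martin.

See his answer to my question here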

0

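One answer to solve all your problems: use Sphinx (http://sphinxsearch.com) and do not solve this in SQL.

Sphinx is open source and works with all databases and all operating systems.

It is what craigslist is using.

It is the best external full-text search system as of the time of this post. It will order your results by relevance as you asked, with no fancy SQL tables or SQL procedures required. Give it a try.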

+0

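Sometimes, for better or worse, the records must be retrieved with SQL (for example, if you are also filtering them on other criteria and linking them to related tables).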
