2014-01-17

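Say I have a table called PHRASES that contains some text strings. How do I count how many times each keyword in another table appears in the PHRASES table?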

+--+---------------+
|ID|PHRASE         |
+--+---------------+
|0 |"HELLO BYE YES"|
+--+---------------+
|1 |"NO WHY NOT"   |
+--+---------------+
|2 |"NO YES"       |
+--+---------------+

I want to fill an OCCURRENCE column with the number of times each of the following words appears. Let's call this table KEYWORDS:

+--------+----------+
|KEYWORD |OCCURRENCE|
+--------+----------+
|"YES"   |NULL      |
+--------+----------+
|"NO"    |NULL      |
+--------+----------+
|"HELLO" |NULL      |
+--------+----------+
|"CHEESE"|NULL      |
+--------+----------+

Now I want to write a query that updates KEYWORDS to the following:

+--------+----------+
|KEYWORD |OCCURRENCE|
+--------+----------+
|"YES"   |2         |
+--------+----------+
|"NO"    |2         |
+--------+----------+
|"HELLO" |1         |
+--------+----------+
|"CHEESE"|0         |
+--------+----------+

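Note that I already have a function called dbo.RegExIsMatch that takes care of the string matching: it returns 1 when the string in argument 2 matches the pattern in argument 1. This is what I tried: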

UPDATE KEYWORDS SET OCCURRENCE = 
(
    SELECT SUM 
    (
      -- the following returns 1 if the keyword exists in the phrase, or 0 otherwise 
     CASE WHEN dbo.RegExIsMatch('.*' + KEYWORDS.KEYWORD + '.*',PHRASES.PHRASE,1) = 1 THEN 1 ELSE 0 END 
    ) 
    FROM PHRASES 
    CROSS JOIN KEYWORDS 
) 

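This doesn't work, though: it just fills every row with the same number. I'm sure this is a simple problem; I'm just struggling to get my head thinking in SQL.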
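The "same number in every row" symptom comes from the subquery's own CROSS JOIN KEYWORDS: inside the subquery, KEYWORDS.KEYWORD resolves to that inner copy, not to the row being updated, so the subquery is one uncorrelated scalar. The sketch below reproduces this under stated assumptions: Python's built-in sqlite3 stands in for SQL Server, `regex_is_match` is a hypothetical Python replacement for dbo.RegExIsMatch (without its third argument), and SQLite concatenates strings with `||` instead of `+`.

```python
import re
import sqlite3


def regex_is_match(pattern, text):
    # Hypothetical stand-in for dbo.RegExIsMatch: 1 if the pattern matches, else 0.
    return 1 if re.search(pattern, text) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("regex_is_match", 2, regex_is_match)
conn.execute("CREATE TABLE phrases (id INTEGER, phrase TEXT)")
conn.execute("CREATE TABLE keywords (keyword TEXT, occurrence INTEGER)")
conn.executemany("INSERT INTO phrases VALUES (?, ?)",
                 [(0, "HELLO BYE YES"), (1, "NO WHY NOT"), (2, "NO YES")])
conn.executemany("INSERT INTO keywords VALUES (?, NULL)",
                 [("YES",), ("NO",), ("HELLO",), ("CHEESE",)])

# The inner CROSS JOIN brings in a second copy of keywords (k2), so the
# subquery never looks at the row being updated. It evaluates to one scalar,
# the total number of matching keyword/phrase pairs, for every row.
conn.execute("""
    UPDATE keywords SET occurrence = (
        SELECT SUM(CASE WHEN regex_is_match('.*' || k2.keyword || '.*', p.phrase) = 1
                        THEN 1 ELSE 0 END)
        FROM phrases AS p CROSS JOIN keywords AS k2)
""")
rows = dict(conn.execute("SELECT keyword, occurrence FROM keywords"))
print(rows)  # every keyword gets 5 (2 + 2 + 1 + 0 matching pairs)
```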

Answers


Hmm, this seems to work:

MERGE INTO KEYWORDS masterList 
USING (
    SELECT COUNT(*) AS OCCURRENCE, keywordList.KEYWORD AS KEYWORD FROM 
    KEYWORDS AS keywordList 
    CROSS JOIN PHRASES AS phraseList 
    WHERE (dbo.RegExIsMatch('.*' + keywordList.KEYWORD + '.*',phraseList.PHRASE,1) = 1) 
    GROUP BY KEYWORD 
) frequencyList 
ON (masterList.KEYWORD = frequencyList.KEYWORD) 
WHEN MATCHED THEN 
    UPDATE SET masterList.OCCURRENCE = frequencyList.OCCURRENCE; 

This doesn't set the OCCURRENCE value for the KEYWORD CHEESE to 0. – ErikE
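The MERGE only touches keywords that appear in its source (frequencyList), so a keyword with no matches keeps NULL. In SQL Server, adding a `WHEN NOT MATCHED BY SOURCE THEN UPDATE SET masterList.OCCURRENCE = 0` clause should cover those rows. SQLite has no MERGE, so this sketch (sqlite3 standing in for SQL Server, with a hypothetical `regex_is_match` in place of dbo.RegExIsMatch) shows the same idea another way: aggregate first, then default unmatched keywords to 0 with COALESCE.

```python
import re
import sqlite3


def regex_is_match(pattern, text):
    # Hypothetical stand-in for dbo.RegExIsMatch.
    return 1 if re.search(pattern, text) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("regex_is_match", 2, regex_is_match)
conn.execute("CREATE TABLE phrases (id INTEGER, phrase TEXT)")
conn.execute("CREATE TABLE keywords (keyword TEXT, occurrence INTEGER)")
conn.executemany("INSERT INTO phrases VALUES (?, ?)",
                 [(0, "HELLO BYE YES"), (1, "NO WHY NOT"), (2, "NO YES")])
conn.executemany("INSERT INTO keywords VALUES (?, NULL)",
                 [("YES",), ("NO",), ("HELLO",), ("CHEESE",)])

# frequency_list mirrors the MERGE source: one row per keyword that matched
# at least once. Keywords missing from it fall back to 0 via COALESCE.
conn.execute("""
    WITH frequency_list AS (
        SELECT k.keyword AS keyword, COUNT(*) AS occurrence
        FROM keywords AS k CROSS JOIN phrases AS p
        WHERE regex_is_match('.*' || k.keyword || '.*', p.phrase) = 1
        GROUP BY k.keyword)
    UPDATE keywords SET occurrence = COALESCE(
        (SELECT f.occurrence FROM frequency_list AS f
         WHERE f.keyword = keywords.keyword), 0)
""")
rows = dict(conn.execute("SELECT keyword, occurrence FROM keywords"))
print(rows)
```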

Your query has three different tables, but the question only has two. Is this what you meant?

UPDATE Keywords 
    SET OCCURRENCE = (SELECT SUM(CASE WHEN dbo.RegExIsMatch('.*' + KEYWORDS.KEYWORD + '.*',PHRASES.PHRASE,1) = 1 
            THEN 1 ELSE 0 
           END) 
        FROM PHRASES 
        ); 

Otherwise, if you really have three tables, you need to correlate the subquery with the outer table.
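Because the subquery's FROM lists only PHRASES, KEYWORDS.KEYWORD now refers to the row being updated, so each keyword gets its own count. A minimal sketch of this correlated subquery, assuming sqlite3 as a stand-in for SQL Server and a hypothetical `regex_is_match` in place of dbo.RegExIsMatch:

```python
import re
import sqlite3


def regex_is_match(pattern, text):
    # Hypothetical stand-in for dbo.RegExIsMatch.
    return 1 if re.search(pattern, text) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("regex_is_match", 2, regex_is_match)
conn.execute("CREATE TABLE phrases (id INTEGER, phrase TEXT)")
conn.execute("CREATE TABLE keywords (keyword TEXT, occurrence INTEGER)")
conn.executemany("INSERT INTO phrases VALUES (?, ?)",
                 [(0, "HELLO BYE YES"), (1, "NO WHY NOT"), (2, "NO YES")])
conn.executemany("INSERT INTO keywords VALUES (?, NULL)",
                 [("YES",), ("NO",), ("HELLO",), ("CHEESE",)])

# keywords.keyword inside the subquery is the outer row: the subquery is
# re-evaluated per keyword and sums 1 for each matching phrase.
conn.execute("""
    UPDATE keywords SET occurrence = (
        SELECT SUM(CASE WHEN regex_is_match('.*' || keywords.keyword || '.*',
                                            phrases.phrase) = 1
                        THEN 1 ELSE 0 END)
        FROM phrases)
""")
rows = dict(conn.execute("SELECT keyword, occurrence FROM keywords"))
print(rows)
```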


Yes, that was a mistake. I tried to simplify my original code for this question but missed one. – arman

You can simplify this by moving the condition into the WHERE clause and counting instead of summing: `SET OCCURRENCE = (SELECT COUNT(*) FROM Phrases WHERE dbo.RegExIsMatch(...) = 1)` – ErikE
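A minimal sketch of the COUNT(*) form, again assuming sqlite3 as a stand-in for SQL Server and a hypothetical `regex_is_match` in place of dbo.RegExIsMatch (the pattern argument below is an assumption filling in the elided `(...)`, taken from the question's call):

```python
import re
import sqlite3


def regex_is_match(pattern, text):
    # Hypothetical stand-in for dbo.RegExIsMatch.
    return 1 if re.search(pattern, text) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("regex_is_match", 2, regex_is_match)
conn.execute("CREATE TABLE phrases (id INTEGER, phrase TEXT)")
conn.execute("CREATE TABLE keywords (keyword TEXT, occurrence INTEGER)")
conn.executemany("INSERT INTO phrases VALUES (?, ?)",
                 [(0, "HELLO BYE YES"), (1, "NO WHY NOT"), (2, "NO YES")])
conn.executemany("INSERT INTO keywords VALUES (?, NULL)",
                 [("YES",), ("NO",), ("HELLO",), ("CHEESE",)])

# Filter matching phrases in WHERE and count them per outer keyword row.
conn.execute("""
    UPDATE keywords SET occurrence = (
        SELECT COUNT(*) FROM phrases
        WHERE regex_is_match('.*' || keywords.keyword || '.*', phrases.phrase) = 1)
""")
rows = dict(conn.execute("SELECT keyword, occurrence FROM keywords"))
print(rows)
```

A side benefit of COUNT(*): it returns 0 when no rows qualify, whereas SUM over zero rows returns NULL, so CHEESE gets 0 even if PHRASES were empty.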

Try this approach; it works on my end:

------------- Table creation

declare @PHRASE table (ID int,PHRASE varchar(max)) 
    insert into @PHRASE 
    select 0,'"Hello Bye Yes"' 
    union all 
    select 1,'"No Why Not"' 
    union all 
    select 2,'"No Yes"' 
    select * from @PHRASE 
    declare @Keywords table (KEYWORD varchar(10),OCCURANCE int) 
    insert into @Keywords 
    select 'YES',null 
    union all 
    select 'NO',null 
    union all 
    select 'HELLO',null 
    union all 
    select 'CHEESE',null 
    select * from @Keywords 

----------Script for requirement 

create table #table (name varchar(max),) 

DECLARE @str VARCHAR(MAX) 

DECLARE curs_Fp CURSOR FOR 

SELECT c.PHRASE FROM @PHRASE c 

OPEN curs_Fp 
FETCH NEXT FROM curs_Fp INTO @str 

    WHILE @@FETCH_STATUS = 0 
BEGIN 

     while patindex('%["]%',@str) > 0 
     SET @str = REPLACE(@str, SUBSTRING(@str, patindex('%["]%',@str), 1),'') 

          set @str = @str+' ' 
          WHILE CHARINDEX(' ', @str) > 0 
          BEGIN 

           DECLARE @tmpstr VARCHAR(50) 
           SET @tmpstr = SUBSTRING(@str, 1, (CHARINDEX(' ', @str) - 1)) 

           insert into #table (name) select @tmpstr 

           SET @str = SUBSTRING(@str, CHARINDEX(' ', @str) + 1, LEN(@str)) 
          END 

FETCH NEXT FROM curs_Fp INTO @str 
END 

CLOSE curs_Fp 
DEALLOCATE curs_Fp 

update y 
set y.OCCURANCE = isnull(x.occurance,0) 
from 
@Keywords y 
left join 
--#table x on y.keyword = x.name 
(select a.name,count(a.name) occurance from #table a group by a.name) x on y.KEYWORD = x.name 
select * from @Keywords 
drop table #table 
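The cursor above splits each phrase into words, stores one row per word, then LEFT JOINs the grouped word counts onto the keywords with ISNULL(..., 0). Here is a shorter sketch of the same split-then-count idea, assuming sqlite3 in place of SQL Server and doing the splitting in Python instead of a T-SQL cursor. Because whole words are compared, "NOT" is not counted as "NO".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phrases (id INTEGER, phrase TEXT)")
conn.execute("CREATE TABLE keywords (keyword TEXT, occurrence INTEGER)")
conn.executemany("INSERT INTO phrases VALUES (?, ?)",
                 [(0, "HELLO BYE YES"), (1, "NO WHY NOT"), (2, "NO YES")])
conn.executemany("INSERT INTO keywords VALUES (?, NULL)",
                 [("YES",), ("NO",), ("HELLO",), ("CHEESE",)])

# One row per word, with any surrounding double quotes stripped,
# playing the role of the #table filled by the cursor loop.
words = [(word,)
         for (phrase,) in conn.execute("SELECT phrase FROM phrases").fetchall()
         for word in phrase.strip('"').split()]
conn.execute("CREATE TEMP TABLE words (name TEXT)")
conn.executemany("INSERT INTO words VALUES (?)", words)

# Group the words, then default keywords with no matching word to 0
# (the LEFT JOIN + ISNULL step of the answer).
conn.execute("""
    WITH x AS (SELECT name, COUNT(name) AS occurrence FROM words GROUP BY name)
    UPDATE keywords SET occurrence = COALESCE(
        (SELECT x.occurrence FROM x WHERE x.name = keywords.keyword), 0)
""")
rows = dict(conn.execute("SELECT keyword, occurrence FROM keywords"))
print(rows)
```

Note that SQLite's `=` is case-sensitive here, while SQL Server's result depends on the column collation.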

Since I don't have your dbo.RegExIsMatch function to test with, I came up with a slightly different example that uses only out-of-the-box SQL Server functionality.

You were probably getting a count of 1 everywhere because you were using SUM instead of GROUP BY.

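Note that this isn't 100% accurate, because I'm not using regular expressions, just "simple stupid" string functions. But if you modify your regex function to do a regex replace, you can substitute it for my call to REPLACE and it will give you the correct result.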

fiddle demo

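One other small change: I set the initial value of all keywords to 0 instead of NULL.

Also note that I no longer do a CROSS JOIN; instead I join only with the phrases that contain the word, so the occurrence count doesn't get overwritten multiple times, which is what I think was happening.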

INSERT INTO KEYWORDS (KEYWORD, OCCURRENCE) 
    SELECT 'YES', 0 
    UNION 
    SELECT 'NO', 0 
    UNION 
    SELECT 'HELLO', 0 
    UNION 
    SELECT 'CHEESE', 0; 

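UPDATE KEYWORDS SET KEYWORDS.OCCURRENCE = KEYWORDS.OCCURRENCE + x.CNT 
    FROM KEYWORDS 
    INNER JOIN ( 
        -- aggregate per keyword first: UPDATE ... FROM applies only one joined row per target row 
        SELECT k.KEYWORD, 
               SUM((LEN(p.PHRASE) - LEN(REPLACE(p.PHRASE, k.KEYWORD, '')))/LEN(k.KEYWORD)) AS CNT 
        FROM KEYWORDS k 
        INNER JOIN PHRASES p ON CHARINDEX(k.KEYWORD, p.PHRASE) > 0 
        GROUP BY k.KEYWORD 
    ) x ON x.KEYWORD = KEYWORDS.KEYWORD; 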
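The length-difference trick counts non-overlapping substring occurrences: removing every copy of the keyword shortens the phrase by (copies × keyword length). When one keyword joins several phrases, SQL Server's UPDATE ... FROM uses only one of the joined rows per target row, so summing per keyword before updating is the safe form. The sketch below checks both points, assuming sqlite3 as a stand-in (`length`/`replace`/`instr` in place of `LEN`/`REPLACE`/`CHARINDEX`; unlike T-SQL's LEN, SQLite's length() also counts trailing spaces).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phrases (id INTEGER, phrase TEXT)")
conn.execute("CREATE TABLE keywords (keyword TEXT, occurrence INTEGER)")
conn.executemany("INSERT INTO phrases VALUES (?, ?)",
                 [(0, "HELLO BYE YES"), (1, "NO WHY NOT"), (2, "NO YES")])
# Initial value 0 instead of NULL, as in the answer.
conn.executemany("INSERT INTO keywords VALUES (?, 0)",
                 [("YES",), ("NO",), ("HELLO",), ("CHEESE",)])

# Per phrase: (length before - length after removing the keyword) / keyword
# length, with integer division. Summed per keyword, then added once.
conn.execute("""
    WITH x AS (
        SELECT k.keyword AS keyword,
               SUM((length(p.phrase) - length(replace(p.phrase, k.keyword, '')))
                   / length(k.keyword)) AS cnt
        FROM keywords AS k
        INNER JOIN phrases AS p ON instr(p.phrase, k.keyword) > 0
        GROUP BY k.keyword)
    UPDATE keywords SET occurrence = occurrence + COALESCE(
        (SELECT x.cnt FROM x WHERE x.keyword = keywords.keyword), 0)
""")
rows = dict(conn.execute("SELECT keyword, occurrence FROM keywords"))
print(rows)  # NO is 3: plain substring matching also finds the "NO" in "NOT"
```

The NO result illustrates the "not 100% accurate" caveat: without word boundaries, "NOT" counts as an occurrence of "NO".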

PS: For the "simple stupid" string counting I borrowed slightly from this answer (including the comment).

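I don't think he needs to count how many times a word occurs within each phrase (i.e. "YES YES YES" would be counted once, not 3 times), so a simple LIKE expression would do the job. – ErikE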
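A minimal sketch of the LIKE approach, assuming sqlite3 as a stand-in for SQL Server. One extra row, "YES YES YES", is added purely for illustration (it is not in the question) to show that each phrase contributes at most 1. SQLite's LIKE is case-insensitive for ASCII by default; in SQL Server that depends on the collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phrases (id INTEGER, phrase TEXT)")
conn.execute("CREATE TABLE keywords (keyword TEXT, occurrence INTEGER)")
conn.executemany("INSERT INTO phrases VALUES (?, ?)",
                 [(0, "HELLO BYE YES"), (1, "NO WHY NOT"), (2, "NO YES"),
                  (3, "YES YES YES")])  # extra illustrative row
conn.executemany("INSERT INTO keywords VALUES (?, NULL)",
                 [("YES",), ("NO",), ("HELLO",), ("CHEESE",)])

# Count phrases that contain the keyword anywhere; repeats within one
# phrase do not add more.
conn.execute("""
    UPDATE keywords SET occurrence = (
        SELECT COUNT(*) FROM phrases
        WHERE phrases.phrase LIKE '%' || keywords.keyword || '%')
""")
rows = dict(conn.execute("SELECT keyword, occurrence FROM keywords"))
print(rows)  # YES is 3: phrases 0, 2 and the extra row each count once
```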
