2013-03-12

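Get the word count of a column using SQL

I have a table with a column named Description. The column is populated with text data. I want to write a query that returns the number of words in each description.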

My idea is to create a function that takes a value and returns the number of words found in the input text.

SELECT dbo.GetWordCount(Description) FROM TABLE 

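For example, if the description is "Hello World! Have a nice day.", the query should return 6.

How can I get the word count of the Description column?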

Answers

See this proposed solution: http://www.sql-server-helper.com/functions/count-words.aspx

CREATE FUNCTION [dbo].[WordCount] (@InputString VARCHAR(4000)) 
RETURNS INT 
AS 
BEGIN 

DECLARE @Index   INT 
DECLARE @Char   CHAR(1) 
DECLARE @PrevChar  CHAR(1) 
DECLARE @WordCount  INT 

SET @Index = 1 
SET @WordCount = 0 

WHILE @Index <= LEN(@InputString) 
BEGIN 
    SET @Char  = SUBSTRING(@InputString, @Index, 1) 
    SET @PrevChar = CASE WHEN @Index = 1 THEN ' ' 
         ELSE SUBSTRING(@InputString, @Index - 1, 1) 
        END 

    IF @PrevChar = ' ' AND @Char != ' ' 
     SET @WordCount = @WordCount + 1 

    SET @Index = @Index + 1 
END 

RETURN @WordCount 

END 
GO 

Usage example:

DECLARE @String VARCHAR(4000) 
SET @String = 'Health Insurance is an insurance against expenses incurred through illness of the insured.' 

SELECT [dbo].[WordCount] (@String) 

This is a little cumbersome, but it handles leading, trailing, and repeated spaces correctly, and it is fast and inline, with no UDF.

DECLARE @Term VARCHAR(100) = ' this is pretty fast ' 

SELECT @Term,
       LEN(REPLACE(REPLACE(REPLACE(' '+@Term,' ',' '+CHAR(1)),CHAR(1)+' ',''),CHAR(1),''))
     - LEN(REPLACE(REPLACE(REPLACE(REPLACE(' '+@Term,' ',' '+CHAR(1)),CHAR(1)+' ',''),CHAR(1),''),' ','')) [Word Count]
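
The nested REPLACE calls are easier to follow when split into steps. The sketch below is only an illustration of the same idea: the variable names @Text, @Marked, and @Collapsed are made up for this example, and it assumes CHAR(1) never occurs in the text and that the collation treats it as an ordinary character.

```sql
-- The sample text contains 4 words, with leading, trailing, and doubled spaces.
DECLARE @Text VARCHAR(100) = ' this  is pretty fast ';

-- Step 1: prepend a space, then tag every space with a marker character.
DECLARE @Marked VARCHAR(300) = REPLACE(' ' + @Text, ' ', ' ' + CHAR(1));

-- Step 2: drop each "marker + space" pair, which collapses runs of spaces
-- into a single space, then drop the leftover markers.
DECLARE @Collapsed VARCHAR(300) =
    REPLACE(REPLACE(@Marked, CHAR(1) + ' ', ''), CHAR(1), '');

-- Step 3: every word is now preceded by exactly one space, so counting
-- spaces counts words. LEN ignores the single trailing space, if any.
SELECT LEN(@Collapsed) - LEN(REPLACE(@Collapsed, ' ', '')) AS [Word Count];
```

Prepending the space is what makes the first word count, and LEN ignoring trailing spaces is what keeps a trailing space from adding a phantom word.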

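In addition to Mortalus's answer, I would use an inline table-valued function rather than a scalar one (*note: this function works on SQL Server 2012 and later, because it uses LAG). For earlier versions of SQL Server, see below: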

/*SQL Server 2012 and up*/ 
    CREATE FUNCTION dbo.udf_WordCount 
    (

    @str VARCHAR(8000) 

    ) 
    RETURNS TABLE AS RETURN 

    WITH Tally (n) AS 
    (
     SELECT TOP (LEN(@str)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) 
     FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0)) a(n) 
     CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n) 
     CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n) 
     CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n) 
    ) 
    , BreakChar as 
    (
     SELECT SUBSTRING(@str , n , 1) [Char] , N 
     FROM Tally 

    ) 
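    , Analize as 
    (
     -- The first character has no predecessor, so LAG defaults it to a space.
     -- That way a leading word is counted and an empty string yields 0.
     SELECT * , lag([Char],1,' ') OVER (ORDER BY N) PrevChar 
     FROM BreakChar 
    ) 

     -- Count every position where a word starts (space followed by non-space).
     SELECT WordCount = COUNT(1) 
     FROM Analize 
     WHERE [Char] != PrevChar 
     AND PrevChar = ' ' 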

How to use:

DECLARE @str varchar(1000) = 'It''s now or never I ain''t gonna live forever' 
    SELECT * FROM dbo.udf_WordCount(@str) --> 9 
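
To run the inline function against the Description column from the question, CROSS APPLY is the usual pattern. This is a minimal sketch: dbo.Items is a hypothetical table name, and the ISNULL wrapper is a precaution because the function has not been checked against NULL input here.

```sql
-- dbo.Items is a hypothetical table name; substitute your own table.
SELECT t.Description,
       wc.WordCount
FROM dbo.Items AS t
-- ISNULL guards against NULL descriptions (an assumption, see above).
CROSS APPLY dbo.udf_WordCount(ISNULL(t.Description, '')) AS wc;
```

Because the function is an inline table-valued function, the optimizer can expand it into the calling query instead of invoking it row by row as it would a scalar UDF.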

** SQL Server 2008 and 2008 R2 (no LAG; the VALUES row constructor used below still requires at least SQL Server 2008):

/*SQL Server 2008 (no LAG)*/ 
    CREATE FUNCTION dbo.udf_WordCount_2008 
    (
    --declare 
    @str VARCHAR(8000) 
    --= 'It''s now or never I ain''t gonna live forever' 
    ) 
    RETURNS TABLE AS RETURN 

    WITH Tally (n) AS 
    (
     SELECT TOP (LEN(@str)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) 
     FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0)) a(n) 
     CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n) 
     CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n) 
     CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n) 
    ) 
    , BreakChar as 
    (
     SELECT SUBSTRING(@str , n , 1) [Char] , N 
     FROM Tally 

    ) 
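    , Analize as 
    (
     -- LEFT JOIN keeps the first character; its missing predecessor is
     -- treated as a space so the first word is counted.
     SELECT a.* , ISNULL(b.[Char], ' ') PrevChar 
     FROM BreakChar a 
     LEFT JOIN BreakChar b 
     on a.n = b.n+1 
    ) 

     SELECT WordCount = COUNT(1) 
     FROM Analize 
     WHERE [Char] != PrevChar 
     AND PrevChar = ' ' 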

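General syntax (LENGTH is the MySQL/Oracle spelling; SQL Server uses LEN). The difference in length counts the spaces, so 1 is added to get the number of words in single-spaced text:

SELECT (LENGTH(column_name) - LENGTH(REPLACE(column_name, ' ', '')) + 1) AS word_count,column_name1,column_name2 FROM table_name; 

For example, to count the words in the 'address' column of a table named 'employeeDetails':

SELECT (LENGTH(address) - LENGTH(REPLACE(address, ' ', '')) + 1) AS word_count,address,employee_name FROM employeeDetails ; 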
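
On SQL Server the same idea might look like the sketch below. It is not part of the original answer: it assumes words are separated by single spaces (repeated spaces would inflate the count), trims the value first, and returns 0 for empty text. The alias names are made up for the example.

```sql
SELECT e.address,
       CASE WHEN t.trimmed = '' THEN 0   -- empty or all-space text has no words
            ELSE LEN(t.trimmed) - LEN(REPLACE(t.trimmed, ' ', '')) + 1
       END AS word_count
FROM employeeDetails AS e
-- Compute the trimmed value once so it is not repeated in every expression.
CROSS APPLY (SELECT LTRIM(RTRIM(e.address)) AS trimmed) AS t;
```

A NULL address yields a NULL word_count, since every expression in the ELSE branch propagates NULL.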

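This answer is based on Mortalus's answer and uses the same code, which I originally found here.

This solution is a more efficient and more concise version of that code. I have also added some explanation, in the hope of giving future readers a clearer answer.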


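The following user-defined function takes in a string of text and loops through each character of the input. If the previous character is a space and the current character is not, the word count is incremented by 1.

Because the count is derived from the spaces between words, it would otherwise always be 1 less than the actual number of words. To fix this, @PrevChar is initialized to ' '. The first time the loop runs and reaches IF @PrevChar = ' ', the check is true and the word count increases by 1. This works even when the text has length 0, because in that case the @Index <= LEN(@InputString) check fails and the count is never incremented. (This replaces the CASE statement used in the linked answer.)

AND @CurrentChar != ' ' handles the problem of double spaces being counted as multiple words. If the previous character is a space but the current character is also a space, the loop moves on to the next index without incrementing the count. On the next iteration only @PrevChar is ' ', so a double space increases the word count just once.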

CREATE FUNCTION [dbo].[WordCount] (@InputString VARCHAR(MAX)) 
RETURNS INT 
AS 
BEGIN 
    DECLARE @Index INT = 1 
    DECLARE @CurrentChar CHAR(1) 

    --Initialize the previous character as a space. 
    DECLARE @PrevChar CHAR(1) = ' ' 

    DECLARE @WordCount INT = 0 

    WHILE @Index <= LEN(@InputString) 
    BEGIN 
     --Set the current character to equal the character in the index 
     --position of the inputted text. 
     SET @CurrentChar= SUBSTRING(@InputString, @Index, 1) 

     --If the previous character was a space and the current character 
     --is not a space, increase the wordcount by 1. 
     IF @PrevChar = ' ' AND @CurrentChar != ' ' 
      SET @WordCount = @WordCount + 1 

     --Increase the index counter by 1. 
     SET @Index = @Index + 1 

     --Now that we are done with the current character, set the previous 
     --character to equal the current character. 
     SET @PrevChar = @CurrentChar 
    END 

    RETURN @WordCount 
END
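
A few quick checks of the edge cases described above, assuming the function has been created as dbo.WordCount. The column aliases are only labels for this example.

```sql
SELECT dbo.WordCount('Hello World! Have a nice day.') AS Typical,      -- 6
       dbo.WordCount('')                              AS EmptyString,  -- 0
       dbo.WordCount('  two   words  ')               AS ExtraSpaces;  -- 2
```

Trailing spaces are never visited because LEN ignores them, and leading or repeated spaces only count once a non-space character follows.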