Optimizing a SQL function to get common elements

I have a function that takes two delimited strings and returns the number of elements they have in common.

The main code of the function (@commonCount is the expected return value):

SET @commonCount = (select count(*) from (
    select token from dbo.splitString(@userKeywords, ';') 
    intersect 
    select token from dbo.splitString(@itemKeywords, ';')) as total)

where splitString uses a while loop and CHARINDEX to split the string into delimited tokens and inserts them into a table.
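For reference, here is what the function computes, as a small Python sketch. It assumes `splitString` yields one row per non-empty token and that tokens compare exactly (SQL Server compares according to the collation, often case-insensitively, which this ignores); `split_string` and `common_count` are hypothetical names. Because `INTERSECT` returns distinct rows, a keyword repeated within one string is counted only once.

```python
def split_string(s: str, delim: str = ";") -> list:
    """Hypothetical stand-in for dbo.splitString: split on delim, dropping empty tokens."""
    return [tok for tok in s.split(delim) if tok]


def common_count(user_keywords: str, item_keywords: str) -> int:
    """Distinct tokens present in both strings, like COUNT(*) over an INTERSECT."""
    return len(set(split_string(user_keywords)) & set(split_string(item_keywords)))


print(common_count("sql;python;java", "java;go;sql"))  # 2
```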

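The problem I'm having is that this only processes about 100 rows per second, and at the size of my dataset it would take about 8-10 days to finish.

Both strings can be up to 1500 characters long.

Is there any way I can make this fast enough to be usable?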


2012-05-13 randomThought

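Is this something you need to run all the time, or is it a one-off effort? – dasblinkenlight

I'm running some simulated data mining, so I need to do this whenever my model changes or when I want to experiment with new formulas. Probably not very often – randomThought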

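The performance problem is probably the combination of the cursor-style processing (the while loop) and the user-defined function.

If one of the strings is constant (such as the item keywords), you can search for each keyword independently: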

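-- Pad both sides with ';' so only whole keywords match,
-- including the first and last keyword in u.keywords
select *
from users u
where charindex(';' + <item1> + ';', ';' + u.keywords + ';') > 0
union all
select *
from users u
where charindex(';' + <item2> + ';', ';' + u.keywords + ';') > 0
union all
...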
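Padding with the delimiter is what turns the substring search into an exact-token match: `;java;` cannot match inside `;javascript;`, and padding the column on both ends lets its first and last keywords match as well. The same check in Python, as a sketch (`has_keyword` is a hypothetical helper):

```python
def has_keyword(keywords: str, keyword: str, delim: str = ";") -> bool:
    """Mirror of charindex(';' + keyword + ';', ';' + keywords + ';') > 0."""
    return (delim + keyword + delim) in (delim + keywords + delim)


print(has_keyword("javascript;sql", "java"))  # False: no partial-token match
print(has_keyword("javascript;sql", "sql"))   # True: last token matches
print(";sql;" in ";javascript;sql")           # False: without the trailing pad it is missed
```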

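Alternatively, a set-based approach can work, but you have to normalize the data (ideally by loading it in the correct format to begin with). That is, you want one table with: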

userid 
keyword

and another with:

itemid 
keyword

(if there are different kinds of items; otherwise this is just a list of keywords.)

Then your query would look like this:

select * 
from userkeyword uk join 
    itemkeyword ik 
    on uk.keyword = ik.keyword

and the SQL engine will work its magic.
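To make the set-based version concrete, here is a small runnable sketch that uses Python's built-in `sqlite3` module as a stand-in for SQL Server (an assumption; the SQL itself is plain enough to carry over). Grouping the join by user and item yields the per-pair count of common keywords that the original function computed. The sample rows are made up, and the primary keys keep each keyword distinct per user and item, matching `INTERSECT`'s distinct semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userkeyword (userid INTEGER, keyword TEXT, PRIMARY KEY (userid, keyword));
    CREATE TABLE itemkeyword (itemid INTEGER, keyword TEXT, PRIMARY KEY (itemid, keyword));
""")
conn.executemany("INSERT INTO userkeyword VALUES (?, ?)",
                 [(1, "sql"), (1, "python"), (2, "java")])
conn.executemany("INSERT INTO itemkeyword VALUES (?, ?)",
                 [(10, "sql"), (10, "python"), (10, "go"), (20, "java"), (20, "sql")])

# One row per (user, item) pair sharing at least one keyword,
# with the number of shared keywords.
rows = conn.execute("""
    SELECT uk.userid, ik.itemid, COUNT(*) AS common_count
    FROM userkeyword uk
    JOIN itemkeyword ik ON uk.keyword = ik.keyword
    GROUP BY uk.userid, ik.itemid
    ORDER BY uk.userid, ik.itemid
""").fetchall()

print(rows)  # [(1, 10, 2), (1, 20, 1), (2, 20, 1)]
```

Pairs with no keywords in common simply do not appear; treat a missing pair as a count of zero.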

Now, how do you create such a list? If each user has only a handful of keywords, you can do something like this:

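with keyword1 as (
    -- append ';' so the last keyword is also terminated
    -- (assumes keywords has no trailing ';')
    select u.userid, u.keywords + ';' as keywords,
           charindex(';', u.keywords + ';') as pos1,
           left(u.keywords, charindex(';', u.keywords + ';') - 1) as keyword1
    from users u
    where u.keywords <> ''
),
keyword2 as (
    -- the next token starts right after pos1 and ends at the next ';'
    select k.userid, k.keywords,
           charindex(';', k.keywords, k.pos1 + 1) as pos2,
           substring(k.keywords, k.pos1 + 1,
                     charindex(';', k.keywords, k.pos1 + 1) - k.pos1 - 1) as keyword2
    from keyword1 k
    where charindex(';', k.keywords, k.pos1 + 1) > 0
),
...
select userid, keyword1
from keyword1
union all
select userid, keyword2
from keyword2
...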
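With many keywords per row, writing one CTE per position gets long. A recursive CTE that peels off one keyword per iteration is a different technique that avoids the hand-written chain. The sketch below runs it against SQLite through Python's `sqlite3` module; SQL Server also supports recursive CTEs (without the `RECURSIVE` keyword, with `CHARINDEX`/`SUBSTRING` in place of `instr`/`substr`, and subject to its default `MAXRECURSION` limit of 100), but that port is an assumption to verify. The `users(userid, keywords)` layout and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER, keywords TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "sql;python;java"), (2, "go")])

SPLIT_SQL = """
WITH RECURSIVE split(userid, keyword, rest) AS (
    -- seed: nothing extracted yet; append ';' so the last keyword is terminated
    SELECT userid, '', keywords || ';' FROM users
    UNION ALL
    -- step: cut the text before the first ';' off the front of rest
    SELECT userid,
           substr(rest, 1, instr(rest, ';') - 1),
           substr(rest, instr(rest, ';') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT userid, keyword FROM split
WHERE keyword <> ''  -- drop the seed rows and any empty tokens
ORDER BY userid, keyword
"""

pairs = conn.execute(SPLIT_SQL).fetchall()
print(pairs)  # [(1, 'java'), (1, 'python'), (1, 'sql'), (2, 'go')]
```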

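To find the largest number of keywords in any row (and so how many of these CTEs you need), count the ';' delimiters; for a string without a trailing ';', the number of keywords is one more than that:

select max(len(keywords) - len(replace(keywords, ';', ''))) + 1
from users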
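The length difference counts the removed ';' characters, so a row without a trailing ';' holds one more keyword than that. The same arithmetic in Python, handy for sanity-checking sample strings outside the database (a sketch; `keyword_count` is a hypothetical helper and assumes no trailing ';'):

```python
def keyword_count(keywords: str, delim: str = ";") -> int:
    """Delimiters + 1, mirroring len(k) - len(replace(k, ';', '')) + 1."""
    return len(keywords) - len(keywords.replace(delim, "")) + 1


rows = ["sql;python;java", "go", "a;b"]
print(max(keyword_count(k) for k in rows))  # 3
```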


2012-05-13 15:33:16

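I was considering a table-based approach. The data I have is stored in flat files in the same format in which I load it into the tables. It isn't normalized, and it is about 2-3 GB in size – randomThought

I would use PowerShell to split the data and then load it in normalized form. If you already have the data in tables, try the method in the post. It may perform better than you expect, especially if you run it on a multiprocessor machine. Your original approach is probably serializing the query, so it isn't taking full advantage of your hardware. –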
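The pre-splitting step can be sketched as follows. This uses Python rather than the PowerShell the answer suggests, and it assumes each flat-file line looks like `id<TAB>kw1;kw2;...`; `normalize` is a hypothetical helper. It writes one `(id, keyword)` CSV row per distinct keyword, ready for a bulk load.

```python
import csv
import io


def normalize(lines, out, delim=";"):
    """Turn 'id<TAB>kw1;kw2;...' lines into (id, keyword) CSV rows."""
    writer = csv.writer(out)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        row_id, _, keywords = line.partition("\t")
        # set() drops duplicate keywords within a row, like INTERSECT's distinct semantics
        for kw in sorted({k for k in keywords.split(delim) if k}):
            writer.writerow([row_id, kw])


src = io.StringIO("1\tsql;python;sql\n2\tgo\n")
dst = io.StringIO()
normalize(src, dst)
print(dst.getvalue())
```

For real files, stream them with `open(path, newline="")` on both ends so a 2-3 GB input is never held in memory, then load the output with SQL Server's `BULK INSERT` or `bcp`.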
