Slow TSQL query

Any ideas on how to improve the performance of this query?

The PK of [ftsIndex] is (sID, wordPos).
There is also an index on (wordID, sID, wordPos).
All of these columns are int.

A DISTINCT is applied at the end.
Most sIDs have only a few matches.
Some sIDs can have more than 10,000 matches, and those kill the query.

For the first 27,749 rows, the query returns in 11 seconds.
None of those sIDs has more than 500 matches.
The sum of the individual matches is 65,615.

Row 27,750 by itself takes more than 2 minutes and has 15,000 matches.

Since the last join is on [sID], that is not surprising.
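A rough count shows why one heavy sID can dominate. This Python sketch assumes (it is not taken from the actual plan) that the join matches on [sID] first and applies the wordPos range as a residual filter, so every left/right combination inside one document becomes a candidate row. The parts-list layout (1,000 lines of 20 words, with the two search words adjacent on each line) is hypothetical.

```python
def candidate_pairs(left_positions, right_positions):
    """Rows produced by joining on sID alone: every left/right combination."""
    return len(left_positions) * len(right_positions)


def window_matches(left_positions, right_positions, window=10):
    """Pairs that survive the filter left < right <= left + window."""
    right = set(right_positions)
    return sum(
        1
        for p in left_positions
        for q in range(p + 1, p + window + 1)
        if q in right
    )


# Hypothetical parts list: 1,000 lines of 20 words, "Brown" then "Fox" on each line.
brown = [20 * line for line in range(1000)]
fox = [20 * line + 1 for line in range(1000)]

print(candidate_pairs(brown, fox))  # 1000000 candidate rows
print(window_matches(brown, fox))   # 1000 real matches
```

Under that assumption, the work for one document grows with the product of the two occurrence counts, while the useful output grows only with the number of real matches.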

Since a DISTINCT is used at the end, is there a way to have it stop at the first positive for

on [wXright].[sID] = [wXleft].[sID] 
    and [wXright].[wordPos] > [wXleft].[wordPos] 
    and [wXright].[wordPos] <= [wXleft].[wordPos] + 10

and then move on to the next sID?
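The "stop at the first hit, then move on to the next sID" idea can be sketched outside SQL to show what is being asked for. This Python version is an illustration, not something SQL Server does on its own: it assumes each word's positions are already grouped by sID and sorted (roughly what an index on (wordID, sID, wordPos) returns), and the function name and sample dictionaries are hypothetical.

```python
from bisect import bisect_right


def first_match_sids(left, right, window=10):
    """Return the sIDs that have a right-word position in (p, p + window]
    for some left-word position p, stopping at the first hit per sID.

    left, right: dict mapping sID -> sorted list of wordPos.
    """
    hits = []
    for sid, left_positions in left.items():
        right_positions = right.get(sid)
        if not right_positions:
            continue  # the right word never appears in this document
        for p in left_positions:
            # index of the first right position strictly greater than p
            i = bisect_right(right_positions, p)
            if i < len(right_positions) and right_positions[i] <= p + window:
                hits.append(sid)
                break  # one match is enough: move on to the next sID
    return hits


brown = {1: [3, 50], 2: [7], 3: [100, 200, 300]}
fox = {1: [60], 2: [30], 3: [205]}
print(first_match_sids(brown, fox))  # [1, 3]
```

The `break` is the whole point: a document with 15,000 matches costs no more than one with a single match once the first hit is found.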

I know this is asking a lot of the query optimizer, but it would be really cool.

In real life, the problem documents are parts lists in which the supplier is repeated many times.

select distinct [wXleft].[sID] 
FROM 
(-- begin [wXleft] 
    (-- start term 
     select [ftsIndex].[sID], [ftsIndex].[wordPos] 
     from [ftsIndex] with (nolock) 
     where [ftsIndex].[wordID] in 
       (select [id] from [FTSwordDef] with (nolock) 
          where [word] like 'Brown') 
    ) -- end term 
) [wXleft] 
join 
(-- begin [wRight] 
    (-- start term 
     select [ftsIndex].[sID], [ftsIndex].[wordPos] 
     from [ftsIndex] with (nolock) 
     where [ftsIndex].[wordID] in 
       (select [id] from [FTSwordDef] with (nolock) 
          where [word] like 'Fox') 
    ) -- end term 
) [wXright] 
on [wXright].[sID] = [wXleft].[sID] 
and [wXright].[wordPos] > [wXleft].[wordPos] 
and [wXright].[wordPos] <= [wXleft].[wordPos] + 10

This brings it down to 1:40:

inner loop join

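I only tried this as an experiment, and it completely changed the query plan.
I don't know how long the problem query takes; I gave up at 20 minutes.
I'm not even going to post this as an answer, because I don't see it being of value to anyone.
Hoping for a better answer.
If I don't get one in the next two days, I will delete the question.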

This does not fix the problem:

select distinct [ft1].[sID] 
    from [ftsIndex] as [ft1] with (nolock) 
    join [ftsIndex] as [ft2] with (nolock) 
    on [ft2].[sID] = [ft1].[sID] 
    and [ft1].[wordID] in (select [id] from [FTSwordDef] with (nolock) where [word] like 'brown') 
    and [ft2].[wordID] in (select [id] from [FTSwordDef] with (nolock) where [word] like 'fox') 
    and [ft2].[wordPos] > [ft1].[wordPos] 
    and [ft2].[wordPos] <= [ft1].[wordPos] + 10

It also has to support queries like "quick brown" near "fox" or "coyote", so joining on aliases within 10 words is not a good path.
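To make that requirement concrete: a phrase is a run of words at consecutive wordPos values, and an OR list is a set of acceptable words on the right-hand side. The Python sketch below works on one document held as a map from wordPos to word; the function names and sample text are hypothetical, and measuring the 10-word window from the last word of the phrase is an assumption.

```python
def phrase_starts(doc, phrase):
    """Positions where the words of `phrase` appear at consecutive wordPos."""
    return [
        p
        for p, word in doc.items()
        if word == phrase[0]
        and all(doc.get(p + k) == w for k, w in enumerate(phrase[1:], start=1))
    ]


def phrase_near_any(doc, phrase, alternatives, window=10):
    """True if any of `alternatives` follows the phrase within `window` words.

    The window is measured from the last word of the phrase (an assumption).
    """
    alternatives = set(alternatives)
    for start in phrase_starts(doc, phrase):
        end = start + len(phrase) - 1
        if any(doc.get(q) in alternatives for q in range(end + 1, end + window + 1)):
            return True
    return False


text = "the quick brown dog saw a coyote near the quick red fox"
doc = {pos: word for pos, word in enumerate(text.split(), start=1)}
print(phrase_near_any(doc, ["quick", "brown"], ["fox", "coyote"]))  # True
print(phrase_near_any(doc, ["quick", "red"], ["coyote"]))           # False
```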

This takes 14 minutes (but at least it runs).
Again, this format does not lend itself to more advanced queries.

IF OBJECT_ID(N'tempdb..#tempMatch1', N'U') IS NOT NULL DROP TABLE #tempMatch1 
CREATE TABLE #tempMatch1(
    [sID] [int] NOT NULL, 
    [wordPos] [int] NOT NULL, 
CONSTRAINT [PK1] PRIMARY KEY CLUSTERED 
(
    [sID] ASC, 
    [wordPos] ASC 
)) 
IF OBJECT_ID(N'tempdb..#tempMatch2', N'U') IS NOT NULL DROP TABLE #tempMatch2 
CREATE TABLE #tempMatch2(
    [sID] [int] NOT NULL, 
    [wordPos] [int] NOT NULL, 
CONSTRAINT [PK2] PRIMARY KEY CLUSTERED 
(
    [sID] ASC, 
    [wordPos] ASC 
)) 
insert into #tempMatch1 
select [ftsIndex].[sID], [ftsIndex].[wordPos] 
     from [ftsIndex] with (nolock) 
     where [ftsIndex].[wordID] in 
       (select [id] from [FTSwordDef] with (nolock) 
          where [word] like 'Brown') 
     --and [wordPos] < 100000; 
    order by [ftsIndex].[sID], [ftsIndex].[wordPos]      
insert into #tempMatch2 
select [ftsIndex].[sID], [ftsIndex].[wordPos] 
     from [ftsIndex] with (nolock) 
     where [ftsIndex].[wordID] in 
       (select [id] from [FTSwordDef] with (nolock) 
          where [word] like 'Fox') 
     --and [wordPos] < 100000; 
    order by [ftsIndex].[sID], [ftsIndex].[wordPos] 
select count(distinct(#tempMatch1.[sID])) 
from #tempMatch1 
join #tempMatch2 
    on #tempMatch2.[sID] = #tempMatch1.[sID] 
and #tempMatch2.[wordPos] > #tempMatch1.[wordPos] 
and #tempMatch2.[wordPos] <= #tempMatch1.[wordPos] + 10

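A slightly different join runs in 5 seconds (and has a different query plan).
But I can't pin that plan down with a hint, because the hint moves where it does the join.
Even with +1, more than 10 documents have over 7,000 matches.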

on [wXright].[sID] = [wXleft].[sID] 
and [wXright].[wordPos] = [wXleft].[wordPos] + 1
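A plausible reason the +1 version is fast is that it is a pure equality on (sID, wordPos), so each left row needs a single lookup rather than a range scan. The Python sketch below illustrates that with a set lookup, and also shows that a 10-word window can be written as ten equality probes (offsets 1 through 10). This is an alternative idea, not something tested against SQL Server here; the function name and data are hypothetical.

```python
def adjacent_sids(left_rows, right_rows, offsets=(1,)):
    """sIDs where the right word sits at wordPos + k for some k in offsets.

    left_rows, right_rows: iterables of (sID, wordPos) tuples.
    """
    right = set(right_rows)  # hash lookup on (sID, wordPos)
    return {
        sid
        for sid, pos in left_rows
        if any((sid, pos + k) in right for k in offsets)
    }


brown = [(1, 4), (2, 10), (3, 20)]
fox = [(1, 5), (2, 15), (3, 40)]
print(sorted(adjacent_sids(brown, fox)))                         # [1]  (+1 only)
print(sorted(adjacent_sids(brown, fox, offsets=range(1, 11))))   # [1, 2]  (window of 10)
```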

Full table definition:

CREATE TABLE [dbo].[FTSindex](
    [sID] [int] NOT NULL, 
    [wordPos] [int] NOT NULL, 
    [wordID] [int] NOT NULL, 
    [charPos] [int] NOT NULL, 
CONSTRAINT [PK_FTSindex] PRIMARY KEY CLUSTERED 
(
    [sID] ASC, 
    [wordPos] ASC 
)WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100) ON [PRIMARY] 
) ON [PRIMARY] 

GO 

ALTER TABLE [dbo].[FTSindex] WITH CHECK ADD CONSTRAINT [FK_FTSindex_FTSwordDef] FOREIGN KEY([wordID]) 
REFERENCES [dbo].[FTSwordDef] ([ID]) 
GO 

ALTER TABLE [dbo].[FTSindex] CHECK CONSTRAINT [FK_FTSindex_FTSwordDef] 
GO


2013-05-29 Paparazzi

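I don't know all of your data, but have you considered inserting into the temp table first and then creating the clustered index? Insert first, then create the index. That is often faster than inserting into a table that already has the index. It may or may not help you, so I wanted to add it as a comment. – djangojazz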

@djangojazz The insert only takes 5 seconds. If I add an ORDER BY so the records are inserted in PK order, it is still 5 seconds. – Paparazzi

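We will need the table/key/index definitions and the (actual) query plans. Also, is there any reason for this design/approach, as opposed to just using SQL Server Full-Text Search? – RBarryYoung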

UPDATE:

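You can still use UNION ALL, which helps the optimizer retain the ordering from the index, if you delay filtering the 'L' and 'R' sides until the last part of the process. Unfortunately, you need to retrieve all the wordIDs beforehand and use them in equality conditions. On my machine, it cut the execution time to 2/3: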

; with o as (
    select sID, wordPos, wordID 
     from FTSindex 
    where wordID = 1 
    union all 
    select sID, wordPos, wordID 
     from FTSindex 
    where wordID = 4 
    union all 
    select sID, wordPos, wordID 
     from FTSindex 
    where wordID = 2 
), 
g as (
    select sID, wordPos, wordID, 
      ROW_NUMBER() over (partition by [sID] order by wordPos) rn 
     from o 
) 
select count(distinct(g1.sID)) -- 26919 00:02 
     from g g1 
     join g g2 
     on g1.sID = g2.sID 
     and g1.rn = g2.rn - 1 
     and g1.wordPos >= g2.wordPos - 10 
    -- Now is the time to repartition the stream 
     and g1.wordID in (1, 4) 
     and g2.wordID = 2

Oh, does it really take only two seconds now?
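To check that the UNION ALL / ROW_NUMBER rewrite counts the same documents as the plain range join, both can be run side by side on a tiny in-memory table. This uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption: SQLite 3.25 or later for window functions; no NOLOCK hints or SQL Server plan behavior is reproduced). The rows are made up, and treating wordID 4 as an alias of "Brown" is a hypothetical reading of the `in (1, 4)` filter above.

```python
import sqlite3

# Hypothetical FTSindex rows: (sID, wordPos, wordID).
# 1 = "Brown", 4 = assumed alias of "Brown", 2 = "Fox", 9 = some other word.
rows = [
    (1, 1, 1), (1, 5, 2),              # Brown ... Fox, 4 apart -> match
    (2, 1, 4), (2, 3, 1), (2, 8, 2),   # alias, Brown, Fox -> match
    (3, 1, 1), (3, 30, 2),             # too far apart -> no match
    (4, 1, 2), (4, 2, 1),              # Fox before Brown -> no match
    (5, 2, 9), (5, 3, 1), (5, 13, 2),  # exactly 10 apart -> match (boundary)
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE FTSindex (sID INT, wordPos INT, wordID INT, PRIMARY KEY (sID, wordPos))")
con.executemany("INSERT INTO FTSindex VALUES (?, ?, ?)", rows)

# The UNION ALL + ROW_NUMBER version: compare only neighbours in each sID's stream.
adjacent_sql = """
WITH o AS (
    SELECT sID, wordPos, wordID FROM FTSindex WHERE wordID = 1
    UNION ALL
    SELECT sID, wordPos, wordID FROM FTSindex WHERE wordID = 4
    UNION ALL
    SELECT sID, wordPos, wordID FROM FTSindex WHERE wordID = 2
),
g AS (
    SELECT sID, wordPos, wordID,
           ROW_NUMBER() OVER (PARTITION BY sID ORDER BY wordPos) AS rn
    FROM o
)
SELECT COUNT(DISTINCT g1.sID)
FROM g AS g1
JOIN g AS g2
  ON g1.sID = g2.sID
 AND g1.rn = g2.rn - 1
 AND g1.wordPos >= g2.wordPos - 10
 AND g1.wordID IN (1, 4)
 AND g2.wordID = 2
"""

# The original range join, for comparison.
range_sql = """
SELECT COUNT(DISTINCT l.sID)
FROM FTSindex AS l
JOIN FTSindex AS r
  ON r.sID = l.sID
 AND r.wordPos > l.wordPos
 AND r.wordPos <= l.wordPos + 10
WHERE l.wordID IN (1, 4) AND r.wordID = 2
"""

adjacent_count = con.execute(adjacent_sql).fetchone()[0]
range_count = con.execute(range_sql).fetchone()[0]
print(adjacent_count, range_count)  # 3 3
```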

UPDATE - 2:

; with o as (
-- Union all resolves costly sort 
    select sid, wordpos, wordid 
     from FTSindex 
    where wordID = 1 
    union all 
    select sid, wordpos, wordID 
     from FTSindex 
    where wordID = 2 
), 
g as (
    select sid, wordid, wordpos, 
      ROW_NUMBER() over(order by sid, wordpos) rn 
     from o 
) 
select count(distinct g1.sid) 
    from g g1 
inner join g g2 
    on g1.sID = g2.sID 
    and g1.rn = g2.rn - 1 
where g1.wordID = 1 
    and g2.wordID = 2 
    and g1.wordPos >= g2.wordpos - 10

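1 and 2 stand for the selected word IDs. The result differs from the original query when there are multiple matches within 10 words: the original query reports all of them, while this one reports only the closest.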

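The idea is to extract only the searched words and compare the distance between each pair of adjacent words, where wordID 1 comes first and wordID 2 second.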
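The adjacency idea can be checked in plain Python. The sketch below (hypothetical names and data) takes one sID's stream of searched words only, compares neighbours in that position-ordered stream, and contrasts it with the full window join. The window join can report several pairs per document where the adjacency version reports only the nearest one, yet both find the same documents: if a left word at p has a right word at q with p < q <= p + 10, then the last left word before q is immediately followed in the stream by a right word that is at least as close.

```python
def window_pairs(stream, left, right, window=10):
    """All (left_pos, right_pos) pairs with left_pos < right_pos <= left_pos + window."""
    lefts = [pos for pos, word in stream if word in left]
    rights = [pos for pos, word in stream if word in right]
    return [(p, q) for p in lefts for q in rights if p < q <= p + window]


def adjacent_pairs(stream, left, right, window=10):
    """Pairs formed only by neighbours in the position-ordered stream.

    `stream` holds only the searched words, like the UNION ALL CTE above.
    """
    ordered = sorted(stream)  # plays the role of ROW_NUMBER() ... ORDER BY wordPos
    return [
        (p, q)
        for (p, a), (q, b) in zip(ordered, ordered[1:])
        if a in left and b in right and q - p <= window
    ]


# One hypothetical document: (wordPos, wordID); 1 = left word, 2 = right word.
doc = [(1, 1), (3, 1), (6, 2), (9, 2), (40, 1), (55, 2)]
print(window_pairs(doc, {1}, {2}))    # [(1, 6), (1, 9), (3, 6), (3, 9)]
print(adjacent_pairs(doc, {1}, {2}))  # [(3, 6)]
```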

UPDATE - 1:

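I struck this version out because it did not perform as well as I had thought. However, it fits the OP's needs better than the optimized query, because it allows searching for several words at once (a list of words found near another word, which could be specified in the WHERE clause).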

; with g as (
    select sid, wordid, wordpos, 
      ROW_NUMBER() over(order by sid, wordpos) rn 
     from FTSindex 
    where wordID in (1, 2) 
) 
select count(distinct g1.sid) 
    from g g1 
inner join g g2 
    on g1.sID = g2.sID 
    and g1.rn = g2.rn - 1 
where g1.wordID = 1 
    and g2.wordID = 2 
    and g1.wordPos >= g2.wordpos - 10

First attempt:

There might be a way to combine CROSS APPLY with TOP 1.

select [wXleft].[sID], [wXleft].[wordPos] 
    from [ftsIndex] wXleft with (nolock) 
cross apply 
(
    select top 1 r.sID 
     from [ftsIndex] r 
    where r.sID = wXleft.sID 
     and r.wordPos > wXleft.wordPos 
     and r.wordPos <= wXleft.wordPos + 10 
     and r.wordID in 
      (select [id] 
       from [FTSwordDef] with (nolock) 
      where [word] like 'Fox') 
) wXright 
where [wXleft].[wordID] in 
     (select [id] 
      from [FTSwordDef] with (nolock) 
     where [word] like 'Brown')
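SQLite has no CROSS APPLY, but the same shape can be approximated with a correlated subquery ending in ORDER BY ... LIMIT 1 (standing in for TOP 1), filtered so that left rows with no match are dropped, as CROSS APPLY does. This runs on Python's built-in sqlite3 module with hypothetical rows. ORDER BY r.wordPos is an addition so the nearest match is returned; the T-SQL TOP 1 above has no ORDER BY and may return any qualifying row, which is fine when only existence matters. SQLite's LIKE is case-insensitive for ASCII by default, whereas SQL Server's behavior depends on the collation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE FTSwordDef (id INTEGER PRIMARY KEY, word TEXT);
CREATE TABLE FTSindex (sID INT, wordPos INT, wordID INT REFERENCES FTSwordDef(id),
                       PRIMARY KEY (sID, wordPos));
INSERT INTO FTSwordDef VALUES (1, 'Brown'), (2, 'Fox'), (3, 'Dog');
INSERT INTO FTSindex VALUES
    (1, 1, 1), (1, 4, 2), (1, 6, 2),   -- two Fox rows within 10 of one Brown
    (2, 1, 1), (2, 20, 2),             -- Fox too far away
    (3, 5, 1), (3, 7, 3), (3, 9, 2);   -- another word in between does not matter
""")

# Correlated subquery + LIMIT 1 as a stand-in for CROSS APPLY (SELECT TOP 1 ...);
# the outer WHERE drops left rows without a match, like CROSS APPLY does.
rows = con.execute("""
SELECT sID, wordPos, firstFox FROM (
    SELECT l.sID, l.wordPos,
           (SELECT r.wordPos
              FROM FTSindex AS r
             WHERE r.sID = l.sID
               AND r.wordPos > l.wordPos
               AND r.wordPos <= l.wordPos + 10
               AND r.wordID IN (SELECT id FROM FTSwordDef WHERE word LIKE 'Fox')
             ORDER BY r.wordPos
             LIMIT 1) AS firstFox
      FROM FTSindex AS l
     WHERE l.wordID IN (SELECT id FROM FTSwordDef WHERE word LIKE 'Brown')
) WHERE firstFox IS NOT NULL
ORDER BY sID
""").fetchall()
print(rows)  # [(1, 1, 4), (3, 5, 9)]
```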

BONUS PIVOT attempt:


2013-05-29 23:45:42

Returns the same answer as the inner loop join in 2/3 of the time. I will wait a few days for a miracle answer before accepting this. Thanks – Paparazzi

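Why would you take another approach? This one is a bit faster, and I have been trying to tweak it to get a bit more out of it. Oddly, the CTE shows up as the dominant cost. – Paparazzi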

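@Blam Because my timing was wrong: it took as long as my initial attempt. In the meantime I have solved the sort part, but I am worried that SQL Server needs to execute the CTE once per reference, and there are two references. I will post the new version in a minute. –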

Well, I wish I had more information or a way to test, but failing that, this is what I would probably try:

IF OBJECT_ID(N'tempdb..#tempMatch', N'U') IS NOT NULL DROP TABLE #tempMatch 
CREATE TABLE #tempMatch(
    [sID] [int] NOT NULL, 
    [wordPos] [int] NOT NULL, 
    [wordID] [int] NOT NULL, 
CONSTRAINT [PK2] PRIMARY KEY CLUSTERED 
(
    [sID] ASC, 
    [wordPos] ASC 
)) 

-- 
;WITH cteWords As 
(
      SELECT 'Brown' as [word] 
    UNION ALL SELECT 'Fox' 
) 
INSERT INTO #tempMatch ([sID],[wordPos],[wordID]) 
SELECT sID, wordPos, wordID 
FROM ftsIndex 
WHERE EXISTS 
     (Select * From FTSWordDef s1 
     inner join cteWords s2 ON s1.word = s2.word 
     Where ftsIndex.wordID = s1.id) 
; 

select count(distinct(s1.[sID])) 
    from #tempMatch s1 
    join #tempMatch s2 
     on s2.[sID] = s1.[sID] 
     and s2.[wordPos] > s1.[wordPos] 
     and s2.[wordPos] <= s1.[wordPos] + 10 
    where s1.wordID = (select id from FTSWordDef w where w.word = 'Brown') 
     and s2.wordID = (select id from FTSWordDef w where w.word = 'Fox')

I came up with an alternate version last night. It is the same query as above, but with the CREATE statement changed to:

IF OBJECT_ID(N'tempdb..#tempMatch', N'U') IS NOT NULL DROP TABLE #tempMatch 
CREATE TABLE #tempMatch(
    [sID] [int] NOT NULL, 
    [wordID] [int] NOT NULL, 
    [wordPos] [int] NOT NULL, 
CONSTRAINT [PK0] PRIMARY KEY CLUSTERED 
(
    [wordID] ASC, 
    [sID] ASC, 
    [wordPos] ASC 
))

Please let me know if either of these helps.


2013-05-30 21:19:34 RBarryYoung

Had to add wordID to the first constraint, and both versions throw an error on the join to cteWords. – Paparazzi

@Blam What is the error? I can't test-compile it, because we don't have the table definitions. – RBarryYoung

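@Blam Why did you have to add wordID to the first constraint? According to your post, '(sID, wordPos)' should be sufficient, since that is the primary key of the only table I am drawing from in the 'INSERT..SELECT..'. (In fact, now that I look at it, I realize the 'DISTINCT' is redundant and should not be there.) – RBarryYoung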
