2016-01-29

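Find groups of exact matches in a many-to-many relationship table

The table below shows a many-to-many relationship between courses and students.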

CREATE TABLE CourseStudents
(
    CourseId INT NOT NULL,
    StudentId INT NOT NULL,
    PRIMARY KEY (CourseId, StudentId)
);

INSERT INTO CourseStudents VALUES
    (1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (3, 2),
    (4, 3), (4, 2), (5, 1);

Sample data

| CourseId | StudentId | 
|----------|-----------| 
|  1 |   1 | 
|  1 |   2 | 
|  2 |   1 | 
|  2 |   2 | 
|  3 |   2 | 
|  3 |   3 | 
|  4 |   2 | 
|  4 |   3 | 
|  5 |   1 | 

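I am looking for a query that returns all courses that have exactly the same set of students. I was able to come up with the query shown below.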

WITH CourseGroups AS
(
    SELECT c.CourseId,
        STUFF((
            SELECT ',' + CAST(c2.StudentId AS VARCHAR)
            FROM CourseStudents c2
            WHERE c2.CourseId = c.CourseId
            ORDER BY c2.StudentId
            FOR XML PATH ('')), 1, 1, '') AS StudentList
    FROM CourseStudents c
    GROUP BY c.CourseId
)
SELECT cg.StudentList,
    STUFF((
        SELECT ',' + CAST(cg2.CourseId AS VARCHAR(10))
        FROM CourseGroups cg2
        WHERE cg2.StudentList = cg.StudentList
        FOR XML PATH ('')), 1, 1, '') AS ExactMatchCourseList
FROM CourseGroups cg
GROUP BY cg.StudentList
HAVING COUNT(*) > 1;

The query returns

| StudentList | ExactMatchCourseList | 
|-------------|----------------------| 
|   1,2 |     1,2 | 
|   2,3 |     3,4 | 

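The result above is correct, but I only need ExactMatchCourseList. The table I am working with has over a billion rows, so I need an efficient query that can find any matching courses within a few minutes of run time. Any help is appreciated.

SqlFiddle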

Answers


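This makes only 2 passes over your CourseStudents table instead of the 4 your query makes. If you add an index on CourseId on the CourseStudents table, the first pass will be just an index scan. It also runs the original STUFF only once per course, rather than once per student followed by a GROUP BY on course. I left out the final STUFF because I wasn't sure whether you wanted it or whether it was just a byproduct of how you were computing the result.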


-- Pass 1: collect the distinct course ids.
CREATE TABLE #Course
(
    CourseId INT NOT NULL PRIMARY KEY
);

INSERT INTO #Course
SELECT CourseId
FROM CourseStudents
GROUP BY CourseId
ORDER BY CourseId;

-- Pass 2: build the ordered, comma-separated student list once per course.
CREATE TABLE #CourseStudentList
(
    CourseId INT NOT NULL PRIMARY KEY,
    StudentList VARCHAR(MAX) NOT NULL
);

INSERT INTO #CourseStudentList
SELECT
    c.CourseId,
    STUFF((
        SELECT ',' + CAST(c2.StudentId AS VARCHAR)
        FROM CourseStudents c2
        WHERE c2.CourseId = c.CourseId
        ORDER BY c2.StudentId
        FOR XML PATH ('')), 1, 1, '') AS StudentList
FROM #Course c
ORDER BY c.CourseId;

-- Keep only courses whose student list is shared with at least one other course.
SELECT *
FROM
(
    SELECT
        l.CourseId,
        l.StudentList,
        COUNT(*) OVER (PARTITION BY l.StudentList) AS [Count]
    FROM #CourseStudentList l
) l
WHERE l.[Count] > 1
ORDER BY l.StudentList;
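
If only the ExactMatchCourseList column from the question is needed, the final STUFF could be reapplied on top of #CourseStudentList, roughly as follows. This is a minimal sketch that assumes the temp tables above are already populated; the alias l2 is introduced here purely for illustration.

```sql
-- Sketch: one row per group of courses that share an identical student list.
-- Assumes #CourseStudentList was filled by the statements above.
SELECT
    STUFF((
        SELECT ',' + CAST(l2.CourseId AS VARCHAR(10))
        FROM #CourseStudentList l2
        WHERE l2.StudentList = l.StudentList
        ORDER BY l2.CourseId              -- keep course ids in a stable order
        FOR XML PATH ('')), 1, 1, '') AS ExactMatchCourseList
FROM #CourseStudentList l
GROUP BY l.StudentList
HAVING COUNT(*) > 1;                      -- drop lists that belong to a single course
```

With the sample data this should return the two lists 1,2 and 3,4. On SQL Server 2017 or later, STRING_AGG with WITHIN GROUP (ORDER BY ...) may be a simpler replacement for the STUFF / FOR XML PATH idiom.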

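I am marking this as the answer because it lets me retrieve the duplicate courses in an acceptable time. However, I had to modify the last query to output the list of duplicate courses along with the student list. Thanks. – ziddarth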


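This will give you a list of course pairs, but if you have triplicates (or larger groups) you will end up with some extra results. I don't have time to play with this further to correct that, but maybe it will point you in the right direction: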

WITH CTE_CourseMatches AS (
    SELECT
        CS1.CourseId AS CourseId_1,
        CS2.CourseId AS CourseId_2,
        COUNT(*) AS cnt
    FROM
        CourseStudents CS1
    INNER JOIN CourseStudents CS2
        ON CS2.StudentId = CS1.StudentId
        AND CS2.CourseId > CS1.CourseId
    GROUP BY
        CS1.CourseId,
        CS2.CourseId
),
CTE_CourseCounts AS (
    SELECT CourseId, COUNT(*) AS cnt
    FROM CourseStudents
    GROUP BY CourseId
)
SELECT
    CM.CourseId_1,
    CM.CourseId_2
FROM
    CTE_CourseMatches CM
INNER JOIN CTE_CourseCounts CC1 ON CC1.CourseId = CM.CourseId_1 AND CC1.cnt = CM.cnt
INNER JOIN CTE_CourseCounts CC2 ON CC2.CourseId = CM.CourseId_2 AND CC2.cnt = CM.cnt;
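
One possible way to trim the extra rows for groups of three or more is to report each course only against the lowest-numbered course it matches exactly. Because exact matching is transitive, that lowest id acts as a single representative for the whole group. The following is a sketch under that assumption, reusing the two CTEs unchanged; the output column names GroupCourseId and MatchingCourseId are made up for illustration, and the self-join cost on a very large table has not been measured.

```sql
WITH CTE_CourseMatches AS (
    -- Number of students shared by each ordered pair of courses.
    SELECT
        CS1.CourseId AS CourseId_1,
        CS2.CourseId AS CourseId_2,
        COUNT(*) AS cnt
    FROM CourseStudents CS1
    INNER JOIN CourseStudents CS2
        ON CS2.StudentId = CS1.StudentId
        AND CS2.CourseId > CS1.CourseId
    GROUP BY CS1.CourseId, CS2.CourseId
),
CTE_CourseCounts AS (
    -- Total number of students per course.
    SELECT CourseId, COUNT(*) AS cnt
    FROM CourseStudents
    GROUP BY CourseId
)
SELECT
    MIN(CM.CourseId_1) AS GroupCourseId,    -- hypothetical name: lowest id in the group
    CM.CourseId_2      AS MatchingCourseId  -- hypothetical name: another member of the group
FROM CTE_CourseMatches CM
INNER JOIN CTE_CourseCounts CC1 ON CC1.CourseId = CM.CourseId_1 AND CC1.cnt = CM.cnt
INNER JOIN CTE_CourseCounts CC2 ON CC2.CourseId = CM.CourseId_2 AND CC2.cnt = CM.cnt
GROUP BY CM.CourseId_2;
```

For a group of three identical courses A < B < C, the pair list (A,B), (A,C), (B,C) collapses to (A,B) and (A,C). With the sample data the result should be the rows (1, 2) and (3, 4).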

Thanks, I will try it. With more than one match the result set keeps growing, but I can figure out a way to handle that. – ziddarth