2013-07-22 28 views
1

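SQL PageRank implementation

Is there a good SQL implementation of PageRank out there? I have looked at http://www.databasedevelop.com/197517/, but it lacks readability and correct (T-SQL) syntax.

While we are at it, does anyone know which SQL dialect the link above is using? What kind of SQL uses 'IS' in random places, a 'WHERE' with nothing after it, a strange AT keyword, and so on?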

+0

The linked T-SQL is horrible. Cursors are not the right tool here. PageRank maps perfectly onto map-reduce-style SQL queries. – usr

+0

What is map-reduce style, please? – NargothBond

+2

Map-reduce is the modern buzzword for "GROUP BY" queries. Google "pagerank map reduce" to learn more. – usr

Answers

0

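Depending on your version of SQL Server, you could look at the OFFSET_FETCH clause. It has plenty of applications for page ranking. Of course, that requires SQL Server 2012.

I have also used SSIS and a temp table partitioned with NTILE() to get paging when OFFSET_FETCH is not available. I usually use something like the record count divided by the maximum number of rows I want to see on a page as the seed for the NTILE call.

For whatever reason I could not even open your link, so hopefully this is what you were asking for.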
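
A minimal T-SQL sketch of the two paging approaches described above, for comparison. The `@Items` table variable, its columns, and the `@PageSize`, `@PageNumber`, `@TotalRows` and `@PageCount` names are all hypothetical, chosen only for illustration. The first query needs SQL Server 2012 or later; the NTILE query on its own also works on older versions.

```sql
-- Hypothetical sample data; @Items and its columns are illustrative names only.
DECLARE @Items TABLE (id int PRIMARY KEY, name varchar(20));
INSERT INTO @Items VALUES
    (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f'), (7, 'g');

DECLARE @PageSize int = 3;     -- maximum rows per page
DECLARE @PageNumber int = 2;   -- 1-based page to return

-- Approach 1 (SQL Server 2012+): OFFSET ... FETCH on an ordered query.
SELECT id, name
FROM @Items
ORDER BY id
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;

-- Approach 2: NTILE, seeded with row count / page size (rounded up).
DECLARE @TotalRows int;
DECLARE @PageCount int;
SELECT @TotalRows = COUNT(*) FROM @Items;
SET @PageCount = CEILING(1.0 * @TotalRows / @PageSize);
IF @PageCount < 1 SET @PageCount = 1;   -- NTILE requires a positive bucket count

WITH Paged AS (
    SELECT id, name, NTILE(@PageCount) OVER (ORDER BY id) AS page_no
    FROM @Items
)
SELECT id, name
FROM Paged
WHERE page_no = @PageNumber
ORDER BY id;
```

The two approaches do not return identical pages. NTILE balances the rows across buckets, so with 7 rows and 3 buckets the pages hold 3, 2 and 2 rows and page 2 contains ids 4 and 5, whereas OFFSET ... FETCH fills each page to `@PageSize` and returns ids 4, 5 and 6 for page 2.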

MSDN - OFFSET_FETCH

MSDN - NTILE

0

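I have just implemented the PageRank algorithm in SQL. The algorithm proceeds in the following steps.

1. Compute the initial PageRank value of every node;

2. Join the PageRank table with the Edge table to emit rank values to adjacent nodes, and use the aggregate function SUM to "collect" the received values. Then save the result into the temporary table TmpRank;

3. Swap the contents of PageRank and TmpRank, and go back to step 2 until the convergence condition is met or the maximum number of iterations is reached.

Here is the code: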

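    -- The graph data and the algorithm come from the book "Mining of Massive Datasets", p. 175, http://infolab.stanford.edu/~ullman/mmds/book.pdf
    -- This script has been verified for correctness on SQL Server 2017 (Linux version).
    -- DROP TABLE IF EXISTS (SQL Server 2016+) keeps the script re-runnable, including on a first run.
    DROP TABLE IF EXISTS Node;
    DROP TABLE IF EXISTS Edge;
    DROP TABLE IF EXISTS OutDegree;
    DROP TABLE IF EXISTS PageRank;
    DROP TABLE IF EXISTS TmpRank;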
    CREATE TABLE Node(id int PRIMARY KEY); 
    CREATE TABLE Edge(src int,dst int, PRIMARY KEY (src, dst)); 
    CREATE TABLE OutDegree(id int PRIMARY KEY, degree int); 
    CREATE TABLE PageRank(id int PRIMARY KEY, rank float); 
    CREATE TABLE TmpRank(id int PRIMARY KEY, rank float); 
    
    --delete all records 
    DELETE FROM Node; 
    DELETE FROM Edge; 
    DELETE FROM OutDegree; 
    DELETE FROM PageRank; 
    DELETE FROM TmpRank; 
    
    --init basic tables 
    INSERT INTO Node VALUES (0); 
    INSERT INTO Node VALUES (1); 
    INSERT INTO Node VALUES (2); 
    INSERT INTO Node VALUES (3); 
    
    INSERT INTO Edge VALUES (0, 1); 
    INSERT INTO Edge VALUES (0, 2); 
    INSERT INTO Edge VALUES (0, 3); 
    INSERT INTO Edge VALUES (1, 0); 
    INSERT INTO Edge VALUES (1, 3); 
    INSERT INTO Edge VALUES (2, 2); 
    INSERT INTO Edge VALUES (3, 1); 
    INSERT INTO Edge VALUES (3, 2); 
    
    --compute out-degree 
    INSERT INTO OutDegree 
    SELECT Node.id, COUNT(Edge.src) --COUNT(Edge.src) rather than COUNT(*), so nodes without outgoing edges get degree 0
    FROM Node LEFT OUTER JOIN Edge 
    ON Node.id = Edge.src 
    GROUP BY Node.id; 
    
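    --WARN: Nodes with zero out-degree (dead ends) get no special handling, which may produce wrong results.
    --  Make sure every node in the graph has at least one outgoing edge.
    --  Likewise, a node with no incoming edges never appears as Edge.dst, so it drops out of PageRank after the first iteration.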
    
    DECLARE @ALPHA float = 0.8; 
    DECLARE @Node_Num int; 
    SELECT @Node_Num = COUNT(*) FROM Node; 
    
    --PageRank initial value
    INSERT INTO PageRank 
    SELECT Node.id, rank = (1 - @ALPHA)/@Node_Num 
    FROM Node INNER JOIN OutDegree 
    ON Node.id = OutDegree.id;
    
    /* 
    --For Debugging 
    SELECT * FROM Node; 
    SELECT * FROM Edge; 
    SELECT * FROM OutDegree; 
    SELECT * FROM PageRank; 
    SELECT * FROM TmpRank; 
    */ 
    
    DECLARE @Iteration int = 0; 
    
    WHILE @Iteration < 50 
    BEGIN 
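        --One iteration: the joins send rank/out-degree from every node along its outgoing edges,
        --and GROUP BY sums what each destination receives, plus the (1 - @ALPHA)/@Node_Num term.
        SET @Iteration = @Iteration + 1;
    
        INSERT INTO TmpRank 
        SELECT Edge.dst, rank = SUM(@ALPHA * PageRank.rank/OutDegree.degree) + (1 - @ALPHA)/@Node_Num 
        FROM PageRank 
        INNER JOIN Edge ON PageRank.id = Edge.src 
        INNER JOIN OutDegree ON PageRank.id = OutDegree.id 
        GROUP BY Edge.dst;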
    
        DELETE FROM PageRank; 
        INSERT INTO PageRank 
        SELECT * FROM TmpRank; 
        DELETE FROM TmpRank; 
    END 
    
    SELECT * FROM PageRank;
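
Step 3 mentions a convergence condition, while the loop above always runs a fixed 50 iterations. The following is only a sketch of one way to stop early, not part of the verified script: it assumes the Node, Edge, OutDegree, PageRank and TmpRank tables created above already exist and are populated, and it is meant to run as a separate batch. The `@Epsilon`, `@MaxIterations` and `@Delta` names and the threshold value are assumptions made for illustration; the stopping test used here is the sum of absolute rank changes between two iterations.

```sql
-- Sketch only: assumes the tables from the script above exist and are populated.
DECLARE @ALPHA float = 0.8;
DECLARE @Epsilon float = 1e-9;     -- hypothetical convergence threshold
DECLARE @MaxIterations int = 50;
DECLARE @Iteration int = 0;
DECLARE @Delta float = 1;          -- starts above @Epsilon so the loop is entered
DECLARE @Node_Num int;
SELECT @Node_Num = COUNT(*) FROM Node;

-- Restart from the same initial value as the original script.
DELETE FROM PageRank;
INSERT INTO PageRank
SELECT Node.id, (1 - @ALPHA) / @Node_Num
FROM Node;

WHILE @Iteration < @MaxIterations AND @Delta > @Epsilon
BEGIN
    SET @Iteration = @Iteration + 1;

    -- Same update step as the original loop.
    DELETE FROM TmpRank;
    INSERT INTO TmpRank
    SELECT Edge.dst, SUM(@ALPHA * PageRank.rank / OutDegree.degree) + (1 - @ALPHA) / @Node_Num
    FROM PageRank
    INNER JOIN Edge ON PageRank.id = Edge.src
    INNER JOIN OutDegree ON PageRank.id = OutDegree.id
    GROUP BY Edge.dst;

    -- Total absolute change in rank compared with the previous iteration.
    SELECT @Delta = SUM(ABS(TmpRank.rank - PageRank.rank))
    FROM TmpRank
    INNER JOIN PageRank ON TmpRank.id = PageRank.id;

    -- Swap: the new ranks become the current ranks.
    DELETE FROM PageRank;
    INSERT INTO PageRank
    SELECT * FROM TmpRank;
END

SELECT @Iteration AS iterations, @Delta AS last_delta;
SELECT * FROM PageRank ORDER BY id;
```

The same caveats as in the original script apply: nodes without outgoing or incoming edges are not handled specially, so this sketch is only meaningful on graphs like the sample one, where every node has both.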