2011-06-24
4

I have a large database (50 million rows) containing time-series data. There is a clustered index on the [datetime] column, which ensures that the table is always sorted in date order. Should I use a cursor to read the time-series data out of SQL Server with C#?

What is the highest-performance way to read the rows out of the table, one at a time, into a C# application?

+3

"There is a clustered index on the [datetime] column, which ensures that the table is always sorted" - it ensures that requesting the rows in that same order is a *relatively* cheap operation. It does not guarantee the physical order in which the rows are stored, nor the order in which they are retrieved without an 'ORDER BY' clause. –

+3

Also, are you certain that consuming 50 million rows in a C# application, one row at a time, is the best way to process this data? –

+1

Also, how hard would it be to just try it and see how it performs? Just use a simple query such as 'SELECT * FROM MyTable ORDER BY clustered_index_column' –

Answers

2

You should just try it and find out. I just did, and I didn't see any performance problem.

USE [master] 
GO 
/****** Object: Database [HugeDatabase] Script Date: 06/27/2011 13:27:50 ******/ 
CREATE DATABASE [HugeDatabase] ON PRIMARY 
(NAME = N'HugeDatabase', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.SQL2K8R2\MSSQL\DATA\HugeDatabase.mdf' , SIZE = 1940736KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB) 
LOG ON 
(NAME = N'HugeDatabase_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.SQL2K8R2\MSSQL\DATA\HugeDatabase_log.LDF' , SIZE = 395392KB , MAXSIZE = 2048GB , FILEGROWTH = 10%) 
GO 

USE [HugeDatabase] 
GO 
/****** Object: Table [dbo].[HugeTable] Script Date: 06/27/2011 13:27:53 ******/ 
SET ANSI_NULLS ON 
GO 
SET QUOTED_IDENTIFIER ON 
GO 
CREATE TABLE [dbo].[HugeTable](
    [ID] [int] IDENTITY(1,1) NOT NULL, 
    [PointInTime] [datetime] NULL, 
PRIMARY KEY NONCLUSTERED 
(
    [ID] ASC 
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] 
) ON [PRIMARY] 
GO 
CREATE CLUSTERED INDEX [IX_HugeTable_PointInTime] ON [dbo].[HugeTable] 
(
    [PointInTime] ASC 
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] 
GO 

Populate it:

DECLARE @t datetime 
SET @t = '2011-01-01' 

DECLARE @i int 
SET @i=0 

SET NOCOUNT ON 

WHILE (@i < 50000000) 
BEGIN 
    INSERT INTO HugeTable(PointInTime) VALUES(@t) 
    SET @t = DATEADD(ss, 1, @t) 

    SET @i = @i + 1 
END 

Test:

using System; 
using System.Data.SqlClient; 
using System.Diagnostics; 

namespace ConsoleApplication1 
{ 
    internal class Program 
    { 
     private static void Main() 
     { 
      TimeSpan firstRead = new TimeSpan(); 
      TimeSpan readerOpen = new TimeSpan(); 
      TimeSpan commandOpen = new TimeSpan(); 
      TimeSpan connectionOpen = new TimeSpan(); 
      TimeSpan secondRead = new TimeSpan(); 

      try 
      { 

       Stopwatch sw1 = new Stopwatch(); 
       sw1.Start(); 
       using (
        var conn = 
         new SqlConnection(
          @"Data Source=.\sql2k8r2;Initial Catalog=HugeDatabase;Integrated Security=True")) 
       { 
        conn.Open(); connectionOpen = sw1.Elapsed; 

        using (var cmd = new SqlCommand(
         "SELECT * FROM HugeTable ORDER BY PointInTime", conn)) 
        { 
         commandOpen = sw1.Elapsed; 

          using (var reader = cmd.ExecuteReader()) 
          { 
           readerOpen = sw1.Elapsed; 

           reader.Read(); firstRead = sw1.Elapsed; 
           reader.Read(); secondRead = sw1.Elapsed; 
          } 
        } 
       } 
       sw1.Stop(); 
      } 
      catch (Exception e) 
      { 
       Console.WriteLine(e); 
      } 
      finally 
      { 

       Console.WriteLine(
        "Connection: {0}, command: {1}, reader: {2}, read: {3}, second read: {4}", 
        connectionOpen, 
        commandOpen - connectionOpen, 
        readerOpen - commandOpen, 
        firstRead - readerOpen, 
        secondRead - firstRead); 

       Console.Write("Enter to exit: "); 
       Console.ReadLine(); 
      } 
     } 
    } 
} 
+0

Wow, really great answer - I'm running the example now. – Contango

+1

@Gravitas: that populate script took 4.5 hours... – 
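
If the populate step ever needs to be re-run, a bulk load is usually far faster than 50 million single-row INSERTs. The following is a rough sketch, not from the original thread, using SqlBulkCopy; connStr and the batch size are placeholders:

using System; 
using System.Data; 
using System.Data.SqlClient; 

static class FastPopulate 
{ 
    // Streams rows into HugeTable in large batches instead of one INSERT per row. 
    public static void Populate(string connStr, int rowCount) 
    { 
        const int batchSize = 100000; 

        var table = new DataTable(); 
        table.Columns.Add("PointInTime", typeof(DateTime)); 

        var t = new DateTime(2011, 1, 1); 
        int written = 0; 

        using (var bulk = new SqlBulkCopy(connStr)) 
        { 
            bulk.DestinationTableName = "dbo.HugeTable"; 
            bulk.ColumnMappings.Add("PointInTime", "PointInTime"); 

            while (written < rowCount) 
            { 
                table.Clear(); 
                for (int j = 0; j < batchSize && written + j < rowCount; j++) 
                { 
                    table.Rows.Add(t); 
                    t = t.AddSeconds(1); 
                } 
                bulk.WriteToServer(table);   // one bulk round trip per batch 
                written += table.Rows.Count; 
            } 
        } 
    } 
} 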

+0

This works really well. However, unless you use 'TOP 1000000' to limit the number of rows returned by the SELECT statement, there is a 30-second pause if the using() block exits early (i.e. if the user presses 'cancel'). – Contango
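
One way to avoid that pause when abandoning the reader early is to call SqlCommand.Cancel() before the reader is disposed, so SqlClient does not silently drain the remaining rows. A minimal sketch, not from the original answer; connStr and shouldCancel are placeholders:

using System; 
using System.Data.SqlClient; 

static class CancellableRead 
{ 
    // Reads rows until shouldCancel() returns true; Cancel() keeps the early 
    // exit from waiting for the rest of the result set to be drained. 
    public static void ReadUntilCancelled(string connStr, Func<bool> shouldCancel) 
    { 
        using (var conn = new SqlConnection(connStr)) 
        { 
            conn.Open(); 
            using (var cmd = new SqlCommand( 
                "SELECT * FROM HugeTable ORDER BY PointInTime", conn)) 
            using (var reader = cmd.ExecuteReader()) 
            { 
                while (reader.Read()) 
                { 
                    if (shouldCancel()) 
                    { 
                        cmd.Cancel();   // tell the server to stop sending rows 
                        break; 
                    } 
                    // process reader["PointInTime"] here 
                } 
            }   // Dispose returns quickly because the command was cancelled 
        } 
    } 
} 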

2

I would use a SqlDataReader, as it streams its results. You will still have to specify the order, but if you ORDER BY on the clustered index it should be a (relatively) cheap operation.

using (var db = new SqlConnection(connStr)) { 
    db.Open();  // the connection must be opened before executing the command 
    using (var rs = new SqlCommand(someQuery, db).ExecuteReader()) { 
     while (rs.Read()) { 
      // do interesting things! 
     } 
    } 
} 
+0

This works for small databases, but not for large ones. The problem is that if I do a select over the entire table first, it takes 30 minutes to complete before I can start streaming the results back into my application. Is there a faster way? – Contango

+1

@Gravitas: this puts the least strain on the server. The other option (theoretically speaking) would be to populate a DataSet with the information you need to process. That *could* be a faster operation, but given the size of your table I think in practice it would choke and die. – Yuck
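
For reference, the DataSet approach mentioned here would look roughly like the sketch below (not from the original thread; connStr and the date-range parameters are placeholders). As the comment says, it only makes sense for a bounded slice of the table, not the full 50 million rows:

using System; 
using System.Data; 
using System.Data.SqlClient; 

static class DataSetLoad 
{ 
    // Loads the whole result set into memory before returning - fine for a 
    // bounded slice, but likely to exhaust memory on the full table. 
    public static DataSet LoadSlice(string connStr, DateTime from, DateTime to) 
    { 
        var ds = new DataSet(); 
        using (var conn = new SqlConnection(connStr)) 
        using (var adapter = new SqlDataAdapter( 
            "SELECT * FROM HugeTable WHERE PointInTime >= @from AND PointInTime < @to " + 
            "ORDER BY PointInTime", conn)) 
        { 
            adapter.SelectCommand.Parameters.AddWithValue("@from", from); 
            adapter.SelectCommand.Parameters.AddWithValue("@to", to); 
            adapter.Fill(ds, "HugeTable");   // Fill opens and closes the connection itself 
        } 
        return ds; 
    } 
} 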

+0

Perhaps use a cursor to populate a temporary table on the server, then stream the temporary table back with a SqlDataReader? – Contango
