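Reading time-series data from SQL Server in C# with a cursor?
I have a large database (50 million rows) containing time-series data. There is a clustered index on the [datetime] column, which ensures that the table is always sorted in chronological order.
What is the most performant way to read the rows of the table, one at a time, into a C# application?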
You should just try it and find out. I just did, and I didn't see a performance problem.
USE [master]
GO
/****** Object: Database [HugeDatabase] Script Date: 06/27/2011 13:27:50 ******/
CREATE DATABASE [HugeDatabase] ON PRIMARY
(NAME = N'HugeDatabase', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.SQL2K8R2\MSSQL\DATA\HugeDatabase.mdf' , SIZE = 1940736KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB)
LOG ON
(NAME = N'HugeDatabase_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.SQL2K8R2\MSSQL\DATA\HugeDatabase_log.LDF' , SIZE = 395392KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO
USE [HugeDatabase]
GO
/****** Object: Table [dbo].[HugeTable] Script Date: 06/27/2011 13:27:53 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[HugeTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[PointInTime] [datetime] NULL,
PRIMARY KEY NONCLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [IX_HugeTable_PointInTime] ON [dbo].[HugeTable]
(
[PointInTime] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Populate:
-- One row per second starting at 2011-01-01 (50 million seconds is roughly 19 months).
DECLARE @t datetime
SET @t = '2011-01-01'
DECLARE @i int
SET @i = 0
SET NOCOUNT ON
WHILE (@i < 50000000)
BEGIN
    INSERT INTO HugeTable(PointInTime) VALUES(@t)
    SET @t = DATEADD(ss, 1, @t)
    SET @i = @i + 1
END
Test:
using System;
using System.Data.SqlClient;
using System.Diagnostics;

namespace ConsoleApplication1
{
    internal class Program
    {
        private static void Main()
        {
            // Cumulative elapsed times (from sw1), captured after each step.
            TimeSpan firstRead = new TimeSpan();
            TimeSpan readerOpen = new TimeSpan();
            TimeSpan commandOpen = new TimeSpan();
            TimeSpan connectionOpen = new TimeSpan();
            TimeSpan secondRead = new TimeSpan();
            try
            {
                Stopwatch sw1 = new Stopwatch();
                sw1.Start();
                using (var conn = new SqlConnection(
                    @"Data Source=.\sql2k8r2;Initial Catalog=HugeDatabase;Integrated Security=True"))
                {
                    conn.Open();
                    connectionOpen = sw1.Elapsed;
                    using (var cmd = new SqlCommand(
                        "SELECT * FROM HugeTable ORDER BY PointInTime", conn))
                    {
                        commandOpen = sw1.Elapsed;
                        using (var reader = cmd.ExecuteReader())
                        {
                            readerOpen = sw1.Elapsed;
                            reader.Read();
                            firstRead = sw1.Elapsed;
                            reader.Read();
                            secondRead = sw1.Elapsed;
                            // Cancel the command so that disposing the reader does not
                            // first drain the remaining ~50 million rows.
                            cmd.Cancel();
                        }
                    }
                }
                sw1.Stop();
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
            }
            finally
            {
                // Print how long each individual step took.
                Console.WriteLine(
                    "Connection: {0}, command: {1}, reader: {2}, read: {3}, second read: {4}",
                    connectionOpen,
                    commandOpen - connectionOpen,
                    readerOpen - commandOpen,
                    firstRead - readerOpen,
                    secondRead - firstRead);
                Console.Write("Enter to exit: ");
                Console.ReadLine();
            }
        }
    }
}
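The test above only times opening the connection and the first two reads. To see the cost of streaming the whole table, a full read loop could be timed instead. This is a minimal sketch: `FullScan` and `Measure` are hypothetical names, and the helper is written against `IDataReader` so it accepts the `SqlDataReader` returned by `cmd.ExecuteReader()` as well as any other reader.

```csharp
using System;
using System.Data;
using System.Diagnostics;

// Hypothetical helper: streams every row from an IDataReader (for example a
// SqlDataReader) and reports how many rows were read and how long it took.
public static class FullScan
{
    public static (long Rows, TimeSpan Elapsed) Measure(IDataReader reader, string columnName)
    {
        // Resolve the column ordinal once instead of looking it up by name per row.
        int ordinal = reader.GetOrdinal(columnName);
        long rows = 0;
        var sw = Stopwatch.StartNew();
        while (reader.Read())
        {
            // Read the value with a typed getter so it is actually materialized.
            if (!reader.IsDBNull(ordinal))
            {
                reader.GetDateTime(ordinal);
            }
            rows++;
        }
        sw.Stop();
        return (rows, sw.Elapsed);
    }
}
```

For example, inside the `using (var cmd = ...)` block: `using (var reader = cmd.ExecuteReader()) { var (rows, elapsed) = FullScan.Measure(reader, "PointInTime"); Console.WriteLine("{0} rows in {1}", rows, elapsed); }`.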
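I would use a SqlDataReader, since it streams its results. You still need to specify the order, but if you ORDER BY the clustered index column it should be a (relatively) cheap operation.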
using (var db = new SqlConnection(connStr)) {
    db.Open(); // ExecuteReader requires an open connection
    using (var cmd = new SqlCommand(someQuery, db))
    using (var rs = cmd.ExecuteReader()) {
        while (rs.Read()) {
            // do interesting things!
        }
    }
}
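As one hypothetical example of what the "do interesting things" step might be, the sketch below counts gaps in the series, i.e. places where consecutive timestamps are further apart than an expected step. It keeps only the previous timestamp, so memory use stays constant however many rows stream through. `GapScanner`, `CountGaps` and the one-second step are assumptions for illustration (the populate script in the first answer happens to insert one row per second).

```csharp
using System;
using System.Data;

// Hypothetical per-row processing: count gaps larger than expectedStep while
// streaming, holding only the previous timestamp in memory.
public static class GapScanner
{
    public static long CountGaps(IDataReader reader, string columnName, TimeSpan expectedStep)
    {
        int ordinal = reader.GetOrdinal(columnName);
        DateTime? previous = null;
        long gaps = 0;
        while (reader.Read())
        {
            if (reader.IsDBNull(ordinal))
            {
                continue; // PointInTime is nullable in the sample schema
            }
            DateTime current = reader.GetDateTime(ordinal);
            if (previous.HasValue && current - previous.Value > expectedStep)
            {
                gaps++;
            }
            previous = current;
        }
        return gaps;
    }
}
```

The helper advances the reader itself, so it would replace the `while (rs.Read())` loop: `long gaps = GapScanner.CountGaps(rs, "PointInTime", TimeSpan.FromSeconds(1));`.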
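"There is a clustered index on the [datetime] column, which ensures that the table is always sorted": it makes requesting the rows in that same order a *relatively* cheap operation. It guarantees neither the physical order in which the rows are stored nor the order in which rows are returned without an 'ORDER BY' clause.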
Also, are you sure that consuming 50 million rows in a C# application, one row at a time, is the best way to process this data?
Also, would it really be hard to just try it and see how it performs? Use a simple query such as 'SELECT * FROM MyTable ORDER BY clustered_index_column'.
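Since, as noted in the comments, only an 'ORDER BY' clause guarantees the order in which rows are returned, a consumer that relies on chronological order could check it while streaming at almost no extra cost. A minimal sketch with the hypothetical names `OrderGuard` and `AscendingTimestamps`: a lazy iterator that yields each non-null timestamp and throws as soon as one arrives out of order.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical guard: lazily yields timestamps from the reader and fails fast
// if they are not in ascending order, instead of trusting the clustered index.
public static class OrderGuard
{
    public static IEnumerable<DateTime> AscendingTimestamps(IDataReader reader, string columnName)
    {
        int ordinal = reader.GetOrdinal(columnName);
        DateTime? previous = null;
        while (reader.Read())
        {
            if (reader.IsDBNull(ordinal))
            {
                continue; // skip NULLs; the column is nullable in the sample schema
            }
            DateTime current = reader.GetDateTime(ordinal);
            if (previous.HasValue && current < previous.Value)
            {
                throw new InvalidOperationException(
                    $"Rows are not in ascending order: {current:o} after {previous.Value:o}.");
            }
            previous = current;
            yield return current;
        }
    }
}
```

Usage: `foreach (DateTime t in OrderGuard.AscendingTimestamps(rs, "PointInTime")) { /* process t */ }`. Because the iterator is lazy, rows are still pulled from the server one at a time.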