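I want to build a simple multidimensional data model using a star schema in a relational database (ROLAP). To do this, I created one fact table and two dimension tables. First I copy the data from the operational source and process it (a simplified ETL process). What is wrong with this multidimensional model?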
My model has only two dimensions: date and status. The measure is the count of certain statuses (over a period of time).
The time dimension table:
CREATE TABLE [dbo].[tbl_date_dim] (
[ID][int] IDENTITY(1,1) NOT NULL,
[date_key][int] NOT NULL primary key,
[Year][int] NOT NULL,
[Month][int] NOT NULL,
[Day][int] NOT NULL
);
There is a table, tbl_application, whose VersionDate field covers the entire time range. So I populate the time dimension table this way:
INSERT INTO [dbo].[tbl_date_dim]
([date_key],
[Year],
[Month],
[Day])
(
SELECT DISTINCT
CAST(YEAR(VersionDate) as VARCHAR(4)) +
RIGHT('00' + CAST(MONTH(VersionDate) as VARCHAR(2)) ,2) +
RIGHT('00' + CAST(DAY(VersionDate) as VARCHAR(2)), 2) as 'date_key',
YEAR(inner_data.VersionDate) as 'Year',
MONTH(inner_data.VersionDate) as 'Month',
DAY(inner_data.VersionDate) as 'Day'
FROM (
SELECT
VersionDate
FROM [dbo].[tbl_application]
) AS inner_data
);
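For comparison, the same yyyymmdd key can probably be derived more compactly with CONVERT style 112 (the ISO yyyymmdd format). This is only a sketch, and it assumes VersionDate is a date or datetime column, which is not shown above:

```sql
-- Sketch: build the yyyymmdd integer key with CONVERT style 112
-- instead of concatenating YEAR/MONTH/DAY strings by hand.
-- Assumption: VersionDate is a date or datetime column.
SELECT DISTINCT
    CONVERT(int, CONVERT(char(8), VersionDate, 112)) AS date_key,
    YEAR(VersionDate)  AS [Year],
    MONTH(VersionDate) AS [Month],
    DAY(VersionDate)   AS [Day]
FROM [dbo].[tbl_application];
```

The inner CONVERT produces a string such as 20140315, and the outer CONVERT turns it into the int that date_key expects, so no implicit string-to-int conversion is needed on insert.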
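The status dimension table: I use the entire existing table tbl_applicationstatus.
Next, I created the fact table. It contains the foreign keys to the dimension tables and the measures.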
CREATE TABLE [dbo].[tbl_olap_fact] (
[ID][int] IDENTITY(1,1) NOT NULL,
[status_id][int] NOT NULL, -- FK
[date_dim][int] NOT NULL, -- FK
[staus_name] varchar(100) NOT NULL, -- Non additive measure
[transaction_id][int] NOT NULL, -- Additive measure
CONSTRAINT [PK_tbl_olap_fact] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
transaction_id is the field I aggregate (the number of statuses).
Next, I added the relationships between the fact table and the dimension tables:
ALTER TABLE [dbo].[tbl_olap_fact] ADD CONSTRAINT [FK_tbl_olap_fact_tbl_date_dim] FOREIGN KEY([date_dim])
REFERENCES [dbo].[tbl_date_dim] ([date_key]);
ALTER TABLE [dbo].[tbl_olap_fact] ADD CONSTRAINT [FK_tbl_olap_fact_tbl_applicationstatus] FOREIGN KEY([status_id])
REFERENCES [dbo].[tbl_applicationstatus] ([ID]);
Then I populate the fact table:
INSERT INTO [dbo].[tbl_olap_fact]
([transaction_id],
[status_id],
[staus_name],
[date_dim])
(
SELECT DISTINCT
core.id as 'transaction_id',
core_status.ID as 'status_id',
core_status.name as 'status_name',
CAST(YEAR(core.VersionDate) as VARCHAR(4)) +
RIGHT('00' + CAST(MONTH(core.VersionDate) as VARCHAR(2)) ,2) +
RIGHT('00' + CAST(DAY(core.VersionDate) as VARCHAR(2)), 2) as 'date_dim'
FROM
[dbo].[tbl_application] as core
inner join tbl_applicationstatus as core_status
on core.ApplicationStatusID = core_status.ID
WHERE IsRaw = 0
);
I use Mondrian as the OLAP server. This is the Mondrian schema that defines the logical model of the multidimensional database:
<Schema name="olap_schema">
<Dimension type="TimeDimension" visible="true" highCardinality="false" name="Date first dim">
<Hierarchy name="date_hierarchy" visible="true" hasAll="true" primaryKey="date_key" description="">
<Table name="tbl_date_dim" schema="dbo">
</Table>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Year"
nameColumn="Year"
type="Numeric"
uniqueMembers="true"
levelType="TimeYears"
hideMemberIf="Never"
description="">
</Level>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Month"
nameColumn="Month"
ordinalColumn="Month"
type="Numeric"
uniqueMembers="false"
levelType="TimeMonths"
hideMemberIf="Never"
description="">
</Level>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Day"
nameColumn="Day"
ordinalColumn="Day"
type="Numeric"
uniqueMembers="false"
levelType="TimeDays"
hideMemberIf="Never"
description="">
</Level>
</Hierarchy>
</Dimension>
<Dimension type="TimeDimension" visible="true" highCardinality="false" name="Date second dim">
<Hierarchy name="date_hierarchy" visible="true" hasAll="true" primaryKey="date_key" description="">
<Table name="tbl_date_dim" schema="dbo">
</Table>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Year"
nameColumn="Year"
type="Numeric"
uniqueMembers="true"
levelType="TimeYears"
hideMemberIf="Never"
description="">
</Level>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Month"
nameColumn="Month"
ordinalColumn="Month"
type="Numeric"
uniqueMembers="false"
levelType="TimeMonths"
hideMemberIf="Never"
description="">
</Level>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Day"
nameColumn="Day"
ordinalColumn="Day"
type="Numeric"
uniqueMembers="false"
levelType="TimeDays"
hideMemberIf="Never"
description="">
</Level>
</Hierarchy>
</Dimension>
<Dimension type="StandardDimension" visible="true" highCardinality="false" name="Status dimension">
<Hierarchy name="status_hierarchy" visible="true" hasAll="true" primaryKey="ID" description="">
<Table name="tbl_applicationstatus" schema="dbo">
</Table>
<Level name=""
visible="true"
table="tbl_applicationstatus"
column="Name"
nameColumn="Name"
type="String"
uniqueMembers="true"
levelType="Regular"
hideMemberIf="Never"
description="">
</Level>
</Hierarchy>
</Dimension>
<Cube name="enrollment_cube" caption="" visible="true" description="" cache="true" enabled="true">
<Table name="tbl_olap_fact" schema="dbo">
</Table>
<DimensionUsage source="Date first dim" name="X axis" caption="" visible="true" foreignKey="date_dim" highCardinality="false">
</DimensionUsage>
<DimensionUsage source="Date second dim" name="Y axis" caption="" visible="true" foreignKey="date_dim" highCardinality="false">
</DimensionUsage>
<DimensionUsage source="Status dimension" name="Z axis" caption="" visible="true" foreignKey="status_id" highCardinality="false">
</DimensionUsage>
<Measure name="TotalCount" column="transaction_id" aggregator="count" caption="Total" visible="true">
</Measure>
</Cube>
</Schema>
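To cross-check what the cube returns, the TotalCount measure should roughly correspond to a plain aggregate over the star schema. The query below is a sketch under that assumption. It uses only the tables and columns defined above, and the Year/Month/status grouping is just one example of a slice:

```sql
-- Sketch: reproduce the cube's count measure directly in SQL
-- so the numbers shown by the OLAP client can be compared against it.
SELECT
    d.[Year],
    d.[Month],
    s.[Name]                  AS status_name,
    COUNT(f.[transaction_id]) AS TotalCount
FROM [dbo].[tbl_olap_fact] AS f
JOIN [dbo].[tbl_date_dim] AS d
    ON d.[date_key] = f.[date_dim]
JOIN [dbo].[tbl_applicationstatus] AS s
    ON s.[ID] = f.[status_id]
GROUP BY d.[Year], d.[Month], s.[Name]
ORDER BY d.[Year], d.[Month], s.[Name];
```

If these totals differ from what the client shows for the same year, month and status, the mismatch is more likely in the schema mapping than in the ETL load.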
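As the OLAP client I use Saiku Analytics.
Basically, I get correct data, but I am not sure the approach is sound. For example, is the way I populate the fact table correct? Am I building the ETL process correctly? This is a test schema; I am experimenting with building a data warehouse and multidimensional models.
I would be very grateful for any feedback. Thank you all.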
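I'm speechless! Thank you so much for such a detailed answer! I will study it carefully... Thank you very much for sharing your experience! It is very important to me.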