Fabric: SCD Type 2 Using Surrogate Keys

by HubSite 365 about Pragmatic Works


Implement Kimball-style SCD Type 2 in a Microsoft Fabric Warehouse with surrogate keys, using Azure SQL CDC and a Copy Job

Key insights

SCD Type 2 with surrogate keys: Manuel Quintana demos a Kimball-style SCD Type 2 implemented in Microsoft Fabric Warehouse to preserve history and give each dimension version a stable surrogate identifier.
Source setup: the demo reuses the Azure SQL source from the previous video, with CDC enabled and the same business key (CustomerAltID), so change detection and capture work exactly as before.
Pre-create dimension: create the Warehouse dimension table in advance with an identity column for the surrogate key plus SCD2 tracking fields like FromDate, ToDate and IsCurrent before loading data.
Copy Job configuration: choose Azure SQL as source, Warehouse as destination, set the load to Incremental and pick the SCD2 write method to generate new versions automatically on changes.
Critical gotcha: do not use Edit Column Mappings for this load because the destination contains extra SCD2 tracking and surrogate key columns that the source does not provide.
Results and why it matters: after updates, inserts, and deletes, rerunning the Copy Job creates multiple versions per business key with unique surrogate keys, flags current rows, timestamps expired rows, keeps soft deletes, and ensures correct fact-to-dimension joins for historical reporting.

Pragmatic Works published a hands-on YouTube video that demonstrates how to implement SCD Type 2 in a Microsoft Fabric Warehouse while preserving surrogate keys in the classic Kimball style. In this follow-up demo, presenter Manuel Quintana reuses an Azure SQL source with CDC enabled and a stable business key, then directs the data into a Warehouse destination rather than a Lakehouse. The Warehouse approach produces dimension rows with identity-generated surrogate keys and SCD2 tracking fields, which matter when facts must join to the correct historical version. Overall, the video covers the practical setup steps, a key operational gotcha, and verification after changes are applied to the source.

Overview of the demo

First, Quintana shows the initial environment: an Azure SQL source table with CDC active and a business key called CustomerAltID, which remains the anchor for matching source rows to dimension rows. He then pre-creates a Warehouse dimension table that includes both SCD2 tracking columns and an identity column to act as the surrogate key. After that, he configures a Copy Job to run an incremental load using the SCD Type 2 write method, so that new versions are written automatically when source rows change. In doing so, the demo contrasts no-code Lakehouse approaches with the Warehouse pattern that supports Kimball-style surrogate keys.

Source, setup and configuration

Quintana keeps the source consistent with the prior video by leaving CDC enabled, which ensures change events are available for incremental processing. Next, he points out that the Warehouse destination must be pre-created because the table needs tracking fields and an identity column that the source does not supply. In addition, he configures the Copy Job to use the incremental copy plus the SCD2 write method so the pipeline writes new versions instead of overwriting rows. Therefore, the setup requires careful alignment between the source business key, the Warehouse schema, and the Copy Job configuration.
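The versioning behavior described above can be sketched in plain Python. This is an illustrative model only, not Fabric's implementation and not Copy Job code. The class name CustomerDim, the surrogate key name customer_key, the name attribute, and the choice to represent an open-ended ToDate as None are all assumptions made for the example; only CustomerAltID and the FromDate, ToDate and IsCurrent fields come from the demo.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import count
from typing import List, Optional


@dataclass
class DimRow:
    customer_key: int             # surrogate key (hypothetical column name)
    customer_alt_id: str          # business key from the source (CustomerAltID)
    name: str                     # example business attribute (hypothetical)
    from_date: datetime           # when this version became effective
    to_date: Optional[datetime]   # None while the version is open (assumption)
    is_current: bool


class CustomerDim:
    """In-memory stand-in for a pre-created dimension table."""

    def __init__(self) -> None:
        self.rows: List[DimRow] = []
        # Mimics an identity column: each new row gets a key not used before.
        self._identity = count(1)

    def _current(self, alt_id: str) -> Optional[DimRow]:
        for row in self.rows:
            if row.customer_alt_id == alt_id and row.is_current:
                return row
        return None

    def upsert(self, alt_id: str, name: str, as_of: datetime) -> DimRow:
        """Insert a new business key, or add a new version when an attribute changed."""
        current = self._current(alt_id)
        if current is not None:
            if current.name == name:
                return current            # nothing changed, so no new version
            current.to_date = as_of       # expire the old version
            current.is_current = False
        new_row = DimRow(next(self._identity), alt_id, name, as_of, None, True)
        self.rows.append(new_row)
        return new_row

    def soft_delete(self, alt_id: str, as_of: datetime) -> None:
        """Close the current version but keep every row (assumed soft-delete shape)."""
        current = self._current(alt_id)
        if current is not None:
            current.to_date = as_of
            current.is_current = False


dim = CustomerDim()
dim.upsert("C001", "Ada", datetime(2024, 1, 1))            # first load
dim.upsert("C002", "Bob", datetime(2024, 1, 1))
dim.upsert("C001", "Ada Lovelace", datetime(2024, 2, 1))   # source update
dim.soft_delete("C002", datetime(2024, 2, 1))              # source delete
```

In this model an update never overwrites a row: it closes the current version and appends a new one with a fresh surrogate key, which is the property that lets facts point at a specific historical version.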

Critical configuration choices and gotchas

Importantly, Quintana calls out a critical gotcha: do not use Edit Column Mappings in the Copy Job when the destination contains columns that aren’t in the source, because mapping changes can break automatic SCD2 behavior. If you try to map the identity or SCD2 tracking columns explicitly, the job can misinterpret which columns it is supposed to manage, and you risk losing the automatic surrogate key behavior. The recommended practice is therefore to leave the mappings untouched, so that the source business attributes are matched automatically and the identity and tracking columns are left entirely to the Copy Job. In short, skipping manual column mapping preserves the job’s ability to create new dimension versions with unique surrogate keys and proper timestamps.
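To make the gotcha concrete, a small pre-flight check can list the destination columns that have no counterpart in the source, since those are the ones that should be left to the job rather than mapped by hand. This is a hedged sketch: the function name destination_only_columns and the column lists are hypothetical, and nothing here calls a Fabric API.

```python
from typing import Iterable, List


def destination_only_columns(source_cols: Iterable[str],
                             dest_cols: Iterable[str]) -> List[str]:
    """Return destination columns with no same-named source column.

    Names are compared case-insensitively; the destination order is preserved.
    """
    source_names = {col.lower() for col in source_cols}
    return [col for col in dest_cols if col.lower() not in source_names]


# Hypothetical column lists for illustration only.
source_columns = ["CustomerAltID", "Name", "City"]
destination_columns = ["CustomerKey", "CustomerAltID", "Name", "City",
                       "FromDate", "ToDate", "IsCurrent"]

unmapped = destination_only_columns(source_columns, destination_columns)
```

With these example lists, the surrogate key and the three tracking fields are the columns that come back, which matches the set the video says not to map manually.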

Observed results and validation

To validate the pattern, the presenter performs a sequence of updates, inserts and deletes in the source Azure SQL table and then re-runs the Copy Job to capture the changes. After the second run, the Warehouse contains multiple rows per business key with distinct surrogate keys, rows flagged as current, expired rows closed with timestamps, and deletes retained as soft deletes, which maintains historical integrity. In addition, Quintana inspects the tracking fields and surrogate key values to demonstrate that facts can now point to the correct historical version. Thus, the demo confirms that this approach supports correct fact-to-dimension joins across history.
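The checks described above can be expressed as a few invariants over the dimension rows, plus a point-in-time lookup of the kind a fact load would need. The following is a minimal sketch under stated assumptions: the rows are hand-written sample data rather than output from the demo, CustomerKey is a hypothetical surrogate key name, open versions are assumed to carry an empty ToDate (a real table might use a far-future date instead), and validity intervals are treated as inclusive of FromDate and exclusive of ToDate.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical dimension contents after a second run (sample data only).
dim_rows: List[Dict] = [
    {"CustomerKey": 1, "CustomerAltID": "C001",
     "FromDate": datetime(2024, 1, 1), "ToDate": datetime(2024, 2, 1), "IsCurrent": False},
    {"CustomerKey": 2, "CustomerAltID": "C002",   # deleted in source, kept as history
     "FromDate": datetime(2024, 1, 1), "ToDate": datetime(2024, 2, 1), "IsCurrent": False},
    {"CustomerKey": 3, "CustomerAltID": "C001",
     "FromDate": datetime(2024, 2, 1), "ToDate": None, "IsCurrent": True},
]


def check_scd2(rows: List[Dict]) -> List[str]:
    """Return a list of invariant violations; an empty list means all checks passed."""
    problems = []
    keys = [r["CustomerKey"] for r in rows]
    if len(keys) != len(set(keys)):
        problems.append("surrogate keys are not unique")
    current_per_key = Counter(r["CustomerAltID"] for r in rows if r["IsCurrent"])
    for alt_id, n in current_per_key.items():
        if n > 1:
            problems.append(f"{alt_id} has {n} current rows")
    for r in rows:
        if r["IsCurrent"] and r["ToDate"] is not None:
            problems.append(f"key {r['CustomerKey']} is current but has a ToDate")
        if not r["IsCurrent"] and r["ToDate"] is None:
            problems.append(f"key {r['CustomerKey']} is expired but has no ToDate")
    return problems


def key_as_of(rows: List[Dict], alt_id: str, when: datetime) -> Optional[int]:
    """Find the surrogate key of the version that was valid at `when`, if any."""
    for r in rows:
        if r["CustomerAltID"] != alt_id:
            continue
        if r["FromDate"] <= when and (r["ToDate"] is None or when < r["ToDate"]):
            return r["CustomerKey"]
    return None


violations = check_scd2(dim_rows)
january_key = key_as_of(dim_rows, "C001", datetime(2024, 1, 15))
march_key = key_as_of(dim_rows, "C001", datetime(2024, 3, 1))
```

A fact dated in January resolves to the first version of C001 and a fact dated in March resolves to the later one, which is the fact-to-dimension behavior the validation step is meant to confirm.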

Tradeoffs and operational challenges

This Warehouse-based SCD2 pattern brings tradeoffs: while it preserves Kimball-style surrogate keys and historical accuracy, it requires more upfront schema design and careful operational controls compared with simpler no-code Lakehouse loads. For example, Fabric Warehouse does not handle identity columns exactly like SQL Server, so engineers must pre-create the table with an identity column and avoid operations that would reset or replace it. In addition, designers must balance the need for stable surrogate keys against the complexity of incremental processing, CDC lag, and potential concurrency during heavy update windows. Consequently, teams must weigh historical accuracy and downstream query correctness against added setup complexity and ongoing maintenance.

Practical recommendations

Based on the video, teams should pre-create Warehouse dimension tables with an identity surrogate key and SCD2 tracking fields, use the incremental Copy Job with the SCD Type 2 write method, and leave column mappings unedited so that Fabric manages versioning. It also helps to test with updates, inserts and deletes to validate that soft deletes and expired rows behave as expected, and to confirm that facts match the intended dimension version. Finally, monitor CDC latency, job scheduling, and table schema drift, because these operational factors often determine whether the pattern works reliably in production. Altogether, the Pragmatic Works video provides a compact, actionable walkthrough that clarifies the benefits and tradeoffs of implementing SCD Type 2 with surrogate keys in Microsoft Fabric Warehouse.