Sunday, September 5, 2010

CDC (Change Data Capture)



OVERVIEW

CDC is one of the new data tracking and capturing features of SQL Server 2008. It only tracks changes in user-created tables. Because captured data is then stored in relational tables, it can be easily accessed and retrieved subsequently, using regular T-SQL.


PRE-REQUISITE
  • Works only in Enterprise, Developer and Evaluation Edition
  • SQL Agent Service should be up & running
PRE-CONDITION
  • There should not be any schema with a name "cdc"
  • There should not be any column in a table called __$start_lsn, __$end_lsn, __$seqval, __$operation, and __$update_mask

BEST PRACTICE
  • all cdc changes should be tracked in a separate filegroup rather than using Primary
  • configure only required tables and columns

ENABLE CDC ON DATABASE

EXEC sys.sp_cdc_enable_db
Go
SELECT [name], database_id, is_cdc_enabled FROM sys.databases

This will create some columns under "cdc" schema which will be introduced.


ENABLE CDC ON TABLE

EXEC sys.sp_cdc_enable_table
@source_schema = 'dbo', -- Schema Name
@source_name = 'CDCCheck' , -- Table Name
@role_name = NULL,
@captured_column_list = 'a,b,c', -- Comma separate column list, if not given all columns will be considered
@filegroup_name = 'CDCFileGroup'
A new table will be created under CDC schema called "CDC.dbo_CDCCheck_CT". Apart from columns specified above, it will have 5 extra columns
__$Start_lsn - Log Sequence Number
__$end_lsn - Log Sequence Number
__$sequal
__$operation
  • 1 - Insert
  • 2 - Delete
  • 3 - Before Update
  • 4 - After update
__$update_mask
  • if the value is 0x12 means 0b10010 in binary means 2nd and 5th column has been updated
  • if the value is 0x1F means 0b11111 in binary means all 5 columns have been updated

DISABLE CDC ON TABLE
EXECUTE sys.sp_cdc_disable_table
@source_schema = N'HumanResources',
@source_name = N'Shift'


ADVANTAGES OF CDC
  • CDC talks with transaction log, pulls data from there and put them in cdc table. So it gives best performance in comparison with trigger approach
  • Low development cost, easy maintenance, clean approach of tracking audit.

LIMITATION/PROBLEM WITH CDC



  • TRUNCATE won't be allowed if CDC is enabled on table

  • If a new column gets added or existing column gets removed, it does not reflect CDC table. Alternate solution is, cdc should be disabled and enabled.

  • NVARCHAR(MAX), VARBINARY(MAX) columns will be part of each update operation even though their value is untouched. Ex. In a table, there are 4 columns. Out of this one column in NVARCHAR(MAX). Assume that 1 int column value has been changed but NVARCHAR(MAX) is untouched, still in CDC table, it will be tracked as the previous value was NULL and new value is existing value.



No comments:

Post a Comment