- Dimension where data changes are happening very rapidly
- "Slowly Changing Dimension" blog tells how we can maintain the history of changes
- But if we really implement the same technic for RCD, it will end up creating N number of Type-2 records for same entity and hence we need to come up with a smart design
- We can handle this by creating Junk Dimension
- Let's take an example of Customer Dimension, assume we have following columns
- CUSTOMER_KEY
- CUSTOMER_NAME
- CUSTOMER_GENDER
- CUSTOMER_MARITAL_STATUS
- CUSTOMER_TIER
- CUSTOMER_STATUS
- Now, Name, Gender, MaritalStatus gets a change very rare, once in a while but Tier & Status get the change very frequently, assume that we are not interested in historical changes of Name, Gender & Marital Status, holding only latest value should be fine, but we need to track the history changes of Tier & Status
- Now create a junk dimension by removing Tier & Status from Original Dimension
- So our Original Customer Dimension will be like this
- CUSTOMER_KEY
- CUSTOMER_NAME
- CUSTOMER_GENDER
- CUSTOMER_MARITAL_STATUS
- Create a new junk dimension for Tier, Status and create another mapping dimension, so overall structure will look like this
- By doing so, new type-2 record generation will happen only in a mapping table and not in the main table, so that way we can have easy mainteanance.
- Only disadvantage is - it will introduce extra joins to get the full picture of history changes of a particular customer
- But maintenance and storage vise, we will get benefits
No comments:
Post a Comment