Thursday, August 8, 2013

Dimension

What is Dimension?

  • Something that quantifies a measure
  • For an example, consider this: If I just say… “20kg”, it does not mean anything. But if I say, "20kg of Rice (Product) is sold to Ramesh (customer) on 5th April (date)", then that gives a meaningful sense. These product, customer and dates are some dimension that qualified the measure - 20kg
  • Dimensions create a sense to identify a number

Dimension Column Types

  • Dimension table contains columns of different types
  • Keys
    • These are surrogate keys which are used to identify the rows uniquely
    • e.g. CustomerKey
  • Name Columns
    • Used for human names for easy identification of the entity
    • e.g CustomerFullName, CustomerNickName
  • Attributes
    • Used for pivoting in analysis
    • Will be used to analyse the business based on different perspectives
    • e.g. Gender, Marital Status, Age, etc..
  • Member Properties
    • Used for labels in the report
    • Generally we don't do pivoting or analysis on these columns but they are used to provide extra info on a report
    • e.g. Address, PhoneNumber, Email, etc..
  • Lineage Columns
    • Used for auditing, never exposed to end users
    • LastCreatedDate, LastModifiedBy, etc..

Types of Dimension

  • Conformed Dimension
  • Junk Dimension
  • Degenerated Dimension
  • Role Playing Dimension

Based on how frequently the data inside a dimension changes, we can further classify dimension as
  • Unchanging or static dimension (UCD)
  • Slowly changing dimension (SCD)
  • Rapidly changing Dimension (RCD)

Conformed Dimension

  • A dimension which is shared across multiple subject areas
  • e.g. "Customer" dimension used in both Marketting & Sales departments
  • Similarly Time & Date dimension

Degenerated Dimension

  • A Key in the Fact table, which does not have its own dimension
  • e.g. In InvoiceFact, we can have columns like TransactionAmount, InvoiceAmount, CustomerKey, CreatedDateKey, GeographyKey, InvoiceNumber, TransactionNumber
  • So we can see here InvoiceNumber & TransactionNumber don't have their own Dimensions like Customer, Date & Geography, but they are required to do other business activities

Junk Dimension

  • grouping of typically low-cardinality attributes (flags, indicators etc.) so that those can be removed from other tables and can be junked into an abstract dimension table
  • These junk dimension attributes might not be related. The only purpose of this table is to store all the combinations of the dimensional attributes which you could not fit into the different dimension tables otherwise
  • e.g. Gender & Marital Status, data looks like this
  • Gender
    Male
    Female
    Marital Status
    Single
    Married
    Divorced
  • So both these dimension have very low cardinality means they have very less number of distinct values
  • So rather than having Gender & MaritalStatus separately, we can apply cross join and generate one single structure like this
GM_id
Gender
MaritalStatus
1
Male
Single
2
Male
Married
3
Male
Divorced
4
Female
Single
5
Female
Married
6
Female
Divorced
  • Now we can refer GM_id in our Facts/Dimensions, this new dimension is called Junk Dimension
  • Improves manageability and improves SQL query performance

Role playing Dimension

  • A dimension which is being used at multiple places with different contextual meaning but having the same data
  • e.g. "Date"… for a particular business, we can have "Date of Sale", "Date of Delivery", "Date of Shipping", "Date of Hire", etc… basically all of them have the basic entity i.e. Date, but then context is different
  • So we create only 1 basic table (i.e. DateDimension) and then we refer it with different names, which are called Role playing Dimensions
  • They can be created separately in cube and in database, they can be created in terms of Views. e.g. On top of DateDimension, we create vwHireDateDimension, vwSalesDateDimension, etc…

Slowly Changing Dimension

Rapidly Changing Dimension

No comments:

Post a Comment