DWH Information
Welcome to the Data Warehousing Information site.
Powered by "DWH Professionals", "DWH Enthusiasts" and People alike.

DWH Approach

- - - - - - - - - - - -

There are two strategies for building a data warehouse:

  • Top-Down Approach (suggested by Bill Inmon)
  • Bottom-Up Approach (suggested by Ralph Kimball)

Top-Down Approach (Suggested by Bill Inmon)

In the top-down approach suggested by Bill Inmon, we build a centralized repository to house corporate-wide business data. This repository is called the Enterprise Data Warehouse (EDW). The data in the EDW is stored in normalized form in order to avoid redundancy.

Having a central repository for corporate-wide data helps us maintain a single version of the truth.

The data in the EDW is stored at the most detailed level. The reasons for building the EDW at the most detailed level are:

  1. Flexibility to be used by multiple departments.

  2. Flexibility to cater for future requirements.

The disadvantages of storing data at the detailed level are:

  1. The complexity of the design increases with the level of detail.

  2. Storing data at the detailed level takes a large amount of space, which increases cost.

Once the EDW is implemented, we start building subject-area-specific data marts, which contain data in a denormalized form, typically a star schema. The data in the marts is usually summarized based on the end users' analytical requirements.
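The step from detailed EDW data to a summarized mart can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sales rows, the column layout and the function name summarize_by_month_and_store are hypothetical, and a real mart would be loaded by an ETL tool or SQL rather than by an in-memory Python script.

```python
from collections import defaultdict

# Hypothetical detail-level rows as they might sit in the EDW:
# (sale_date, store, product, amount)
edw_sales = [
    ("2024-01-05", "Store A", "Widget", 100.0),
    ("2024-01-17", "Store A", "Gadget", 50.0),
    ("2024-01-20", "Store B", "Widget", 75.0),
    ("2024-02-02", "Store A", "Widget", 25.0),
]


def summarize_by_month_and_store(rows):
    """Roll detail rows up to (month, store) grain for a sales mart."""
    totals = defaultdict(float)
    for sale_date, store, _product, amount in rows:
        month = sale_date[:7]  # keep only the "YYYY-MM" part of the date
        totals[(month, store)] += amount
    return dict(totals)


# The mart keeps one summarized row per (month, store) instead of one per sale.
mart_sales = summarize_by_month_and_store(edw_sales)
for (month, store), total in sorted(mart_sales.items()):
    print(month, store, total)
```

Because the EDW keeps the detail rows, a different mart could later summarize the same data by product or by day without going back to the source systems.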

The reason for denormalizing the data in the mart is to give end users faster access to the data for their analytics. If we queried a normalized schema for the same analytics, we would end up with complex multi-level joins that would be much slower than the equivalent query on the denormalized schema.
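The difference in join depth can be seen in a small sketch using Python's built-in sqlite3 module. The table and column names (category, product, sale, dim_product, fact_sales) are hypothetical and chosen only for illustration. With three rows the example says nothing about speed; it only shows that the normalized query needs one more join than the star schema query to produce the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (EDW-style) tables: category lives in its own table.
CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_name TEXT,
                       category_id INTEGER REFERENCES category(category_id));
CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY,
                       product_id INTEGER REFERENCES product(product_id),
                       amount REAL);

-- Denormalized (mart-style) tables: category is folded into the product row.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT,
                          category_name TEXT);
CREATE TABLE fact_sales  (product_key INTEGER, amount REAL);

INSERT INTO category VALUES (1, 'Tools'), (2, 'Toys');
INSERT INTO product  VALUES (10, 'Hammer', 1), (11, 'Robot', 2);
INSERT INTO sale     VALUES (100, 10, 20.0), (101, 10, 30.0), (102, 11, 15.0);

INSERT INTO dim_product VALUES (10, 'Hammer', 'Tools'), (11, 'Robot', 'Toys');
INSERT INTO fact_sales  VALUES (10, 20.0), (10, 30.0), (11, 15.0);
""")

# Sales by category on the normalized schema: two joins.
normalized = conn.execute("""
    SELECT c.category_name, SUM(s.amount)
    FROM sale s
    JOIN product p  ON p.product_id = s.product_id
    JOIN category c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

# The same question on the star schema: one join.
star = conn.execute("""
    SELECT d.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product d ON d.product_key = f.product_key
    GROUP BY d.category_name
    ORDER BY d.category_name
""").fetchall()

print(normalized)
print(star)
```

In a real warehouse the normalized hierarchy is usually deeper than two levels, so the gap in the number of joins grows accordingly.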

We should implement the top-down approach when:

  1. The business has complete clarity on the DWH requirements for all or several subject areas.

  2. The business is ready to invest considerable time and money.

The advantage of the top-down approach is that we build a centralized repository that provides a single version of the truth for business data. This is very important for keeping the data reliable and consistent across subject areas, and for reconciliation in case of data-related disputes between subject areas.

The disadvantage of the top-down approach is that it requires more time and a larger initial investment. The business has to wait for the EDW to be implemented, and then for the data marts to be built, before it can access its reports.

Bottom-Up Approach (Suggested by Ralph Kimball)

The bottom-up approach suggested by Ralph Kimball is an incremental approach to building a data warehouse. Here we build the data marts separately, at different points in time, as and when the requirements for a specific subject area become clear. The data marts are then integrated to form a data warehouse. Separate data marts are combined through the use of conformed dimensions and conformed facts. A conformed dimension or a conformed fact is one that can be shared across data marts.

A conformed dimension has consistent dimension keys, consistent attribute names and consistent values across separate data marts. A conformed dimension means exactly the same thing with every fact table it is joined to.

A conformed fact has the same definition of measures, the same dimensions joined to it, and the same granularity across data marts.
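The three conditions for a conformed dimension (consistent keys, attribute names and values) can be checked mechanically. The sketch below is only an illustration under stated assumptions: each mart's copy of the dimension is modeled as a hypothetical Python dict mapping a dimension key to its attributes, the function name conformance_issues and the customer data are invented, and the check is stricter than some definitions because it treats a key missing from one mart as a problem.

```python
def conformance_issues(dim_a, dim_b):
    """List the ways two copies of a dimension fail to conform.

    Each dimension is a dict: dimension key -> {attribute name: value}.
    An empty result means the copies agree on keys, names and values.
    """
    issues = []
    # Keys present in only one of the two copies.
    for key in sorted(dim_a.keys() ^ dim_b.keys()):
        issues.append(f"key {key}: missing from one mart")
    # Keys present in both copies: compare attribute names, then values.
    for key in sorted(dim_a.keys() & dim_b.keys()):
        row_a, row_b = dim_a[key], dim_b[key]
        if row_a.keys() != row_b.keys():
            issues.append(f"key {key}: attribute names differ")
        elif row_a != row_b:
            issues.append(f"key {key}: attribute values differ")
    return issues


# Hypothetical customer dimension as held by two different marts.
sales_customer = {
    1: {"name": "Acme", "region": "EMEA"},
    2: {"name": "Globex", "region": "APAC"},
}
support_customer = {
    1: {"name": "Acme", "region": "EMEA"},
    2: {"name": "Globex", "region": "Asia"},  # same customer, different value
}

issues = conformance_issues(sales_customer, support_customer)
print(issues)
```

Here the two marts disagree on the region value for customer 2, so a report joining both marts by region would not reconcile until the dimension is conformed.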

 

In order to build conformed dimensions and facts, we need to create a Bus Matrix, with rows corresponding to the various data marts and columns corresponding to all the dimension tables, as depicted in the diagram above. Each cell marks whether a given data mart uses a given dimension.
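A bus matrix of this shape can be sketched in a few lines. The marts and dimensions below (Sales, Inventory, Support and so on) are hypothetical, as are the function names; the point is only to show rows as data marts, columns as dimensions, and how the dimensions used by more than one mart, which are the candidates for conforming, fall out of the matrix.

```python
# Hypothetical data marts and the dimensions each one uses.
mart_dimensions = {
    "Sales": ["Date", "Product", "Customer"],
    "Inventory": ["Date", "Product", "Warehouse"],
    "Support": ["Date", "Customer"],
}


def build_bus_matrix(mart_dims):
    """Return (dimension names, matrix) where matrix[mart][dimension] is a bool."""
    dimensions = sorted({d for dims in mart_dims.values() for d in dims})
    matrix = {
        mart: {d: d in dims for d in dimensions}
        for mart, dims in mart_dims.items()
    }
    return dimensions, matrix


def shared_dimensions(dimensions, matrix):
    """Dimensions used by more than one mart: these must be conformed."""
    return [d for d in dimensions if sum(row[d] for row in matrix.values()) > 1]


def render(dimensions, matrix):
    """Plain-text rendering: one row per data mart, one column per dimension."""
    lines = ["Data mart".ljust(12) + " ".join(d.ljust(10) for d in dimensions)]
    for mart, row in matrix.items():
        cells = " ".join(("X" if row[d] else "-").ljust(10) for d in dimensions)
        lines.append(mart.ljust(12) + cells)
    return "\n".join(lines)


dimensions, matrix = build_bus_matrix(mart_dimensions)
print(render(dimensions, matrix))
print("Conform these:", shared_dimensions(dimensions, matrix))
```

Listing the planned marts up front, even those that will be built much later, is what lets the shared dimensions be identified before the first mart is designed.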

The bottom-up approach helps us build the warehouse incrementally by developing and integrating data marts as and when the requirements become clear. We don't have to wait until the overall requirements of the warehouse are known. Care must be taken to build an exhaustive bus matrix from the very first data mart, identifying all possible dimensions; otherwise we would be building stovepipes in the organization.

We should implement the bottom-up approach when:

  1. We have initial cost and time constraints.

  2. The complete warehouse requirements are not clear, and we have clarity on only one data mart.

The advantage of the bottom-up approach is that it does not require high initial costs and has a faster implementation time, so the business can start using the marts much earlier than with the top-down approach.

The disadvantage of the bottom-up approach is that it stores data in a denormalized format, so detailed data would take up a lot of space. As a result, there is a tendency not to keep detailed data in this approach, thereby losing the advantage of having detailed data, i.e. the flexibility to easily cater for future requirements.

- - - - - - - - - - - --

 


Contact me - About Krishan Vinayak- Disclaimer- References