In the top-down approach suggested by Bill Inmon, we build a centralized repository to house corporate-wide business data. This repository is called the Enterprise Data Warehouse (EDW). The data in the EDW is stored in a normalized form in order to avoid redundancy.
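To make "normalized to avoid redundancy" concrete, here is a minimal sketch assuming a hypothetical flat order feed in which customer attributes are repeated on every order row. The function `normalize` and all column names are illustrative only. It splits the feed into a customer table, which stores each customer's attributes once, and an order table that keeps only the `customer_id` foreign key.

```python
from typing import Dict, List, Tuple

# Hypothetical flat feed: customer attributes repeat on every order row.
# Columns: (order_id, customer_id, customer_name, customer_city, amount)
flat_orders = [
    (1, 101, "Acme Corp", "Boston", 250.0),
    (2, 101, "Acme Corp", "Boston", 120.0),
    (3, 102, "Globex", "Chicago", 75.5),
    (4, 101, "Acme Corp", "Boston", 40.0),
]

def normalize(rows):
    """Split flat rows into a customer table and an order table.

    Customer attributes are stored once per customer_id; each order keeps
    only the customer_id foreign key plus its own measures.
    """
    customers: Dict[int, Tuple[str, str]] = {}
    orders: List[Tuple[int, int, float]] = []
    for order_id, cust_id, name, city, amount in rows:
        # setdefault stores the attributes the first time a customer is seen
        existing = customers.setdefault(cust_id, (name, city))
        if existing != (name, city):
            raise ValueError(f"conflicting attributes for customer {cust_id}")
        orders.append((order_id, cust_id, amount))
    return customers, orders

customers, orders = normalize(flat_orders)
print(len(customers), "customers,", len(orders), "orders")
# Customer name/city values stored: flat form vs normalized form
print(2 * len(flat_orders), "vs", 2 * len(customers))
```

With this sample data the customer name and city are stored eight times in the flat form but only four times after normalization, and a change to a customer's city would touch one row instead of several.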
The central repository for corporate-wide data helps us maintain a single version of the truth.
The data in the EDW is stored at the most detailed level. The reasons to build the EDW at the most detailed level are:
- Flexibility to serve multiple departments.
- Flexibility to cater to future requirements.
The disadvantages of storing data at the detailed level are:
- The complexity of the design increases with the level of detail.
- Storing data at the detailed level takes a large amount of space, and hence increases cost.
Once the EDW is implemented, we start building subject-area-specific data marts, which hold data in a denormalized form known as a star schema. The data in the marts is usually summarized according to the end users' analytical requirements.
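As an illustration of summarizing detail data for a mart, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The tables `sales_detail` and `sales_monthly`, their columns, and the month-by-product grain are hypothetical choices made for the example. The actual summary level of a real mart depends on the end users' requirements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical detail-level data, as it might be sourced from the EDW
CREATE TABLE sales_detail (
    sale_date TEXT, product TEXT, quantity INTEGER, amount REAL
);
INSERT INTO sales_detail VALUES
    ('2024-01-05', 'Widget', 2, 20.0),
    ('2024-01-17', 'Widget', 1, 10.0),
    ('2024-01-20', 'Gadget', 3, 45.0),
    ('2024-02-02', 'Widget', 5, 50.0);

-- Summarized mart table at a month x product grain
CREATE TABLE sales_monthly AS
SELECT substr(sale_date, 1, 7) AS sale_month,
       product,
       SUM(quantity) AS total_quantity,
       SUM(amount)   AS total_amount
FROM sales_detail
GROUP BY sale_month, product;
""")

rows = conn.execute(
    "SELECT * FROM sales_monthly ORDER BY sale_month, product"
).fetchall()
for row in rows:
    print(row)
```

The summary table is smaller and answers monthly questions directly. It can no longer answer day-level questions, however, which is why the detailed data is kept in the EDW.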
The reason to denormalize the data in the mart is to give end users faster access to the data for their analytics. If we were to query a normalized schema for the same analytics, we would end up with complex multi-level joins that would be much slower than the equivalent query on the denormalized schema.
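The join argument above can be sketched with Python's built-in `sqlite3` module. All table names (`category`, `product`, `region`, `store`, `sale`, and the star-schema tables `dim_product`, `dim_store`, `fact_sales`) and the data are hypothetical. Both queries return the same totals by category and region. The normalized query needs four joins, while the star schema needs two because category and region are denormalized into the dimension tables. This only shows the shape of the queries and makes no claim about timings on such a tiny dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized tables (hypothetical)
CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_name TEXT,
                       category_id INTEGER REFERENCES category(category_id));
CREATE TABLE region   (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE store    (store_id INTEGER PRIMARY KEY,
                       region_id INTEGER REFERENCES region(region_id));
CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                       store_id INTEGER, amount REAL);

INSERT INTO category VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO product  VALUES (10, 'Router', 1), (11, 'Antivirus', 2);
INSERT INTO region   VALUES (1, 'East'), (2, 'West');
INSERT INTO store    VALUES (100, 1), (101, 2);
INSERT INTO sale     VALUES (1, 10, 100, 80.0), (2, 11, 100, 20.0),
                            (3, 10, 101, 50.0);

-- Star schema: category and region are folded into the dimensions
CREATE TABLE dim_product AS
    SELECT p.product_id, p.product_name, c.category_name
    FROM product p JOIN category c ON c.category_id = p.category_id;
CREATE TABLE dim_store AS
    SELECT s.store_id, r.region_name
    FROM store s JOIN region r ON r.region_id = s.region_id;
CREATE TABLE fact_sales AS SELECT product_id, store_id, amount FROM sale;
""")

# Four joins: sale -> product -> category and sale -> store -> region
normalized_query = """
SELECT c.category_name, r.region_name, SUM(f.amount)
FROM sale f
JOIN product  p ON p.product_id  = f.product_id
JOIN category c ON c.category_id = p.category_id
JOIN store    s ON s.store_id    = f.store_id
JOIN region   r ON r.region_id   = s.region_id
GROUP BY c.category_name, r.region_name
ORDER BY 1, 2
"""

# Two joins: fact table straight to each denormalized dimension
star_query = """
SELECT p.category_name, s.region_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_store   s ON s.store_id   = f.store_id
GROUP BY p.category_name, s.region_name
ORDER BY 1, 2
"""

normalized_rows = conn.execute(normalized_query).fetchall()
star_rows = conn.execute(star_query).fetchall()
print(normalized_rows == star_rows)
print(star_rows)
```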
We should implement the top-down approach when:
- The business has complete clarity on the data warehouse (DWH) requirements for all or multiple subject areas.
- The business is ready to invest considerable time and money.
The advantage of the top-down approach is that we build a centralized repository that provides one version of the truth for business data. This is very important for the data to be reliable and consistent across subject areas, and for reconciliation in case of data-related disputes between subject areas.
The disadvantage of the top-down approach is that it requires more time and a larger initial investment. The business has to wait for the EDW to be implemented, and then for the data marts to be built, before it can access its reports.
The bottom-up approach suggested by Ralph Kimball is an incremental approach to building a data warehouse. Here we build the data marts separately, at different points in time, as and when the requirements for each specific subject area become clear. The data marts are then integrated to form a data warehouse. Separate data marts are combined through the use of conformed dimensions and conformed facts. A conformed dimension or conformed fact is one that can be shared across data marts.
A conformed dimension has consistent dimension keys, consistent attribute names and consistent values across separate data marts. A conformed dimension means exactly the same thing with every fact table to which it is joined.
A conformed fact has the same definition of its measures, the same dimensions joined to it, and the same granularity across data marts.
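The three conditions for a conformed dimension (consistent keys, attribute names and values) can be checked mechanically. The sketch below assumes a hypothetical in-memory representation: each mart's copy of a dimension is a dict mapping a dimension key to its attributes. The function name `conformance_problems` and the sample data are illustrative. It covers only the dimension side. Checking a conformed fact (same measure definitions and grain) would need metadata that this sketch does not model.

```python
from typing import Dict, List

# Hypothetical representation: dimension key -> {attribute name: value}
Dimension = Dict[int, Dict[str, str]]

def conformance_problems(dim_a: Dimension, dim_b: Dimension) -> List[str]:
    """List every way two copies of a dimension fail to conform.

    Checks the three conditions from the text: same keys, same
    attribute names, and same attribute values for every key.
    """
    problems = []
    keys_a, keys_b = set(dim_a), set(dim_b)
    if keys_a != keys_b:
        problems.append(f"key mismatch: {sorted(keys_a ^ keys_b)}")
    for key in sorted(keys_a & keys_b):
        row_a, row_b = dim_a[key], dim_b[key]
        if set(row_a) != set(row_b):
            problems.append(f"key {key}: attribute names differ")
            continue
        for attr in sorted(row_a):
            if row_a[attr] != row_b[attr]:
                problems.append(
                    f"key {key}: {attr} {row_a[attr]!r} != {row_b[attr]!r}"
                )
    return problems

# Hypothetical customer dimension as seen by two different marts
sales_customer = {
    1: {"customer_name": "Acme Corp", "segment": "Enterprise"},
    2: {"customer_name": "Globex", "segment": "SMB"},
}
support_customer = {
    1: {"customer_name": "Acme Corp", "segment": "Enterprise"},
    2: {"customer_name": "Globex", "segment": "Small Business"},
}

print(conformance_problems(sales_customer, dict(sales_customer)))
print(conformance_problems(sales_customer, support_customer))
```

The second call reports that customer 2 is labelled "SMB" in one mart and "Small Business" in the other. A report that groups by segment would split that customer differently depending on which mart it came from, which is exactly what conformance is meant to prevent.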

In order to build conformed dimensions and facts, we need to create a bus matrix, with rows corresponding to the various data marts and columns corresponding to all the dimension tables, as depicted in the diagram above.
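A bus matrix can be sketched as a simple mapping from each data mart (row) to the set of dimensions (columns) it uses. The mart and dimension names below are hypothetical. The sketch renders the matrix as text and lists the dimensions used by more than one mart, which are the ones that must be conformed.

```python
# Hypothetical bus matrix: data mart (row) -> dimensions it uses (columns)
bus_matrix = {
    "Sales":     {"Date", "Product", "Customer", "Store", "Promotion"},
    "Inventory": {"Date", "Product", "Store"},
    "Support":   {"Date", "Customer"},
}

# Columns: every dimension used by any mart, in a stable order
dimensions = sorted(set().union(*bus_matrix.values()))

def render(matrix, dims):
    """Render the matrix as text, with an X where a mart uses a dimension."""
    width = max(len(mart) for mart in matrix)
    lines = [" " * width + " | " + " | ".join(dims)]
    for mart, used in matrix.items():
        cells = ["X".center(len(d)) if d in used else " " * len(d) for d in dims]
        lines.append(mart.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)

# A dimension shared by two or more marts must be conformed
shared = [
    d for d in dimensions
    if sum(d in used for used in bus_matrix.values()) > 1
]

print(render(bus_matrix, dimensions))
print("Must be conformed:", shared)
```

Here "Promotion" is used only by the Sales mart. Date, Product, Customer and Store are shared and need a single conformed definition before the marts can be combined.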
The bottom-up approach lets us build the warehouse incrementally by developing and integrating data marts as and when their requirements become clear. We don't have to wait until the overall requirements of the warehouse are known. Care must be taken to build an exhaustive bus matrix from the very first data mart, identifying all possible dimensions. Otherwise we would end up building stovepipes (isolated data marts that cannot be integrated) in the organization.
We should implement the bottom-up approach when:
- We have initial cost and time constraints.
- The complete warehouse requirements are not clear, and we have clarity on only one data mart.
The advantage of the bottom-up approach is that it does not require a high initial cost and has a faster implementation time. As a result, the business can start using the marts much earlier than with the top-down approach.
The disadvantage of the bottom-up approach is that it stores data in a denormalized format, so space usage for detailed data is high. Because of this, detailed data tends not to be kept in this approach, and we lose the advantage of having detailed data, i.e. the flexibility to easily cater to future requirements.