What is a Data Warehouse?

Picture a warehouse. Not an industrial warehouse with forklifts driving through endless rows of shelves filled with dusty pallets of unknown goods. Picture a modern warehouse, the kind Amazon uses to fulfill its more than 1 billion orders a year. These buildings are just as big as the old industrial warehouses, but a few things stand out: they are clean, automated and incredibly efficient.

A Data Warehouse works much like this modern warehouse. The goods, in our case data, are brought in from many different sources. Instead of boxes of toothbrushes, headphones and garden tools, a Data Warehouse holds data about user profiles, transactions, browser history, marketing campaigns, staffing history, call volume and more. Inside the warehouse, this data is organized so that any of it can be retrieved quickly for analysis.

The end result is a large, varied collection of data that an organization can analyze to support management decision-making.

Important Points

  • Includes data from many different sources across an entire enterprise
  • Usually contains historical data
  • Manages data to provide flexible access
  • Supports online analytical processing (OLAP), which lets business users query, summarize and analyze the data
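To make the OLAP point more concrete, here is a minimal Python sketch of two common OLAP operations, roll-up (summarizing at a coarser level, such as month to year) and slice (fixing one dimension to a single value). The sales rows, the field names and the helpers `roll_up` and `slice_` are all hypothetical; in a real warehouse this work is done by the query engine, typically through SQL `GROUP BY`, not by hand-written loops.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, month, region, product, revenue)
SALES = [
    (2023, 1, "East", "Headphones", 120.0),
    (2023, 1, "West", "Headphones", 80.0),
    (2023, 2, "East", "Toothbrushes", 15.0),
    (2024, 1, "East", "Headphones", 200.0),
    (2024, 3, "West", "Garden tools", 60.0),
]

# Names of the dimension columns, in tuple order (revenue is the last field).
FIELDS = ("year", "month", "region", "product")


def roll_up(facts, *dims):
    """Sum revenue grouped by the named dimensions (an OLAP-style aggregation)."""
    idx = [FIELDS.index(d) for d in dims]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)


def slice_(facts, dim, value):
    """Keep only the facts where one dimension equals a fixed value (a 'slice')."""
    i = FIELDS.index(dim)
    return [row for row in facts if row[i] == value]


print(roll_up(SALES, "year", "month"))                    # monthly detail
print(roll_up(SALES, "year"))                             # rolled up to years
print(roll_up(slice_(SALES, "region", "East"), "year"))   # East region only
```

Fewer dimensions in the grouping gives a more summarized view; more dimensions lets a user "drill down" to finer detail.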


Data Warehouse Characteristics

William Inmon characterized data warehouses as integrated, nonvolatile, time-variant and subject-oriented systems.

Integrated

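Because data is gathered from many sources, it often disagrees in naming, formats and units. For example, one system might store dates as MM/DD/YYYY while another uses YYYY-MM-DD. These inconsistencies have to be resolved as the data is loaded. Once the data is stored consistently, it is much easier to produce meaningful reports about an organization's operations.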
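As a small illustration of integration, the Python sketch below maps records for the same customer from two source systems into one shared schema. The source names (a "CRM" and a "billing" system), their field names and the country-name mapping are hypothetical assumptions chosen only to show the kinds of mismatches that have to be reconciled.

```python
from datetime import date, datetime

# Hypothetical records for the same customer from two source systems that
# disagree on field names, date formats and country representation.
CRM_ROWS = [{"cust_id": "42", "signup": "03/15/2023", "country": "United States"}]
BILLING_ROWS = [{"customerId": 42, "created": "2023-03-15", "country_code": "US"}]

# Assumed lookup from free-text country names to two-letter country codes.
COUNTRY_CODES = {"United States": "US", "Canada": "CA"}


def from_crm(row):
    """Convert a CRM row (string id, MM/DD/YYYY date, country name)."""
    return {
        "customer_id": int(row["cust_id"]),
        "signup_date": datetime.strptime(row["signup"], "%m/%d/%Y").date(),
        "country": COUNTRY_CODES[row["country"]],
    }


def from_billing(row):
    """Convert a billing row (int id, ISO date, country code)."""
    return {
        "customer_id": int(row["customerId"]),
        "signup_date": date.fromisoformat(row["created"]),
        "country": row["country_code"],
    }


def integrate(crm_rows, billing_rows):
    """Map every source row to the shared schema and de-duplicate by customer."""
    unified = {}
    converted = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
    for rec in converted:
        unified.setdefault(rec["customer_id"], rec)  # first source wins on conflicts
    return list(unified.values())


print(integrate(CRM_ROWS, BILLING_ROWS))
```

Once both sources speak the same "language", the two records turn out to describe one customer, and reports built on the unified data no longer double-count them.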

Nonvolatile

One reason data warehouses grow so large is that data, once loaded, is generally not updated or deleted. Existing records stay as they are, and new data is added on a regular schedule.
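The nonvolatile rule can be pictured as an append-only table. The Python class below is a toy sketch with hypothetical names (`NonvolatileTable`, `load`, `read`); real warehouses enforce this behavior through their load processes and access permissions, not through a class like this.

```python
class NonvolatileTable:
    """Toy append-only table: rows can be loaded and read, but never changed."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append a new batch; rows already stored are left untouched."""
        # Store rows as tuples so individual rows cannot be mutated in place.
        self._rows.extend(tuple(row) for row in rows)

    def read(self):
        """Return a copy so callers cannot alter the stored history."""
        return list(self._rows)

    def update(self, *args, **kwargs):
        raise PermissionError("rows are read-only once loaded")

    def delete(self, *args, **kwargs):
        raise PermissionError("rows are not deleted once loaded")


table = NonvolatileTable()
table.load([("2024-01-31", "East", 500)])
table.load([("2024-02-29", "East", 450)])  # the next load appends, never overwrites
print(len(table.read()))  # 2
```

Because nothing is overwritten, every load adds to the total volume, which is exactly why warehouses keep growing.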

Time Variant

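Because the data is nonvolatile and new data is added over time, every record must carry some kind of time element, such as a load date or snapshot date. This lets older and newer versions of the same information coexist without conflict and makes it possible to report on historical business trends.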
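Here is a minimal Python sketch of how a time element resolves what would otherwise be conflicting records. It assumes each row carries a hypothetical `snapshot_date` field and shows an "as of" lookup that returns the version of a customer's loyalty tier that was current on a given date. The data and the `as_of` helper are illustrative, not taken from any particular product.

```python
from datetime import date

# Hypothetical history of one customer's loyalty tier. The three rows would
# conflict without their snapshot dates; with them, each is simply a version.
HISTORY = [
    {"customer_id": 42, "tier": "Silver", "snapshot_date": date(2023, 1, 1)},
    {"customer_id": 42, "tier": "Gold", "snapshot_date": date(2023, 7, 1)},
    {"customer_id": 42, "tier": "Platinum", "snapshot_date": date(2024, 1, 1)},
]


def as_of(history, customer_id, when):
    """Return the latest row for a customer on or before `when`, or None."""
    candidates = [
        r for r in history
        if r["customer_id"] == customer_id and r["snapshot_date"] <= when
    ]
    return max(candidates, key=lambda r: r["snapshot_date"], default=None)


print(as_of(HISTORY, 42, date(2023, 9, 30))["tier"])  # Gold
print(as_of(HISTORY, 42, date(2022, 12, 31)))         # None
```

The same lookup run for a series of dates gives the customer's tier over time, which is the raw material for trend reporting.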

Subject Oriented

Because the goal of a data warehouse is analytics (rather than processing transactions or other time-sensitive tasks), its data can be organized around major business subjects, such as customers, products and sales, instead of around the applications that produced it.
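One common way, though not the only one, to organize data by subject is a star schema: a central fact table of business events (here, sales) that points to dimension tables, one per subject (here, products and customers). The Python sketch below models that layout with plain dictionaries; the table names, keys and figures are hypothetical, and a real warehouse would express the same join and aggregation in SQL.

```python
from collections import defaultdict

# Dimension tables: one per business subject, keyed by a surrogate key.
DIM_PRODUCT = {
    1: {"name": "Toothbrush", "category": "Personal care"},
    2: {"name": "Headphones", "category": "Electronics"},
    3: {"name": "Rake", "category": "Garden"},
}
DIM_CUSTOMER = {
    100: {"name": "Ada", "segment": "Consumer"},
    101: {"name": "Acme Corp", "segment": "Business"},
}

# Fact table: one row per sale, holding keys into the dimensions plus a measure.
FACT_SALES = [
    {"product_key": 1, "customer_key": 100, "amount": 4.0},
    {"product_key": 2, "customer_key": 100, "amount": 59.0},
    {"product_key": 2, "customer_key": 101, "amount": 590.0},
    {"product_key": 3, "customer_key": 101, "amount": 25.0},
]


def sales_by(dim_table, key_field, attribute):
    """Join facts to one dimension and total `amount` by a dimension attribute."""
    totals = defaultdict(float)
    for fact in FACT_SALES:
        label = dim_table[fact[key_field]][attribute]
        totals[label] += fact["amount"]
    return dict(totals)


print(sales_by(DIM_PRODUCT, "product_key", "category"))
print(sales_by(DIM_CUSTOMER, "customer_key", "segment"))
```

Because each subject lives in its own table, an analyst can ask the same sales question from the product angle or the customer angle just by choosing a different dimension.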