Tuesday, December 21, 2010

Processing Data For Business Intelligence (Part Two)

This article expands on concepts discussed in "Processing Data (Part One)." If you have not read that article, I suggest you do so before proceeding.

In the point-of-sale (POS) transaction system we’ve talked about previously, the database designer normalized customer data into two tables (one for Customer, one for Location) in order to avoid repeating city, state, and zip for each customer. When we process the data for consumption, we'll want to “denormalize” the data into a single table that puts all the location data together with the customer data. Why? By cutting down the number of tables we have to get data from, we make it quicker and easier to get data out of the database for reporting, analysis, and other kinds of business intelligence. When this process is complete, the database schema will most likely look like a star.


Notice that there are labels on each table. The central table is labeled “Fact.” In data warehouse parlance, the “fact” table contains information about business events that we want to measure. In our point-of-sale example our facts would probably be made up of the individual items our customer purchased on his or her visit to the store or web site. Each record in the fact table might look similar to the record we displayed when we discussed “Capturing Data” previously.

The tables around and connected to the fact table are labeled “Dim.” These represent what we’ll call “dimension” tables. When we talk about a dimension, we really just mean some aspect of the data that we’re interested in. In our point-of-sale example we’ve talked about Customers, Products, and Locations. All of these could be dimension tables in our data warehouse. Other dimensions might include Time (so we know when things are selling), Campaign (if we’re interested in knowing how the customer found us), and the list goes on and on. If this sounds a little confusing, think about how you want to look at your business. You might say to yourself, “I’d really like to see sales by customer by product by salesperson by month.” Each of these “bys” we listed off (Customer, Product, Salesperson, Month – or more generally Time) would be viable dimension candidates for our data warehouse. Sales would be our primary fact.

No comments:

Post a Comment