Friday, December 17, 2010

Capturing Data For Business Intelligence (Part Two)

In "Capturing Data (Part One)" we discussed a bit about how transaction data are stored in database tables and began to develop the idea that such data are designed (or "architected") to get data in as easily and accurately as possible. This article expands on these concepts. If you haven't already, you should read "Capturing Data (Part One)" before continuing.

When we left our point-of-sale (POS) transaction system in the previous article, we had taken the customer and product names and replaced them with unique identifying codes, so that

Jones, B., 1 lb. Coffee @ $ .01

became

325, 1, 22, .01

But who can make sense of that just by looking at it?! (OK, maybe a geek like me could, but please!) In order to preserve descriptive customer and product information, the database designer for the POS system ends up moving them to their own tables and linking these tables to the sales transaction table. Now instead of one table there are three.


The technical term for this process is “database normalization.” The advantage of doing this in the data entry process is that we don’t have to type in “B. Jones” or “Coffee” every time we sell coffee to him/her. We type the descriptive information about them into the Customer or Product tables the right way one time and then have the computer look them up when we need them for a sale. If we take this to the next step, suppose we want to know the supplier for a particular product. Using normalization, we would create a “Supplier” table and separate it from the “Product” table. And on it goes. Putting only the information we need in each table and repeating it as little as possible in other tables leads to a table structure that looks like this:


This kind of a table structure is often called a “snowflake.” The good news about this snowflake "schema" (as the structure is called in more technical terms) is that it’s easy and efficient to get transaction data into the system. This is because being able to look up information about customers and products cuts down on data entry errors and makes the process go more quickly and smoothly. The bad news comes when you try to get data out of the system, because in order to make the information you get out descriptive and/or meaningful you have to pull data from a lot of tables. Most POS system designers understand this and try to help you out by providing basic reporting. But if you’re trying to get data out of all of those tables at the same time sales clerks are trying to enter orders for customers, the system can really bog down. This why we create a separate data warehouse space for business intelligence purposes, and that is where processing comes in.

No comments:

Post a Comment