Thursday, December 23, 2010

ETL: Creating Data For BI - Transformation and Loading

Putting our transactional data in a format that will be suitable for use in business intelligence involves three sets of operations, in sequential order. First, we extract the data from the transactional system or systems. Next, we transform the extracted data to prepare it for consumption. Finally, we load the transformed data into the data warehouse or BI database. BI professionals refer to these operations in total as "Extraction, Transformation, and Loading," or simply “ETL.” This article discusses Transformation and Loading.

Transformation

Once the data are extracted from the transaction system, they typically need to be transformed to prepare them for business intelligence use. Some of the more common transformation operations include the following (to mention just a few):

• De-duplication: Find and remove duplications of customers, products, etc. This is critical if you’re pulling data from two or more transaction systems, as in the case discussed in an earlier article where you have a brick-and-mortar store and an e-commerce store that aren’t integrated together. Speaking of which:

• Merging: When we have data from multiple systems we also need to combine them so that we have a single view across the whole business

• Data conversions: These might include things like currency conversions, converting English to Metric measures or the other way around, and so on.

• Data cleansing: No matter how careful you are in entering data into the transaction system, there will inevitably be errors that no one catches. A well-designed ETL system can be your best defense against such errors slipping through and distorting your view of the business.

• De-normalization: Preparing the data for going from snowflake to star schema. Sometimes this will be done as part of the extraction step.

Loading

Having been extracted and transformed, the data are now loaded into the data warehouse or BI database. Depending on the kind of database you’re using to do BI, there may be other operations that occur during or after this step, such as aggregations. By aggregation we mean that we summarize or “roll up” the data depending on the dimensions in the database so that we can later “drill down” from higher to lower levels of detail.

No comments:

Post a Comment