Wednesday, December 22, 2010

ETL: Creating Data For BI - Extraction

Putting our transactional data in a format that will be suitable for use in business intelligence (BI) involves three sets of operations, in sequential order. First, we extract the data from the transactional system or systems. Next, we transform the extracted data to prepare it for consumption. Finally, we load the transformed data into the data warehouse or BI database. BI professionals refer to these operations in total as "Extraction, Transformation, and Loading," or simply “ETL.” This article discusses Extraction.

The first step in processing is to extract data from the database where our transaction records are kept. Because we want to minimize any interference with daily business operations that use the transactional system, we typically perform our processing outside of normal business hours. A database administrator (DBA), data architect (DA), or someone else responsible for business intelligence would begin by creating a query that pulls the needed data from the transactional database. With today’s computer systems it’s relatively easy for a system administrator to then create an automated process that runs the query late at night.

Larger companies may use software tools developed especially for ETL that incorporates the extraction query into the larger process. Some of the better-known competitors in this market include Informatica, Ab Initio, and International Business Machine (IBM) Inc.’s DataStage. Depending on the level of sophistication of the ETL process the data that are extracted may be saved into a text file or may be pulled directly into a separate data store for transformation. Sometimes, by clever use of Structured Query Language (SQL), data records can be partially transformed as they are extracted.

No comments:

Post a Comment