April 15, 2011
Data staging area, which can be imagined as kitchen of a restaurant consist of a storage area and the set of processes called extract-transformation-load (ETL) (Kimball, Page 8) To create a data warehouse for our changing business needs, we need to capture data from the source system by using different ETL tools. The ETL tool used would load the data into the database for our data warehouse (Taylor, Art).
ETL tools should extract, transform and load from source system to the warehouse. Extraction collects the required data from the data warehouse considering the proper format and the data layout. Transformation involves application of "a series of rules or functions" for the data being transformed (ETL - Extract Transform Load). We need to perform operations as duplicate removal, standardization, filtering, translating, sorting and check for consistency during this process. Loading, simply loads the data after extracting and transforming the data from the source.
These days the usefulness of these ETL tools are being researched to have more functionality than simply helping to extract, transform and load the data to the data warehouse. Some of those features may include - real-time or near real-time data integration, data migrations, Compliance, Master Data Management(MDM) and Operational BI (It.toolbox.com).
In market, we can see many ETL tools. Before we decide to get them we need to define our business requirement and then consider the technical aspect for our data warehouse. Some of the commercial tools are IBM - Infosphere DataStage Informatica Power Center, Oracle Warehouse Builder (OWB), Oracle Data Integrator (ODI), SAS ETL Studio etc (ETL Tools Comparison). Also, some of the freeware and open-source tools are Pentaho Data Integration (Kettle), Talend Integrator Suite, CloverETL (ETL Tools Comparison). If our business need is to invest less on data warehouse and we have qualified technical people to assist with the data warehousing then we can use the open source tools which is free and also can be adapted to the individual business needs by actually changing the tool to adapt to our business. Even though the tool comes free, sometimes the cost of upgrading it to fit to our business may cost more than the commercial tools available.
Information Server Infosphere platform from IBM is one of the ETL tool that has strong vision on the market and is very flexible that also progresses towards common metadata platform (ETL Tools Comparison). They have taken good initiatives to make it even more stronger and businesses that are using it have high level of satisfaction (ETL Tools Comparison). At the same time, it is hard to learn and needs time to get used to it. They also have long implementation cycles which uses a lot of data storage in GBs, and also the processing power (ETL Tools Comparison).
SQL Server Integration Services from Microsoft enables clients to have a broad set of documentation and support which has very easy and fast implementation with standardized data integration unlike from IBM (ETL Tools Comparison). Microsoft have real-time and message based capabilities with relatively low cost for clients to invest on warehousing (ETL Tools Comparison). Apart from those positives, they have problems in non-windows environment which Microsoft always tend to have in their products. Also, Microsoft lacks in good vision and strategy for the future of their ETL tool (ETL Tools Comparison).
Comparison and contrast of two different ETL tools
Data staging area, which can be imagined as kitchen of a restaurant consist of a storage area and the set of processes called extract-transformation-load (ETL) (Kimball, Page 8) To create a data warehouse for our changing business needs, we need to capture data from the source system by using different ETL tools. The ETL tool used would load the data into the database for our data warehouse (Taylor, Art).
ETL tools should extract, transform and load from source system to the warehouse. Extraction collects the required data from the data warehouse considering the proper format and the data layout. Transformation involves application of "a series of rules or functions" for the data being transformed (ETL - Extract Transform Load). We need to perform operations as duplicate removal, standardization, filtering, translating, sorting and check for consistency during this process. Loading, simply loads the data after extracting and transforming the data from the source.
These days the usefulness of these ETL tools are being researched to have more functionality than simply helping to extract, transform and load the data to the data warehouse. Some of those features may include - real-time or near real-time data integration, data migrations, Compliance, Master Data Management(MDM) and Operational BI (It.toolbox.com).
In market, we can see many ETL tools. Before we decide to get them we need to define our business requirement and then consider the technical aspect for our data warehouse. Some of the commercial tools are IBM - Infosphere DataStage Informatica Power Center, Oracle Warehouse Builder (OWB), Oracle Data Integrator (ODI), SAS ETL Studio etc (ETL Tools Comparison). Also, some of the freeware and open-source tools are Pentaho Data Integration (Kettle), Talend Integrator Suite, CloverETL (ETL Tools Comparison). If our business need is to invest less on data warehouse and we have qualified technical people to assist with the data warehousing then we can use the open source tools which is free and also can be adapted to the individual business needs by actually changing the tool to adapt to our business. Even though the tool comes free, sometimes the cost of upgrading it to fit to our business may cost more than the commercial tools available.
Information Server Infosphere platform from IBM is one of the ETL tool that has strong vision on the market and is very flexible that also progresses towards common metadata platform (ETL Tools Comparison). They have taken good initiatives to make it even more stronger and businesses that are using it have high level of satisfaction (ETL Tools Comparison). At the same time, it is hard to learn and needs time to get used to it. They also have long implementation cycles which uses a lot of data storage in GBs, and also the processing power (ETL Tools Comparison).
SQL Server Integration Services from Microsoft enables clients to have a broad set of documentation and support which has very easy and fast implementation with standardized data integration unlike from IBM (ETL Tools Comparison). Microsoft have real-time and message based capabilities with relatively low cost for clients to invest on warehousing (ETL Tools Comparison). Apart from those positives, they have problems in non-windows environment which Microsoft always tend to have in their products. Also, Microsoft lacks in good vision and strategy for the future of their ETL tool (ETL Tools Comparison).
Basically, we need to choose the ETL tools that serves our business requirement the most. We need to consider the future, business plan, cost and many more things before we stick with one tool to extract, transform and load from source system to the warehouse.
ETL - Extract Transform Load. Web. 17 Apr. 2011. <http://www.etltools.org/>.
"ETL Tools Comparison." ETL Software Tools. Web. 17 Apr. 2011. <http://www.etltools.net/etl- tools-comparison.html>.
Works Cited:
ETL - Extract Transform Load. Web. 17 Apr. 2011. <http://www.etltools.org/>.
"ETL Tools Comparison." ETL Software Tools. Web. 17 Apr. 2011. <http://www.etltools.net/etl- tools-comparison.html>.
It.toolbox.com. 25 Oct. 2010. Web. 17 Apr. 2011. <http://it.toolbox.com/wiki/index.php/Extract,_Transform,_Load>.
Kimball, Ralph, and Margy Ross. The Data Warehouse Toolkit: the Complete Guide to Dimensional Modeling. New York: Wiley, 2002. Print.
Taylor, Art. "Data Warehouse and the ETL Tool." The Data Administration Newsletter – TDAN.com. 1 June 1999. Web. 17 Apr. 2011. <http://www.tdan.com/view-articles/5259/>.
Kimball, Ralph, and Margy Ross. The Data Warehouse Toolkit: the Complete Guide to Dimensional Modeling. New York: Wiley, 2002. Print.
Taylor, Art. "Data Warehouse and the ETL Tool." The Data Administration Newsletter – TDAN.com. 1 June 1999. Web. 17 Apr. 2011. <http://www.tdan.com/view-articles/5259/>.
No comments:
Post a Comment