ETL is the most common method used when transferring data from a source system to a data warehouse. But there are cases where you might want to use ELT. What is the difference between the two? Read on:
Extract, Transform, and Load (ETL) is a process that involves extracting data from outside sources, transforming it to fit operational needs (sometimes using staging tables), then loading it into the end target database or data warehouse. This approach is reasonable as long as many different databases are involved in your data warehouse landscape. In this scenario you have to transport data from one place to another anyway, so it’s a legitimate way to do the transformation work in a separate specialized engine. Typically the entire process is done in SSIS using data flows.
Extract, Load, Transform (ELT) is a process where data is extracted for the source, then loaded into a staging table in the database, transforming it where it sits in the database and then loading it into the target database or data warehouse. When the data is extracted from the source into the staging tables, it is a raw copy, meaning you keep the column names the same as in the source database and you don’t convert data, calculate new data fields, etc. You may however filter unneeded rows and columns as you extract data so that you don’t waste resources on unneeded data. The process of moving data from the source to the staging tables is done in SSIS using data flows, but the process of moving the data from the staging tables to the data warehouse can be done with T-SQL instead for a performance boost along with the fact that it is usually easier to code than using SSIS data transformations.
ELT should be used, instead of ETL, in these various cases :
- There are big volumes of data
- The source database and the target database are the same
- The database engine is well adapted for that kind of processing, such as PDW, which is great at loading massive amounts of data very quickly
So in short, when using ETL the transformations are processed by the ETL tools, while in ELT the transformations are processed by the target datasource.
ELT has the benefit of minimizing the processing on the source since no transforming is being done, which can be extremely important if the source is a production system where you could be impacting the user experience as opposed to a copy of the source (via replication, database snapshot, etc). The negative of this approach is it may take longer to get the data into the data warehouse as with the staging tables you have an extra step in the process, and you will need more disk space for the staging tables.