When to use T-SQL or SSIS for ETL

When doing ETL, you have the choice of using T-SQL or SSIS.  What things should you consider when deciding which one to use?  Here are some of the major design considerations to think about.  These points were derived from Tim Mitchell’s (sitepresentation:

  • Performance – With T-SQL, everything is processed within the SQL engine.  With SSIS, you are bringing all the data over to the SSIS memory space and doing the manipulation there.  So if speed is an issue, usually T-SQL is the way to go, especially when dealing with a lot of records.  Something like a JOIN statement in T-SQL will go much faster than using lookup tasks in SSIS.  Another example is a MERGE statement in T-SQL has much better performance than a SCD task in SSIS for large tasks
  • Features/capabilities – Some features can only be done in either T-SQL or SSIS.  You can shred text in SSIS, but can’t in T-SQL.  For example, text files with an inconsistent number of fields per row can only be done in SSIS.  So certain tasks may force you into using one or the other
  • Current skill set – Are the people in your IT department more familiar with SSIS or T-SQL?
  • Ease of development/maintenance – Of course, whatever one you are most familiar with will be the easiest, but if your skills at both are fairly even, then SSIS is usually easier to use because it is graphical, but sometimes you can develop quicker in T-SQL.  For example, having to join a bunch of tables will require a bunch of tasks in SSIS, where in T-SQL it is one statement.  So it might be easier to create the tasks to join the tables in SSIS, but it will take longer to build then writing a T-SQL statement
  • Complexity – SSIS can be more complex because you might need to create many tasks to accomplish your objective, where in T-SQL it might just be one statement, like in the example above for joining tables
  • Extensibility – SSIS has better extensibility because you can create a script task that uses C# that can do just about anything, especially for non-database related tasks.  T-SQL is limited because it is only for database tasks.  SSIS also has logging, which T-SQL does not
  • Likelihood of depracation/breaking changes – Minor issue, but T-SQL is always removing features in each release that will have to be rewritten
  • Types/architecture of sources and destinations – SSIS is better if you have multiple types of sources.  For example, it works really well with Oracle, XML, flat-files, etc.  SSIS was designed from the beginning to work well with other sources, where T-SQL is designed for SQL Server and it requires more steps to access other sources, and there are additional limitations when doing so
  • Local regulations – Are there some company standards you have to adhere to that would limit which tool you can use?

If you decide T-SQL is the way to go and you just want to execute a bunch of T-SQL statements, it’s still a good idea to wrap them in SSIS Execute SQL Tasks because you can use logging, auditing and error handling that SSIS provides that T-SQL does not.  You can also easily run SSIS Execute SQL Tasks in parallel so you are able to run stored procedures in parallel.

Note that most people use a hybrid approach, where you use SSIS, but for certain situations use an Execute SQL Task in SSIS to execute T-SQL instead of using a data flow.  And that Execute SQL Task can also call a stored procedure.  Some like this because it allows you to make a change in the stored procedure, avoiding having to make a change in SSIS and re-deploying the package.  The negative side of this is that instead of having everything contained within SSIS, you instead have to jump back and forth between SSIS and SSMS when you are building or debugging a project.  It’s a balance you will have to weigh as you go along.

More Info:

Video ETL Head-To-Head: T-SQL vs. SSIS by Tim Mitchell

Row Insert from SSIS package Vs Transact-SQL Statements

ETL vs SQL

SSIS vs T-SQL – which one is fastest for ETL tasks?

On The Board #8 : ETL in T-SQL vs. SSIS

BI Best Practices: SQL vs. SSIS Tools (in SSIS ETL packages)

SQL Server Data Migration Approaches: SSIS vs. SQL Server Stored Procedure

About James Serra

James currently works for Microsoft specializing in big data and data warehousing using the Analytics Platform System (APS), a Massively Parallel Processing (MPP) architecture. Previously he was an independent consultant working as a Data Warehouse/Business Intelligence/MDM architect and developer, specializing in the Microsoft BI stack. He is a SQL Server MVP with over 25 years of IT experience.
This entry was posted in ETL, SQLServerPedia Syndication, SSIS. Bookmark the permalink.

7 Responses to When to use T-SQL or SSIS for ETL

  1. Jamie Thomson says:

    Hi James,
    Good article. I wrote a lengthy reply but accidentally hit CTRL-W and lost the whole damn lot :)
    I can’t be bothered writing it all out again but I will bring up what I think is one vital point; that is, you haven’t explicitly mentioned the one true differentiator of SSIS – its ability to combine data from different data sources and operate upon it *in a single operation*.

    Everything else is by-and-large subjective.

    Extensibility? T-SQL has got SQLCLR.
    Complexity? That simply depends on one’s opinion.
    Ease of dev – Again, a matter of opinion.

    The one absolute differentiator of SSIS dataflows is their heterogeneity.

    JT

    P.S. Can you give an example of the following:
    “text files with an inconsistent number of fields per row can only be done in SSIS”
    I don’t see what about this scenario makes it easier to deal with this in SSIS (except for the fact that it has the Flat File source – but I don’t think that’s what you’re getting at)

  2. Ken Luong says:

    Thank you for writing this article down. Out of frustration to prove to the hiring manger that I am right on T-SQL is faster and outperformance SSIS in many different ways although SSIS has man advantages over T-SQL and your article covers that.
    (but do I need to prove to stupid people that they don’t know the stuff or should I move on)
    Two points I want to add to your article are these.
    1. SSIS can only handle row by row process (procedural) where as T-SQL can handle both procedural and set-based.
    2. SSIS can’t handle recursion such parent/child relationship.

    • sam says:

      I know your post is an year ago but thought of replying,the right way to do it is using tsql inside a storeprocedure and place it in SSIS package,1)the ssis package can handle setbased all you need to do is first load into a temporary table and bulk update to actual table which is same when it comes to tsql 2)handling recursion such parent/child relationship need to be done with tsql in the source task or lookup task of ssis depending on what you need.

  3. Pingback: SSIS Lookup or T-SQL Join « Derek Tech Cornor

  4. Pingback: Presentation Slides for Building an Effective Data Warehouse Architecture | James Serra's Blog

  5. ragini says:

    Its really grt post and made me to learn , ‘ how to use non-equi join using lookup and merge join transformation. Thanks a lot.

  6. Pingback: SQL Server Agent job steps vs SSIS - SQL Server - SQL Server - Toad World

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>