Data cleansing in SSIS, DQS, and MDS

Data cleansing can be done via SSIS as well as Data Quality Services (DQS) and Master Data Services (MDS).  The lines are a bit blurred when talking about data cleansing using SSIS, DQS and MDS.  In what product should data be cleaned?  To give examples: having to convert a Unicode string to a non-Unicode string can be done in SSIS using the data conversion transformation; converting the word “one” to the number “1” would use the derived column transformation (which has a sophisticated expression language) in SSIS.  Cleaning state codes by comparing them to a knowledge base/reference dataset containing valid state codes can be done with the lookup transformation in SSIS; removing duplicates from a table (i.e. a customer that is entered twice with a different spelling) can be done in SSIS using the fuzzy lookup transformation.  These SSIS transformations would need to be used with other SSIS data flow components to fully complete the data cleaning solution.

But all those tasks can be done much easier using DQS, which also has a lot more features available.  DQS enables you to build a knowledge base and use it to perform a variety of critical data quality tasks, including correction, enrichment, standardization, and de-duplication of your data (see Data Quality Services Books Online).  Or you can perform matching manually within the Master Data Services Add-in for Excel which leverages the matching functionality in Data Quality Services.  And there is a DQS cleaning transformation that you can use in SSIS (see Overview of the DQS Cleansing Transform).

MDS has limited data cleansing via business rules which can apply default values and change values.  The best approach is to use DQS to clean the data from the source and then copy that data into MDS.

More info:

Data Conversion in SSIS

Cleanse and Match Master Data by Using EIM – Microsoft

Comparing Data Quality Services (DQS) and Master Data Services (MDS)

About James Serra

James currently works for Microsoft specializing in big data and data warehousing using the Analytics Platform System (APS), a Massively Parallel Processing (MPP) architecture. Previously he was an independent consultant working as a Data Warehouse/Business Intelligence/MDM architect and developer, specializing in the Microsoft BI stack. He is a SQL Server MVP with over 25 years of IT experience.
This entry was posted in DQS, MDS/MDM, SQLServerPedia Syndication, SSIS. Bookmark the permalink.

One Response to Data cleansing in SSIS, DQS, and MDS

  1. I like this post, as it gives some good examples as to what can be accomplished with DQS and MDS.

    And on a side note, I can only smile over the (which has a sophisticated expression language) – because that is the understatement of the year ;o)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>