SSIS Fuzzy Lookup Error

A Fuzzy Lookup Task in SSIS that I was working on was bombing out, and I had a heck of a time figuring out why. The error did not tell me much:

Error: 2011-05-18 10:42:09.33     Code: 0xC0047022     Source: Fuzzy Match InfoUSA SSIS.Pipeline     Description: SSIS Error Code DTS_E_PROCESSINPUTFAILED.  The ProcessInput method on component “Fuzzy Lookup” (49) failed with error code 0x8000FFFF while processing input “Fuzzy Lookup Input” (50). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.  There may be error messages posted before this with more information about the failure.
End Error

Error: 2011-05-18 10:42:11.50     Code: 0xC02020C4     Source: Fuzzy Match InfoUSA CUSTOMER [1]     Description: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.
End Error

Error: 2011-05-18 10:42:11.58     Code: 0xC0047038     Source: Fuzzy Match InfoUSA SSIS.Pipeline     Description: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED.  The PrimeOutput method on component “CUSTOMER” (1) returned error code 0xC02020C4.  The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.  There may be error messages posted before this with more information about the failure.
End Error

After a lot of digging, what I found out is that it was a server memory issue: The Fuzzy Lookup Task loads the entire reference table defined in the task into memory.  A large reference table could eat up all of the memory on the server.  That is what was happening to me.  The server had 8MB of memory, and I was loading in 2 million records.  Once all the free memory was gone, the task bombed.  When the task runs, you can jump on the server and watch how the memory slowly gets eaten up.  The only solution if you want to continue using the Fuzzy Lookup Task is to get more memory, free up memory, or reduce the number of rows in the reference table.

Also note that if the job does not bomb but there is very little free memory, the Fuzzy Lookup Task will perform very slowly.  When I reduced the number of rows in the reference table to 1.8 million, the task did not bomb, but took over an hour as the free memory shrunk to about 3%.  When I reduced the number of rows to 1.5 million, it left 10% free memory and the task took only 10 minutes.

About James Serra

James currently works for Microsoft specializing in big data and data warehousing using the Analytics Platform System (APS), a Massively Parallel Processing (MPP) architecture. Previously he was an independent consultant working as a Data Warehouse/Business Intelligence/MDM architect and developer, specializing in the Microsoft BI stack. He is a SQL Server MVP with over 25 years of IT experience.
This entry was posted in Fuzzy Lookup, SQLServerPedia Syndication, SSIS. Bookmark the permalink.

One Response to SSIS Fuzzy Lookup Error

  1. Is there any way you can us “Blocking Indexes” to partition and run the match in parallel. Here an example. http://blog.melissadata.com/mt-search.cgi?blog_id=2&tag=Record%20Linkage&limit=20

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>