

Data Lakehouse defined — 10 Comments

  1. Pingback: Living in the Lakehouse – Curated SQL

    • Paul brings up some good points about areas where Synapse needs to improve, but I strongly disagree that Synapse is not ready for production. Just because it is weak in a few areas does not mean it should not be used. Tens of thousands of companies are using Synapse in production, where the weak areas he lists are not used, not important, or at least not a showstopper, and the benefits far outweigh the few limitations.

      As for not using a SQLDW, I see only a few customer use cases where I don't think a SQLDW is needed. At least 90% of customers do use a SQLDW, for the reasons listed in this blog and in Data Lakehouse & Synapse. In many cases, the extra cost and ETL of copying data into a SQLDW is well worth it.

      • It's curious that there seems to be so little discussion about how primitive Synapse is in terms of pipelines, or in supporting what we already do today, where we are often quite happy. Would we like to take advantage of new tech? Of course. That's why we're in this business. But….

        Let's talk about the very common scenario so many of us live in, shall we?

        Our team supports over 10K SSIS packages. We run cross-database joins ALL the time. We load many different sources using different methods and then feed data to many downstream applications. Most sources are direct connections, such as scraping deltas from Dynamics. The flat files are delimited text such as CSV, or XML, not Parquet. We generate a lot of packages using BIML. We have load and query performance challenges within SQL Server, with fact tables of up to 1 trillion rows, but it's manageable. The thought of federating those queries is sheer lunacy, not just from a performance standpoint but also from an integration viewpoint. It makes more sense to take cattle de-worming medication based on internet rumors…did I say that out loud?

        So what in Synapse can someone who already has a sizable EDW use? In other words, as the proverbial 1980s US Wendy's commercial asked, "Where's the beef?" We need the MPP horsepower of the dedicated SQL pool (no one at Microsoft could think of a better name, really?), but the idea that we could migrate thousands of SSIS packages into…into what…Synapse pipelines? Then I'd ask what you've been smokin' 😉 ADF-IR is likely our only realistic option for getting closer to a PaaS version of SSIS, but I question the cost versus benefit even for that, especially since Microsoft is swinging away from ADF now.

        Bottom line is this…many of us have really significant investments in existing technology, and it may be working quite well. We want to prepare for the future and go fully PaaS, no argument there. But we need a reasonable roadmap that both delivers business value and takes on a reasonable LOE (level of effort).

        I'd love to hear ideas on this. I'm leaning toward just doing prep work, such as getting to SQL Server 2019, and waiting for Synapse Gen 3/support for multiple databases.

  2. Pingback: Data Mesh defined | James Serra's Blog

  3. Pingback: Data Lake VS Delta Lake – Data Upsert and Partition Compaction Management - Plainly Blog - Data Modelling, Advanced Analytics

  4. Pingback: Do we still need a data warehouse? | Data Warehousing and Machine Learning

  5. Pingback: Mapping Dataflows: nieuwe tool, nieuwe standaard?? - Powerdobs

  6. Pingback: Data Lakehouse vs Data Warehouse | Data Platform and Machine Learning

  7. Pingback: Data Lakehouse vs Data Lake+Warehouse | Data Platform and Machine Learning
