Azure Synapse and Delta Lake — 19 Comments

  1. James, great article and a spot-on explanation. Please don’t forget that Microsoft Azure Databricks has phenomenal Delta Lake support, together with the Photon engine enabling blazing-fast queries on Delta Lake directly on ADLS. Azure Databricks is first-party and integrates with the whole Azure ecosystem of ADF, AAD, PBI, ADLS, Synapse, etc. Anyone interested in Delta Lake should definitely check out Azure Databricks.

  2. Pingback: Delta Lake Operability in Azure Synapse Analytics – Curated SQL

  3. Hi James,

    Great blog as always. Thank you for the post.

    The claim that Delta Lake ‘supports ACID transactions’ requires qualification.

    Unlike ACID transaction processing in an RDBMS such as SQL Server, Delta Lake’s ACID support is scoped to a single Delta table. Executing DML across more than one Delta table in a Delta Lake will not guarantee ACID integrity for the transaction as a whole, as the sketch below illustrates.

    Hence, Delta Lake is not necessarily a replacement for RDBMS data warehousing.
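
    A minimal Spark (Scala) sketch of this limitation, assuming a session with the Delta Lake libraries available; the staging and table paths are hypothetical. Each write commits atomically to its own table, but the two commits are independent transactions:

    ```scala
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()

    val orders     = spark.read.parquet("/staging/orders")      // hypothetical staging paths
    val orderLines = spark.read.parquet("/staging/orderlines")

    // Atomic, all-or-nothing commit -- but only on the orders table.
    orders.write.format("delta").mode("append").save("/delta/orders")

    // If the job dies here, the orders commit above is NOT rolled back:
    // no transaction spans both Delta tables.
    orderLines.write.format("delta").mode("append").save("/delta/orderlines")
    ```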

  4. Hi James,
    I follow your blogs, webinars, YouTube videos, LinkedIn posts, etc. They have helped me a lot in the big data solution space. Thanks so much for your contributions to the big data community.

    I’m planning to design a big data solution in Azure (for a large organization) that includes a modern data warehouse as well as advanced analytics, and I’d like your comments on my solution architecture.

    The summary of my solution architecture is as follows:

    1. Bring all structured, unstructured, and streaming data into ADLS as staging (the Bronze layer).

    2. From the Bronze layer, use an Azure Databricks notebook (executed from a Synapse pipeline) to clean/transform the data and load it in Delta table format into the Delta Lake (the Silver layer) as the “single source of truth” (see the sketch at the end of this comment).

    3. From the Silver layer, use a Databricks notebook (executed from a Synapse pipeline) to a) load data into the Synapse dedicated SQL pool using the SQL endpoint, and b) prepare data (from Silver) for advanced analytics (AI and ML) and load it into the Delta Lake as the Gold layer.

    4. In the Synapse dedicated SQL pool, use CTAS to load the data into a star schema.

    5. a) For BI reports, connect Power BI to the Synapse dedicated SQL pool, and b) for advanced analytics reports, use the Delta Lake Gold layer.

    Does the above architecture look good to you? Please feel free to advise.

    Your advice would be highly appreciated 🙂

    Thank you
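
    A minimal sketch of step 2 (Bronze to Silver); the ADLS paths, key columns, and cleaning rules are hypothetical placeholders:

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().getOrCreate()

    // Hypothetical ADLS Gen2 containers; substitute your own storage account.
    val bronzePath = "abfss://bronze@mylake.dfs.core.windows.net/sales"
    val silverPath = "abfss://silver@mylake.dfs.core.windows.net/sales"

    val bronze = spark.read.parquet(bronzePath)

    // Illustrative cleaning: deduplicate on the business key and
    // drop rows missing a required date.
    val silver = bronze
      .dropDuplicates("order_id")
      .filter(col("order_date").isNotNull)

    // Land the cleaned data as a Delta table -- the Silver layer.
    silver.write.format("delta").mode("overwrite").save(silverPath)
    ```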

      • James and Gerhard, since a Power BI connector for Delta Lake is already available, I was wondering whether we really need a Synapse or SQL layer on top of the Delta Lake Gold layer for BI consumption. Is there any known performance difference between using Databricks versus Synapse (or another SQL layer) for Power BI or other BI reporting consumption?

        • Hi Shakthi,

          I have the same confusion: if Delta tables satisfy the BI needs, why do we need Azure Synapse?

          Most of the projects I see use a pattern like

          ADF –> ADB –> Delta Lake –> Synapse

          My question is why we are loading the data again into Synapse. What are the use cases that are not solved by the Delta engine?

  5. Hi James,

    Great article! I always read your blog.

    Just a note regarding the Copy activity and Delta Lake: only the sink side is unsupported, since you can still use a Delta Lake as a source by going through a Synapse serverless SQL view.

    Thank you so much

    –silvano

  6. Hi James,

    Great article; I especially appreciate your outline of the limitations around Delta and the related Synapse features.

    I’d love to get your thoughts on this. We know we can put a T-SQL layer over Delta that uses the serverless SQL pool. However, let’s assume production performance requirements and features such as native row-level security/column masking are a necessity. This rules out the serverless SQL pool.

    However, unless I’ve missed something, there seems to be a lack of Synapse options for bulk inserting lake Delta data into a dedicated pool table, which would offer improved performance with no spin-up and true RLS. An integration dataset cannot be created for a lakehouse Delta table. The only way I’ve been able to get lake Delta data into a dedicated SQL pool table is via a Spark notebook, using the Scala Synapse dedicated SQL pool connector:

    https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/synapse-spark-sql-pool-import-export
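
    For reference, a minimal sketch of that approach, following the Scala API shown in the linked doc; the Delta path and the pool/schema/table names are hypothetical, and this assumes a Synapse Spark notebook where the spark session is predefined:

    ```scala
    import com.microsoft.spark.sqlanalytics.utils.Constants
    import org.apache.spark.sql.SqlAnalyticsConnector._

    // Read the lake Delta table (hypothetical path)...
    val df = spark.read.format("delta").load("/delta/silver/sales")

    // ...and bulk-load it into an internal (managed) dedicated pool table
    // (hypothetical pool, schema, and table names).
    df.write.sqlanalytics("SQLPool01.dbo.Sales", Constants.INTERNAL)
    ```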

    Any thoughts?

  7. Great article, James. It gives a good overview of the Delta Lake support in Synapse.

    Interesting that the only way to write to a Delta Lake is by spinning up a Spark or Databricks cluster. Are you aware of any libraries or other functionality that can write a Delta file without spinning up a cluster?
    It seems a bit of overkill to me to spin up a cluster simply to write/update a number of .parquet + .json files.

    Thanks
    – Lars

  8. Hi James,

    I’m hoping you can give me an answer to this question. I was told by the Product Manager last May that they were working on Delta Lake support for Lake Databases. This would be very helpful as we craft our strategy. Do you know if this was completed? If so, can you point me to any documentation?

    Thanks,

    Gary
