Comments

What is a data lake? — 12 Comments

  1. Pingback:Azure Data Lake: Why you might want one |

  2. Pingback:Azure Data Lake - SQL Server - SQL Server - Toad World

  3. Pingback:How an MPP appliance solution can improve your future - SQL Server - SQL Server - Toad World

  4. Pingback:Azure Data Lake | James Serra's Blog

  5. Pingback:Why use a data lake? - SQL Server - SQL Server - Toad World

  6. Pingback:Why use a data lake? | James Serra's Blog

  7. You mention connecting to Data Lake Store from PolyBase.
    Is this possible? I cannot find any examples of how to do it.
    Since Data Lake Store uses a different authentication mechanism than Blob storage, I haven’t been able to figure out how to connect.

  8. Hi

    Is there an example you’ve done where you use PolyBase to read data from Data Lake Store?

    -- C: Create an external data source
    -- LOCATION: Provide Azure storage account name and blob container name.
    -- CREDENTIAL: Provide the credential created in the previous step.

    CREATE EXTERNAL DATA SOURCE AzureStorage
    WITH (
        TYPE = HADOOP,
        LOCATION = 'wasbs://@.blob.core.windows.net',
        CREDENTIAL = AzureStorageCredential
    );

    I’m struggling with this error,
    “Msg 105001, Level 16, State 1, Line 42
    External access operation failed because the location URI is invalid. Revise the URI and try again.”

  9. Pingback:Get a Data Lake – ELT not ETL | Tales from a Trading Desk

  10. James,

    Thanks for the great blog; it is very informative. If a data lake is meant to be a repository for all data, should we land OLTP, ERP, and CRM data there as well? This is where I struggle: managing the data lake will be cumbersome when data changes continuously at the source, and a single platform might not be able to handle every ETL aspect of managing that data. In addition, an OLTP (transactional) source might have tons of third-normal-form tables, which could be challenging to keep in sync if we follow the "all data in native format" definition of a data lake.

    • Hi Ketan,

      Great question! No, not all data should go into the data lake. Besides use cases such as backup and archiving, data should only be in the data lake if it provides value to the end users mining the data there. If they will never need to join semi-structured data with OLTP data, skip the data lake and go straight from the OLTP source to the relational data warehouse. This is especially true if you have already built SSIS packages that perform a lot of transformations on the data. Hope this helps!

  11. Pingback:Big Data: Data Warehouses, Data Lakes, and Data Marts – Order in the Chaos – Software Architecture Blog
