Azure Data Lake enhancements

I first blogged about Microsoft’s new product, the Azure Data Lake, a few months back (here).  There are already enhancements, as announced at Strata + Hadoop World.  Here they are in brief:

  • The Azure Data Lake has been renamed to the Azure Data Lake Store.  The Data Lake Store provides a single repository where you can easily capture data of any size, type, and speed without forcing changes to your application as the data scales.  In the store, your data is accessible from any HDFS application and tool via WebHDFS.  It will be available in preview later this year.
  • A new service called Azure Data Lake Analytics, which is a distributed analytics service built on Apache YARN that allows developers to be productive immediately on big data.  You submit a job to the service, and the service automatically runs it in parallel in the cloud and scales to process data of any size (scaling is achieved by simply moving a slider).  When the job completes, the service winds down resources automatically, and you only pay for the processing power used.  This makes it easy to get started quickly and be productive with the SQL, .NET, or Hive skills you already have, whether you’re a DBA, data engineer, data architect, or data scientist.  Because the analytics service works over both structured and unstructured data, you can quickly analyze all of your data – social sentiment, web clickstreams, server logs, devices, sensors, and more.  There’s no infrastructure setup, configuration, or management.  It will be available in preview later this year.
  • A new language called U-SQL, which is a big data language that seamlessly unifies the ease of use of SQL with the expressive power of C#.  U-SQL’s scalable distributed query capability enables you to efficiently analyze data in the Azure Data Lake Store and across Azure Blob Storage, SQL Servers in Azure, Azure SQL Database and Azure SQL Data Warehouse.  U-SQL is built on the learnings from Microsoft’s internal experience with SCOPE and existing languages such as T-SQL, ANSI SQL, and Hive.  See “Introducing U-SQL – A Language that makes Big Data Processing Easy” and “Tutorial: develop U-SQL scripts using Data Lake Tools for Visual Studio”.
  • Azure HDInsight is now included as part of the Azure Data Lake.  This integrates the fully managed Apache Hadoop cluster service with the Azure Data Lake Store.  Microsoft also announced the general availability of HDInsight on Linux, with an industry-leading 99.9% uptime SLA.
  • Azure Data Lake Tools for Visual Studio provide an integrated development environment that spans the Azure Data Lake, dramatically simplifying authoring, debugging and optimization for processing and analytics at any scale.  You can write U-SQL with these tools or directly in the Azure management portal.  See “Azure Data Lake: Making Big Data Easy”.
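
To make the WebHDFS point from the first bullet a little more concrete, here is a minimal Python sketch of how a WebHDFS REST URL is put together.  WebHDFS addresses files under a `/webhdfs/v1/` prefix and selects the operation with an `op` query parameter (for example `LISTSTATUS` or `OPEN`).  The helper name `webhdfs_url` and the host name below are made up for illustration, and the exact endpoint and authentication that the Data Lake Store will use are not covered by the announcement, so treat this as a sketch rather than a working client:

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=None, scheme="https", **params):
    """Build a WebHDFS-style URL: scheme://host[:port]/webhdfs/v1/<path>?op=OP.

    Hypothetical helper for illustration; it only formats the URL and does
    not contact any service or handle authentication.
    """
    netloc = f"{host}:{port}" if port else host
    # WebHDFS paths are absolute, so normalise to exactly one leading slash.
    clean_path = "/" + path.lstrip("/")
    # The operation name goes in the "op" query parameter.
    query = urlencode({"op": op.upper(), **params})
    return f"{scheme}://{netloc}/webhdfs/v1{quote(clean_path)}?{query}"


# Placeholder host name (an assumption, not a real Data Lake Store endpoint).
print(webhdfs_url("example-store.example.net", "/logs/2015/10", "liststatus"))
```

Any tool that already speaks WebHDFS issues requests of this shape, which is why existing HDFS applications should be able to read from the store without code changes.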

Note that Hortonworks, Cloudera, MapR, and other partners will integrate with the Azure Data Lake Store.

The services are now in private preview and interested developers can sign up here.

More info:

Microsoft expands Azure Data Lake to unleash big data productivity

Announcing General Availability of HDInsight on Linux + new Data Lake Services and Language

About James Serra

James is a big data and data warehousing solution architect at Microsoft. Previously he was an independent consultant working as a Data Warehouse/Business Intelligence architect and developer. He is a prior SQL Server MVP with over 25 years of IT experience.

2 Responses to Azure Data Lake enhancements

  1. Mehmet says:

    Hi James,

    How does U-SQL compare to Polybase in terms of capability? Why would we use one over the other?

    Thanks

    Mehmet

  2. Pingback: Why use a data lake? | James Serra's Blog