Another Microsoft event and another bunch of exciting announcements. At the Microsoft Build event last week, the major announcements in the data platform and AI space were:
Machine Learning Services enhancements
In private preview is a new visual interface for Azure Machine Learning that adds drag-and-drop workflow capabilities to the Azure Machine Learning service. It simplifies the process of building, testing, and deploying machine learning models for customers who prefer a visual experience over coding. This integration brings together the best of ML Studio and the AML service. The drag-and-drop experience lets any data scientist quickly build a model without coding, while still giving data scientists enough flexibility to fine-tune their models. With the AML service as the backend platform, it offers all the scalability, security, debuggability … that ML Studio can't provide. The deployment capability in the visual interface makes it easy to generate the score.py file and create images, and with a few clicks a trained model can be deployed to any AKS cluster associated with the AML service. Think of this as Azure Machine Learning Studio V2. More info
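For readers who haven't seen one, a score.py entry script in Azure Machine Learning typically defines an init() function that runs once when the service starts and a run() function that handles each scoring request. Below is a minimal sketch of that shape under loud assumptions: the toy linear "model", its weights, and the {"data": [...]} payload format are hypothetical stand-ins, not what the visual interface actually generates.

```python
import json

# Module-level model handle, populated once by init().
model = None


def init():
    """Runs once when the scoring service starts: load the model here.

    Hypothetical: a toy linear model with hard-coded weights stands in for
    loading a registered model file from disk.
    """
    global model
    model = {"weights": [0.5, -1.2, 2.0], "bias": 0.1}


def run(raw_data):
    """Runs per request: parse the JSON payload and return predictions.

    Assumes a payload shaped like {"data": [[x1, x2, x3], ...]}.
    """
    try:
        rows = json.loads(raw_data)["data"]
        predictions = [
            sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
            for row in rows
        ]
        return {"predictions": predictions}
    except (ValueError, KeyError, TypeError) as exc:
        # Report bad input back to the caller instead of crashing the service.
        return {"error": str(exc)}
```

Calling init() and then run(json.dumps({"data": [[1.0, 1.0, 1.0]]})) returns a dictionary with one prediction; in a real deployment, the hosting service makes these calls for you.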
Create, run, and explore automated machine learning experiments in the Azure portal without a single line of code. Automated machine learning automates the process of selecting the best algorithm to use for your specific data, so you can generate a machine learning model quickly. More info
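At its core, "selecting the best algorithm" means fitting several candidate models and keeping the one that scores best on held-out data. The toy sketch below shows only that selection loop, using two hypothetical candidates (a mean baseline and single-feature least squares) scored by validation mean squared error. Azure's automated ML does considerably more (featurization, hyperparameter tuning, and so on), so treat this as an illustration of the idea, not of the service's internals.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Ordinary least squares for one feature: y = a * x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical candidate "algorithms" to choose between.
CANDIDATES = {"mean_baseline": fit_mean, "linear_regression": fit_linear}


def select_best(train, valid):
    """Fit every candidate on train and keep the lowest validation MSE."""
    scores = {}
    for name, fit in CANDIDATES.items():
        model = fit(*train)
        scores[name] = mse(model, *valid)
    best = min(scores, key=scores.get)
    return best, scores
```

For example, select_best(([1, 2, 3, 4], [2, 4, 6, 8]), ([5, 6], [10, 12])) picks "linear_regression", because the data is perfectly linear and the baseline misses badly.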
Azure Cognitive Services enhancements
Personalizer has launched and, along with Anomaly Detector and Content Moderator, is part of the new Decision category of Cognitive Services, which provides recommendations that enable informed and efficient decision-making for users. It is available now in preview. More info
Wrangling Data Flows in Azure Data Factory
A new capability called Wrangling Data Flows, available in preview, gives users the ability to explore and wrangle data at scale. Wrangling Data Flows lets users visually discover and explore their data without writing a single line of code. It is described in the Build session at https://mybuild.techcommunity.microsoft.com/sessions/76997?source=speakerdetail#top-anchor (starting at 21 min). It offers the same experience you use with Power Query Online: the user interface is the Power Query UI, while the execution runtime underneath is Mapping Data Flows. More info
Azure Database for PostgreSQL Hyperscale (Citus)
Now in Public Preview, this brings high-performance scaling to PostgreSQL database workloads by horizontally scaling a single database across hundreds of nodes to deliver blazingly fast performance and scale. With horizontal scale-out, you can fit more data in-memory, parallelize queries across hundreds of nodes, and index data faster. The addition of Hyperscale (Citus) as a deployment option for Azure Database for PostgreSQL simplifies your infrastructure and application design, saving organizations time to focus on their business needs. More info
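The key idea behind horizontal scale-out is sharding: rows are assigned to nodes by hashing a distribution column, so each node holds a slice of the table and can work on its slice in parallel. The sketch below is a generic illustration of hash sharding with a scatter-gather aggregate, assuming a hypothetical 4-node cluster, CRC32 as the hash, and threads standing in for worker nodes. It does not reproduce Citus's actual hash function, shard count, or shard placement.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical cluster size


def node_for(key):
    """Map a distribution-column value to a node by hashing it (CRC32 here)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_NODES


def distribute(rows, key_field):
    """Shard rows across nodes; equal keys always land on the same node."""
    shards = defaultdict(list)
    for row in rows:
        shards[node_for(row[key_field])].append(row)
    return shards


def parallel_sum(shards, value_field):
    """Scatter: each 'node' (a thread here) sums its own shard.
    Gather: the coordinator adds up the partial sums."""
    def partial(rows):
        return sum(r[value_field] for r in rows)

    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial, shards.values()))
    return sum(partials)
```

Because all rows for a given key value share a node, per-key operations such as joins on the distribution column can stay local to that node, which is part of why the choice of distribution column matters.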
Azure SQL Database Hyperscale
Now Generally Available. More info
Azure SQL Database Serverless
Azure SQL Database serverless is a new compute tier that optimizes price-performance and simplifies performance management for databases with intermittent, unpredictable usage. Serverless automatically scales compute for single databases based on workload demand and bills for the compute used per second. Serverless databases automatically pause during inactive periods, when only storage is billed, and automatically resume when activity returns. Serverless helps customers focus on building apps faster and more efficiently. As with all databases in SQL Database, serverless databases are always up to date, highly available, and benefit from built-in intelligence for further security and performance optimization.
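To make the billing model concrete, here is a simplified sketch of how a serverless bill accrues over a per-second usage trace. Compute is charged per second while the database is online, nothing is charged for compute once it pauses, and storage is charged regardless. The min/max vCore clamp, the idle delay before pausing (pause_after_idle_s), and the rates are assumptions for illustration, not the official pricing formula.

```python
def billed_vcores(demand, min_vcores, max_vcores, pause_after_idle_s):
    """Return the vCores billed for each second of a usage trace.

    demand: per-second vCore demand (0 means no activity).
    While online, billing is the demand clamped to [min_vcores, max_vcores].
    After more than pause_after_idle_s consecutive idle seconds, the database
    is treated as paused and no compute is billed until activity returns.
    """
    billed = []
    idle_seconds = 0
    for d in demand:
        idle_seconds = 0 if d > 0 else idle_seconds + 1
        if idle_seconds > pause_after_idle_s:
            billed.append(0)  # paused: storage only, no compute
        else:
            billed.append(min(max(d, min_vcores), max_vcores))
    return billed


def total_cost(demand, min_vcores, max_vcores, pause_after_idle_s,
               rate_per_vcore_second, storage_cost):
    """Compute is billed per second; storage is billed even while paused."""
    vcore_seconds = sum(billed_vcores(demand, min_vcores, max_vcores,
                                      pause_after_idle_s))
    return vcore_seconds * rate_per_vcore_second + storage_cost
```

With a trace of 10 busy seconds at 2 vCores, 20 idle seconds, and 5 seconds at 4 vCores (min 0.5, max 4, pause after 5 idle seconds), the database bills the minimum for the first 5 idle seconds, then nothing until activity resumes.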
Azure SQL Data Warehouse’s support for semi-structured data
Azure SQL Data Warehouse now supports semi-structured data. With one service, both structured and semi-structured data formats (such as JSON) can be analyzed directly from the data warehouse for faster insights. This is in private preview. More info
Azure Data Explorer enhancements
The new Spark connector enables customers to seamlessly read data from Azure Data Explorer into a Spark DataFrame, as well as ingest data from a Spark DataFrame into Azure Data Explorer. This opens up a number of use cases, including data transformation and machine learning in Spark, using Azure Data Explorer as a data source or destination for interactive analytics, and operationalizing machine learning models for fast scoring of data arriving in Azure Data Explorer.
Continuous data export: This feature writes CSV or Parquet files to the data lake as data streams in via Event Hubs, IoT Hub, or any other path. It makes it possible to build analytical solutions over fresh data while seamlessly filling the data lake.
Querying data in the lake: Azure Data Explorer can now query data in the lake in its natural format. Simply define the data as an external table in Azure Data Explorer and query it. You can then join or union it with other data from Azure Data Explorer, SQL Server, and more.
Autoscale for Azure HDInsight
The Autoscale capability for Azure HDInsight is now available in public preview. Autoscale automatically scales your Spark, Hive, or MapReduce HDInsight clusters up or down based on load or a pre-defined schedule.
- Load-based Autoscale uses several workload-specific metrics, such as CPU and memory usage, to intelligently scale your cluster between user-configurable min and max sizes based on the load
- Schedule-based Autoscale allows you to set your own custom schedule for cluster scaling. For example, you can set your cluster to scale up to 10 nodes starting at 9:00 am and scale back down to 3 nodes at 9:00 pm on working days
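The two Autoscale modes boil down to two different ways of picking a target cluster size. The sketch below illustrates both under stated assumptions. The schedule matches the example in the list (10 nodes from 9:00 am to 9:00 pm on working days, 3 nodes otherwise). The load rule, which steps one node at a time using the busier of CPU and memory utilization and hypothetical 80% and 30% thresholds, is purely illustrative and is not HDInsight's actual scaling algorithm.

```python
from datetime import datetime


def scheduled_nodes(now, peak_nodes=10, off_peak_nodes=3,
                    start_hour=9, end_hour=21):
    """Schedule-based: peak_nodes from start_hour to end_hour on working
    days (Monday to Friday), off_peak_nodes at all other times."""
    is_working_day = now.weekday() < 5  # Monday is 0, Sunday is 6
    if is_working_day and start_hour <= now.hour < end_hour:
        return peak_nodes
    return off_peak_nodes


def load_based_nodes(current_nodes, cpu_util, mem_util, min_nodes, max_nodes,
                     scale_up_at=0.8, scale_down_at=0.3):
    """Load-based: step the cluster up or down one node based on the busier
    of CPU and memory utilization (0.0 to 1.0), clamped to [min, max]."""
    load = max(cpu_util, mem_util)
    target = current_nodes
    if load > scale_up_at:
        target += 1
    elif load < scale_down_at:
        target -= 1
    return max(min_nodes, min(max_nodes, target))


# Example: Monday, May 6, 2019 at 10:00 am falls inside the working-hours window.
print(scheduled_nodes(datetime(2019, 5, 6, 10, 0)))  # prints 10
```

The final clamp in load_based_nodes is what keeps load-based scaling between the user-configured minimum and maximum sizes, no matter how high or low the load goes.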
The Autoscale capability can be configured using the Azure portal or programmatically using ARM templates.