Just announced is the Microsoft Azure Data Catalog, which is an enterprise metadata catalog / portal for the self-service discovery of data sources. It becomes available on Monday next week, July 13, 2015. Check out this short video on it. My response to this is – woo hoo! I have been waiting years for Microsoft to come up with a tool to catalog metadata and I’m excited this day has finally arrived.
From the Microsoft blog post announcement:
Businesses of every size face the challenge of sifting through their myriad data sources and discovering the right ones for a given problem. Although businesses collect and store tons of data as part of their everyday activities, too often they fail to reap the full benefit of all the data that’s being gathered. Employees too often end up spending more time searching for data than they actually do working with the data itself.
To address these problems, Azure Data Catalog uses a crowdsourced approach. Any user, for instance an analyst, data scientist or data developer, can register, enrich, discover, understand and consume data sources. Every user is empowered to register the data sources that they use. Registration extracts the structural metadata from the data source and stores it in the cloud-based Catalog, while the data itself remains in the data source.
Crowdsourced annotations let users who are knowledgeable about the data assets registered in the Catalog enrich the system at any time. This helps others understand the data more readily, including its intended purpose and how it’s being used within the business.
Azure Data Catalog also lets users discover data sources by searching and filtering. Users can then connect to data sources using any tool of their choice, and they can similarly work with the data that they need using the tools with which they are already familiar.
Azure Data Catalog bridges the gap between IT and the business – it encourages the community of data producers, data consumers and data experts to share their business knowledge while still allowing IT to maintain control and oversight over all the data sources in their constantly evolving systems.
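To make the register / annotate / discover flow described in the announcement a bit more concrete, here is a minimal Python sketch. It is not the Azure Data Catalog API: the `catalog` dictionary and the `register`, `annotate` and `search` functions are hypothetical names made up for illustration, and a SQLite database stands in for a real data source. The point it tries to show is that registration copies only the structural metadata (table and column names and types), while the rows stay in the source.

```python
import sqlite3

# Hypothetical in-memory "catalog". This is NOT the Azure Data Catalog API,
# only an illustration of the idea described in the announcement.
catalog = {}


def register(conn, source_name):
    """Extract structural metadata from a SQLite source into the catalog.

    Only table names, column names and declared types are copied;
    the data itself remains in the source database.
    """
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[f"{source_name}.{table}"] = {
            "columns": {col[1]: col[2] for col in columns},
            "description": "",
            "tags": set(),
        }


def annotate(asset, description=None, tags=()):
    """Let a knowledgeable user enrich a registered asset."""
    entry = catalog[asset]
    if description is not None:
        entry["description"] = description
    entry["tags"].update(tags)


def search(term):
    """Find assets whose name, description, tags or columns mention the term."""
    term = term.lower()
    return sorted(
        name
        for name, entry in catalog.items()
        if term in name.lower()
        or term in entry["description"].lower()
        or any(term in tag.lower() for tag in entry["tags"])
        or any(term in column.lower() for column in entry["columns"])
    )


# Example: register a small source, annotate it, then discover it by tag.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.execute("INSERT INTO orders VALUES (1, 'Contoso', 99.5)")
register(conn, "sales_db")
annotate(
    "sales_db.orders",
    description="One row per confirmed order",
    tags=["finance"],
)
print(search("finance"))
```

The catalog entry for `sales_db.orders` ends up holding the column names and types plus the annotation, but never the `Contoso` row itself, which mirrors the "data itself remains in the data source" point above.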
There will be open APIs that allow third parties to integrate directly with the Data Catalog for both registration and discovery of data sources. There will be both a free and a standard edition (the free edition limits the maximum number of users and the maximum number of catalog objects).
The Azure Data Catalog is an evolution of the existing Data Catalog that ships today as a feature of Power BI for Office 365. Soon the two catalogs will merge into a single service. The exact timing and details of how the Azure Data Catalog might integrate with the Power BI service and Power BI Designer are still to be decided.