Dataflows, previously called Common Data Service for Analytics as well as Datapools, will be in preview soon and I wanted to explain in this blog what it is and how it can help you get value out of your data quickly (it’s a follow-up to my blog Getting value out of data quickly).
In short, Dataflows integrates data lake and ETL technology directly into Power BI, so anyone with Power Query skills (yes – Power Query is now part of Power BI service and not just Power BI Desktop and is called Power Query online) can create, customize and manage data within their Power BI experience (think of it as self-service data prep). Dataflows include a standard schema, called the Common Data Model (CDM), that contains the most common business entities across the major functions such as marketing, sales, service, finance, along with connectors that ingest data from the most common sources into these schemas. This greatly simplifies modeling and integration challenges (it prevents multiple metadata/definition on the same data). You can also extend the CDM by creating custom entities. Lastly – Microsoft and their partners will be shipping out-of-the-box applications that run on Power BI that populate data in the Common Data Model and deliver insights through Power BI.
A dataflow is not just the data itself, but also logic on how the data is manipulated. Dataflows belong to the Data Warehouse/Mart/Lake family. Its main job is to aggregate, cleanse, transform, integrate and harmonize data from a large and growing set of supported on-premises and cloud-based data sources including Dynamics 365, Salesforce, Azure SQL Database, Excel, SharePoint. Dataflows hold a collection of data-lake stored entities (i.e. tables) which are stored in internal Power BI Common Data Model compliant folders in Azure Data Lake Storage Gen2.
This adds two new layers to Power BI (Dataflows and Storage):
But you can instead use your own Azure Data Lake Store Gen2, allowing other Azure services to reuse the data (i.e. Azure Databricks can be used to manipulate the data).
You can also setup incremental refresh for any entity, link to entities from other dataflows, and can pull data down from the dataflows into Power BI desktop.
To use dataflows, in the Power BI Service, under a Workspace: Create – Dataflow – Add entities: This starts online Power Query and you then choose a connector from one of the many data sources (just like you do with Power Query in Power BI Desktop). Then choose a table to import and the screen will look like this:
To create a dashboard from these entities, in Power BI Desktop you simply choose Get Data -> Power BI dataflows.
The bottom line is Power BI users can now easily create a dataflow to prepare data in a centralized storage, using a standardized schema, ready for easy consumption, reuse, and generation of business insights.
Dataflows are a great way to have a power user get value out of data without involving IT. But while this adds enterprise tools to Power BI, it does not mean you are creating an enterprise solution. You still may need to create a data warehouse and cubes: See The need for having both a DW and cubes and Is the traditional data warehouse dead?.