The need for having both a DW and cubes

I have heard some people say if you have a data warehouse, there is no need for cubes (when I say “cubes” I am referring to tabular and multidimensional OLAP models). And I have heard others say if you have OLAP cubes, you don’t need a data warehouse. I strongly disagree with both these statements, as almost all the customers I see that are building a modern data warehouse use both in their solutions. Here are some reasons for both:

Why have a data warehouse if you can just use a cube? (i.e. is the trade-off of duplication of data and more ETL worth it?)

Breaking down complex steps so easier to build cube
Cube is departmental view (cube builder not thinking enterprise solution)
Easier to clean/join/master data in data warehouse
Processing cube is slow against sources
One place to control data for consistency and have one version of the truth
Use by tools that need relational format
Cube does not have all data
Cube may be behind in data updates (needs processing)
Data warehouse is place to integrate data
Risk of having multiple cubes doing same thing
Data warehouse keeps historical records
Easier to create data marts from data warehouse
Need a relational database to track Slowly Changing Dimensions (SCD)
Security in a data lake is file/folder
Use the data warehouse as a place to store the metadata since people are more familiar with a relational database

Why do I also need a cube if I have a data warehouse? (or, reasons to report off cubes instead of the data warehouse) (a summary from my prior blog post of Why use a SSAS cube?):

Semantic layer
Handle many concurrent users
Aggregating data for performance
Multidimensional analysis
No joins or relationships
Hierarchies, KPI’s
Row-level Security
Advanced time-calculations
Slowly Changing Dimensions (SCD)
Required for some reporting tools

The typical architecture I see looks like this:

Comments

The need for having both a DW and cubes — 10 Comments

Pingback:#BI101: The need for having both a DW and cubes – Welcome to Ray Kim's 'blog
Simon on July 20, 2018 at 12:04 pm said:

Hi James, what would be the limitations of using normal Azure SQL (obviously without polybase) instead of DW?

DW isn’t cost effective for smaller orgs.

Reply ↓
- James Serra on July 22, 2018 at 3:39 pm said:
  
  Hi Simon,
  
  If you database is “small” (under 4TB) and you don’t need the power of a MPP solution, then go with Azure SQL Database. And with Managed Instance in public preview and near 100% SQL compatibility, it’s a great option with it’s biggest limitation is that SQL DW can be 20x-50x times faster.
  
  Reply ↓
  - Simon Nuss on July 27, 2018 at 7:34 am said:
    
    Awesome, done! Preview request is pending.
    
    Azure SQL Database Managed Instance looks like it on parity with SQL Server 2017 but, once again, it excludes Polybase! Why does MS keep excluding it?
    
    They should just replace polybase with Power Query imho. It can federate equally to Hadoop, plus do a million more things.
    
    Reply ↓
Doug Burke on July 20, 2018 at 7:43 pm said:

Excellent commentary;

Let the DW ‘house’ the data and OLAP interpret the data.

Doug

Reply ↓
Andrew Peterson on July 23, 2018 at 12:55 pm said:

James, I’d like to see more exploration into the use of ROLAP, now that the backend databases are getting more sophisticated/faster. If your onsite DW can be supported using SSD drives, ROLAP may offer a faster SSAS cube build out, with only a single data update to the DW. Or online via Azure SQL data warehouse, with MPP, that may offer a great alternative. That provides both a traditional DW with a rapid SSAS build out. And I am thinking multi-dimensional, not tabular. I may be that last holdout, but multi-dimensional still seems far superior when the business needs get complex.

Reply ↓
Raghu on July 25, 2018 at 6:31 pm said:

James,
I see that now we can build AAS TAbular Cubes on top of ADL. In this case what are the scenario you see SQL DW or SQL Azure is needed to provide a Semantic layer with AAS ?

https://azure.microsoft.com/en-us/blog/using-azure-analysis-services-with-azure-data-lake-storage/

Reply ↓
- James Serra on July 26, 2018 at 5:42 pm said:
  
  Hi Raghu,
  
  Those reasons where in the blog under “Why have a data warehouse if you can just use a cube?”. And I’ll add that if you are going against ADL, you will need to make sure the data is cleaned and transformed in the data lake before pushing it to a cube.
  
  Reply ↓
James O'Toole on July 31, 2018 at 8:45 am said:

Thanks for a great commentary. I now have a second source for some of the comments I have been espousing at work. Great graphic!! It clarifies a lot for many.
colleagues.

Reply ↓
Pingback:Power BI: Dataflows | James Serra's Blog