Real-time query access with PDW

The Parallel Data Warehouse (PDW) officially supports Analysis Services as a data source, both the Multidimensional model (ROLAP and MOLAP modes) and the Tabular model (In-Memory and DirectQuery modes).  The big benefit of using ROLAP or DirectQuery is you get real-time query access to the relational data source in PDW (as opposed to the data only up to the last time the cube was processed) and don’t have to process the cube (just make sure to use clustered columnstore indexes on the PDW tables to improve performance).  You create MDX queries when using ROLAP, which get translated to SQL when hitting PDW, and you create DAX queries when using DirectQuery, which also get translated to SQL when hitting PDW.

Keep in mind that the PDW is so fast when using clustered columnstore indexes, that if you have a properly defined star schema you might not even need to use a cube because the results will be returned to the user quickly.  But there are other reasons besides performance as to why you might still want to use a cube (see Why use a SSAS cube?).

An SSAS cube that uses PDW as a data source is just like any other data source that SSAS uses.  Performance is usually fast because of the clustered columnstore indexes, with the only caveat is sometimes the SQL that is generated by DirectQuery to pull data from PDW is not that great (the SQL generated by ROLAP is usually pretty good).

The other thing to note about DirectQuery, which applies to any data source, is you can’t use PerformancePoint or Excel PivotTables with DirectQuery.  This is because MDX queries are not supported for a tabular model in DirectQuery mode, only DAX, so you need to use a DAX client like Power View (PerformancePoint and Excel PivotTables generate MDX queries behind the scenes).  The other limitation with DirectQuery is it does not cache results like ROLAP and there are some unsupported data types (geometry, xml, and nvarchar(max)).  Finally, there are some DAX functions that are not supported in DirectQuery mode and some that might return different results (see Formula Compatibility in DirectQuery Mode) and there are two DAX functions that are not supported (EXACT and REPLACE).  So it seems that ROLAP is the better choice over DirectQuery for many situations.  But if you do go with a tabular model you may want to look into using a hybrid mode (see Tabular query modes: DirectQuery vs In-Memory and Partitions and DirectQuery Mode).  Definitely go with a DirectQuery tabular model over a in-memory model if your database is 1TB or more.

One limit of ROLAP to note is it does not support parent-child hierarchies (which generate recursive CTE queries which PDW does not yet support, so you will have to convert your parent child hierarchies into level based hierarchies (flattened) in the cube via this tool).  One improvement is Distinct Count performance for ROLAP queries is faster if you enable an optimization.  Some other ROLAP limitations against PDW:

  • Auto-cube refresh is not supported
  • Materialized views, also called Indexed views, are not supported
  • Proactive caching is supported only if you use the polling mechanisms provided by Analysis Services
  • Writeback is not supported

Some things I have learned when using ROLAP against a PDW:

  • Sometimes it is better to have your fact tables as ROLAP, but keep the dimensions as MOLAP
  • Think about using MOLAP for your historical partitions and ROLAP for just your current partition
  • Make sure the measures are in BIGINT in the fact tables or MDX aggregates might not work (MDX aggregates use INT by default unless the source is BIGINT)
  • PDW supports hundreds of concurrent users, but if you have a thousands of concurrent users hitting the cube it may be better to move the data from the PDW to a SMP data mart and create the cube there

More info:

Comparing DirectQuery and ROLAP for real-time access

Tabular model: Not ready for prime time?

PDW and SSAS

Parallel Data Warehouse (PDW) and ROLAP

Analysis Services ROLAP for SQL Server Data Warehouses

Columnstore vs. SSAS

About James Serra

James currently works for Microsoft specializing in big data and data warehousing using the Analytics Platform System (APS), a Massively Parallel Processing (MPP) architecture. Previously he was an independent consultant working as a Data Warehouse/Business Intelligence/MDM architect and developer, specializing in the Microsoft BI stack. He is a SQL Server MVP with over 25 years of IT experience.
This entry was posted in PDW, SQLServerPedia Syndication, SSAS. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>