Power BI Designer

Long has the question been asked: “Which Microsoft tool do I use for dashboards?”  SSRS, Excel, Power View, Report Builder, and PerformancePoint are all candidates.  But that has all changed, and the future of dashboarding is here: Power BI Designer.

The Power BI Designer is a new companion application for Power BI.  It is a standalone Windows desktop application that can be downloaded from the Power BI site or here.

Note that Power BI Designer is an on-prem tool, with a companion called Power BI Dashboard, a web version available via an “Introducing Power BI Dashboards (Preview) – Try it now” link on the front of your Power BI site.  Power BI Dashboard can use files created with Power BI Designer.  Think of Power BI Designer as the tool to create the reports, and Power BI Dashboard as the tool to create the dashboard that contains those reports.  However, Power BI Dashboard can also create a limited number of reports on its own against other products (GitHub, Marketo, Microsoft Dynamics CRM, Salesforce, SendGrid, Zendesk), against on-prem SSAS (see Using Power BI to access on-premise data), and against an Excel workbook (where data can be automatically refreshed from OneDrive).

Power BI Designer combines Power Query, Power View (a slimmed-down version) and the Power Pivot Data Model into a seamless experience that allows you to build your reports offline and then upload them to your Power BI site via Power BI Dashboard.  When you finish building your report, you save it in a new Power BI Designer file format called PBIX.


Note that Power BI Designer reports can’t be refreshed after they are uploaded; for now, they must be refreshed manually in Power BI Designer and uploaded again.  But you can edit the design of a report after it has been uploaded.

The Power BI add-ins for Excel 2013 (Power Query, Power Pivot, Power View, Power Map) are still available, and you can continue to use them to model your data and build reports.  Power BI Designer is another option, and it also allows customers with older versions of Office to create reports.

Within Power BI Designer is a true SSAS tabular data model: an SSAS tabular instance runs as a child process in local mode.

Users can create personalized dashboards to monitor their most important data.  A dashboard combines on-premises and cloud-born data in a single pane of glass, providing a consolidated view across the organization regardless of where the data lives.

Users can easily explore all their data using intuitive, natural language capabilities and receive answers in the form of charts and graphs. They can also explore data through detailed reports that target specific aspects of their business.  Visuals from these reports can also be pinned to their dashboards for continuous monitoring.  As part of this experience new visualizations have been added including combo charts, filled maps, gauges, tree maps, and funnel charts.

Power BI Dashboard provides “out of the box” connectivity to a number of popular SaaS applications.  In addition to the existing connection with Microsoft Dynamics CRM Online, customers can also connect to their data in Salesforce, Zendesk, Marketo, SendGrid, and GitHub, with many more to come in the months ahead.  With an existing subscription to one of these services, customers can log in from Power BI.  In addition to establishing a data connection, Power BI provides pre-built dashboards and reports for each of these applications.

There is also a new Power BI connector for SQL Server Analysis Services that allows customers to realize the benefits of a cloud-based BI solution without having to move their data to the cloud.  Customers can now create a secure connection to an “on-premises” SQL Server Analysis Services server from Power BI in the cloud.  When users view and explore dashboards and reports, Power BI will query the on-premise model using the user’s credentials.

There is still a way to go before Power BI Designer is the solution for any and all dashboards you want to create.  It will need to add the features that are in PerformancePoint, such as KPIs.  And hopefully there will be a version that allows dashboards to be deployed to on-prem SharePoint.  But this is a great start, and I’m excited to see what is next!

More info:

Power BI Designer Preview

Videos Power BI Designer

New Power BI features available for preview

Getting Started with Power BI Designer

Video Power BI Designer Preview

Unwrapping the Power BI vNext Public Preview

Getting Started with the Power BI Dashboards Public Preview

Posted in Power BI, SQLServerPedia Syndication | 1 Comment

Using Power BI to access on-premise data

The following blog post describes how to use certain cloud-based Power BI for Office 365 products (Power View and Q&A) on a Power BI site to reference on-premises data.  To do this you will use certain on-premise Power BI products (Power Pivot and Power Query) to gather the data.  Note that you cannot currently upload Power Map reports to a Power BI site (instead use maps in Power View; see Maps in Power View).

This blog applies to on-premise data that resides in SQL Server that is on an Azure VM or an Analytics Platform System (APS), as well as a normal server.  I’ll make note of any differences.

Download and install the Data Management Gateway (DMG) on the server that contains the on-premise data residing in SQL Server.  If you are using APS, this will have already been done for you on a VM installed with AU3; that VM already has the DMG installed, which provides a gateway to APS.  Next, create a Power BI site if you have not already done so (see Add Power BI sites to your Office 365 site).  On your Power BI site, create a data gateway (see Data Management Gateway Introduction), note the generated gateway key, and then enter that key in the DMG.  This will link your DMG to the Power BI site (a DMG can only be linked to one Power BI site).  Then create a SQL Server data source on your Power BI site that uses the data gateway (see Data Sources via Power BI), enter the connection info for your server, and specify the SQL Server tables to expose.  Take note of the created OData data feed URL.

Next you will open an on-premise Excel workbook, go to the Power Query tab, sign in to your Power BI site, and use either “Online Search” or “From Other Sources -> From OData Feed” and choose the table(s) to load (both will create an OData feed connection).  If loading the data into Excel, once loaded select the data and choose “Add to Data Model” on the PowerPivot tab.  Or, load the table directly to a data model.  Then create a Power View report.  Save the Excel workbook and upload it to your Power BI site.  You will now be able to view the Power View report on the Power BI site.  Then use the “Add to Q&A” option on the workbook on the Power BI site, and you will then be able to use Q&A.
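For reference, the OData feed Power BI exposes is just an HTTP endpoint that returns entity rows (Atom or JSON).  A minimal sketch, assuming a hypothetical feed URL and a canned JSON payload (both made up for illustration), of building a table’s entity-set URL and pulling rows out of a response body:

```python
import json
from urllib.parse import urljoin

def table_feed_url(service_root, table_name):
    """Build the OData entity-set URL for one exposed table."""
    return urljoin(service_root.rstrip("/") + "/", table_name)

def rows_from_odata_json(payload):
    """Extract entity rows from an OData JSON response body."""
    doc = json.loads(payload)
    # Newer OData JSON wraps the entity collection in a "value" array;
    # the older verbose format uses {"d": {"results": [...]}}.
    if "value" in doc:
        return doc["value"]
    return doc.get("d", {}).get("results", [])

# Hypothetical feed URL and response, for illustration only:
url = table_feed_url("https://example.cloudapp.net/feeds/Sales.svc", "DimCustomer")
sample = '{"value": [{"CustomerKey": 1, "Name": "Contoso"}]}'
print(url)                           # https://example.cloudapp.net/feeds/Sales.svc/DimCustomer
print(rows_from_odata_json(sample))  # [{'CustomerKey': 1, 'Name': 'Contoso'}]
```

In practice Excel’s Power Query handles all of this for you once you paste in the feed URL; the sketch is only to show what that URL points at.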

Note that you are using data in the Power Pivot model in the workbook you uploaded to Power BI, not the data directly in SQL Server (the Power View report is hitting Power Pivot, so if data changes in SQL Server, that won’t be reflected in Power View unless you set up the DMG to refresh the data from SQL Server as I explain below).  To refresh the data in the workbook in Power BI from SQL Server, use the “Schedule Data Refresh” option on the workbook in Power BI and create a daily refresh (there is no option to update more than once a day).  To refresh the data more frequently, you must do it manually by going to the Settings tab on the “Schedule Data Refresh” page and clicking the “refresh report now” button (the status of the connection must be OK to see this button).  See Schedule data refresh for workbooks in Power BI for Office 365.

Before you use data refresh, be aware that the OData feed connection info that was in the Excel workbook when it was uploaded will not work for a data refresh.  This is because any data sources that are retrieved via Power Query using an OData feed are not supported for data refresh.  So what you need to do is get a proper connection string by importing data from SQL Server using an SQL Server connection: In on-premise Excel, go to the Power Query tab, click “From Database -> From SQL Server Database”.  Enter the connection info (server name) and then Excel will populate the Navigator with names of all the tables.  Then load one of the tables into the data model.  Now go to the Data tab, click Connections, select the properties of the Power Query connection, and on the Definition tab copy the connection string.  Next go to your Power BI site and configure a Power Query data source (see Scheduled Data Refresh for Power Query) where you will paste that connection string.  After adding the Power Query data source, any Excel workbooks that you want to refresh should use a connection to SQL Server (“From Database -> From SQL Server Database”) and not an OData feed (“Online Search” or “From Other Sources -> From OData Feed”).  Once uploaded to Power BI the workbooks can be refreshed as Power BI knows to discover your gateway and data sources.  You won’t need to create another Power Query data source as long as each workbook uses the same raw connection info (server name).
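Since refresh matching hinges on each workbook using the same raw connection info (server name), it helps to know what is inside the connection string you copy from Excel.  A minimal sketch, using a made-up connection string of the general OLE DB key/value style, of splitting it into pairs and pulling out the server name:

```python
def parse_connection_string(conn_str):
    """Split an OLE DB-style connection string into a dict of key/value pairs."""
    pairs = {}
    for part in conn_str.split(";"):
        if "=" in part:
            key, _, value = part.partition("=")
            pairs[key.strip().lower()] = value.strip()
    return pairs

# Hypothetical connection string of the kind Power Query generates:
conn = "Provider=SQLOLEDB.1;Data Source=MYSERVER;Initial Catalog=AdventureWorks;Integrated Security=SSPI"
info = parse_connection_string(conn)
print(info["data source"])  # MYSERVER  (the server name Power BI matches on)
```

The “Data Source” value is the raw connection info referred to above: as long as your workbooks share it, one Power Query data source on the Power BI site covers them all.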

Now in beta for Power BI is a way to access on-premise SQL Server data from the Power BI site in real time.  This is accomplished by having Power BI connect to an on-premise SSAS tabular model that uses DirectQuery mode.  That SSAS tabular model is connected to SQL Server, so a Power View report or Q&A on the Power BI site would be hitting SQL Server in real time.  Power BI is able to “find” the on-premise SSAS via an installation piece similar to the DMG, called the Power BI Analysis Services Connector.  Note that installing the Power BI Analysis Services Connector and the Data Management Gateway on the same computer is not supported.  Once the Power BI Analysis Services Connector is installed and connected to an SSAS server, the tabular model databases for the server instance you configured will appear when users select the “Get Data” option in Power BI Designer Preview (this is the only place the tabular models can be used).  The connector currently supports Analysis Services 2012 and 2014.  In the future, Analysis Services Multidimensional, SQL Server, and other data sources will be added.

Some notes:

The DMG can be used to publish raw data sources (connecting to data sources directly using the server name) or to expose them as OData feeds (using OData to find the data source saves users from having to know the server name, but with the disadvantage that workbooks that use an OData feed cannot be refreshed on a Power BI site).  You will want to create both a raw data source connection and an OData feed to the same data source so users get both features: searching for data without needing connection info, and the ability to refresh workbooks on the Power BI site.

When using the OData feed URL in Excel, you will be asked for the method you want to use to access it.  Select the Organizational account and enter your Office 365 credentials.  Also make sure you are signed into Excel with the same account (see Power BI – Data Management Gateway).

If you are using an Azure VM for your SQL Server, when creating a data source in Power BI, setting credentials does not work (you get the message “Failed to verify gateway ‘InstanceName1’ status.  Unable to connect to the remote server”).  As a workaround, log in to the VM itself and configure the data source.  See Using Power BI Data Management Gateway on Non-Domain Azure VM.

If you are using an Azure VM for your SQL Server, you will want to change the computer name of the Azure VM (see Using Power BI DMG on Non-Domain Azure VMs – August 2014 Update) and add endpoints in the Azure Management Portal and firewall rule exceptions on your Azure VM (see Using Odata feed on Azure client whose Data Source is configured against DMG on Azure VM and How to Set Up Endpoints to a Virtual Machine).

More info:

How to set Power BI Schedule Data refresh with simple excel file Office 365

Power BI first impressions

Power BI Data Management Gateway 1.2 Changes the Game

Limitations for Power Query OData Feeds in Power BI

When is a Data Management Gateway Needed in Power BI?

Video Deep Dive on the Data Management Gateway in Power BI for connectivity to on premise and hybrid scenarios

Tip! How To Configure Power BI Preview Analysis Services

Posted in PDW/APS, Power BI, SQLServerPedia Syndication | 1 Comment

Analytics Platform System (APS) AU3 released

The Analytics Platform System (APS), which is a renaming of the Parallel Data Warehouse (PDW), has just released an appliance update (AU3), which is sort of like a service pack, except that it includes many new features.  These appliance updates are made available frequently, about every 3-4 months.  Below is what is new in this release:

TSQL compatibility improvements to reduce migration friction from SQL SMP

  • Implements TRY/CATCH block and THROW/RAISERROR/PRINT statements
  • Implements error diagnostic intrinsic functions ERROR_NUMBER, ERROR_MESSAGE, ERROR_SEVERITY, ERROR_STATE, and ERROR_PROCEDURE
  • Implements global variable @@ERROR
  • Implements the transaction-state function XACT_STATE
  • Implements system message storage (the sys.messages catalog view)
  • Implements the stored procedures sp_addmessage, sp_altermessage, and sp_dropmessage
  • Implements support for INTERSECT and EXCEPT queries

Integrated Azure Data Management Gateway enables Query from Cloud to On Prem through APS

  • Enables Azure users to access on-premises data sources via APS by exposing data as secure OData feeds.
  • Enables Power BI (including Power Query, Power Map and Power Q&A) to use on-prem data from PDW and external Hadoop tables.
  • Enables Azure cloud service data mashups to be scaled out via Polybase by querying Hadoop on-premises.
  • Creates a scalable, enterprise-class Data Management Gateway that scales out as queries access more on-prem data sources.

Polybase Recursive Directory Traversal and ORCFile support

  • Enables users to retrieve the contents of all subfolders by pointing a single external table at the parent folder, removing the burden of creating an external table for each subfolder.
  • Enables all PolyBase scenarios to run against the ORC file format
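To picture what recursive traversal buys you: previously an external table saw only the files directly in its folder, and now it covers the whole subtree.  A rough Python analogy (not PolyBase itself), contrasting a flat listing with a recursive walk over a throwaway directory tree standing in for an HDFS data directory:

```python
import os
import tempfile

# Build a small folder tree standing in for an HDFS data directory.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "2014", "12"))
for rel in ["top.orc", os.path.join("2014", "jan.orc"), os.path.join("2014", "12", "dec.orc")]:
    open(os.path.join(root, rel), "w").close()

# Old behavior: one external table per folder sees only that folder's files.
flat = [f for f in os.listdir(root) if os.path.isfile(os.path.join(root, f))]

# AU3 behavior: point one external table at the parent and get every subfolder's files.
recursive = [os.path.join(d, f) for d, _, files in os.walk(root) for f in files]

print(len(flat), len(recursive))  # 1 3
print(sorted(os.path.basename(p) for p in recursive))  # ['dec.orc', 'jan.orc', 'top.orc']
```

One external table over the parent now reaches all three files, where the flat view saw only one.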

Install, Upgrade, and Servicing Improvements

  • Improves operational uptime in maintenance windows by improving our Active Directory reliability model.
  • Reduces end-to-end install time from 9 hours to less than 8 hours.
  • Improves resiliency by reducing issues related to AD VM corruptions, 2-way forest trusts, and CSV/cluster failures.
  • Improves upgrade and setup stability and reliability.

Replatformed to Windows Server 2012 R2 as the core OS for all appliance nodes

  • Enhances stability with fixes and improvements to core fabric systems such as Active Directory, Hyper-V and Storage Spaces, including:
    • Automatic rebuilding of storage spaces
    • Safer virtualization of Active Directory domain controllers.

Replatformed to SQL Server 2014

  • Improves engineering efficiency by moving APS and SQL Server to a common servicing branch, reducing latency in getting bug fixes and support.

More info:

Relational Data Warehouse + Big Data Analytics: Analytics Platform System (APS) Appliance Update 3

Analytics Platform System Appliance Update 3 Documentation and Client Tools

Posted in PDW/APS, SQLServerPedia Syndication | Leave a comment

The Modern Data Warehouse

The traditional data warehouse has served us well for many years, but new trends are causing it to break in four different ways: data growth, fast query expectations from users, non-relational/unstructured data, and cloud-born data.  How can you prevent this from happening?  Enter the modern data warehouse, which is designed to support these new trends.  It handles relational data as well as data in Hadoop, provides a way to easily interface with all these types of data through one query model, and can handle “big data” while providing very fast queries (via MPP).

So if you are currently running an SMP solution and are running into these new trends, or think you will in the near future, you should learn all you can about the modern data warehouse.  So read on as I try to explain it in more detail:


I like to think of “big data” as “all data”.  It’s not so much the size of the data, but the fact that you want to bring in data from all sources, whether that be traditional relational data from sources such as CRM or ERP, or non-relational data from things like web logs or Twitter data.  Having diverse big data can result in diverse processing, which may require multiple platforms.  To simplify things, IT should manage big data on as few data platforms as possible in order to minimize data movement and avoid data synchronization issues, as well as avoid lone silos of data.  All of this works against the “single version of the truth”, so a goal should be to consolidate all data onto one platform.

There will be exceptions to the one-platform approach.  As you expand into multiple types of analytics that have multiple big data structures, you will eventually create many types of data workloads.  Because no single platform runs all workloads equally well, most data warehouse and analytic systems are trending toward a multi-platform environment.

A result of the workload-centric approach is a move away from the single-platform monolith of the enterprise data warehouse (EDW) toward a physically distributed data warehouse environment (DWE), also called the modern data warehouse.  A modern data warehouse consists of multiple data platform types, ranging from the traditional relational and multidimensional warehouse (and its satellite systems for data marts and ODSs) to new platforms such as data warehouse appliances, columnar RDBMSs, NoSQL databases, MapReduce tools, and HDFS.  So users’ portfolios of tools for BI/DW and related disciplines are fast-growing.  While a multi-platform approach adds more complexity to the data warehouse environment, BI/DW professionals have always managed complex technology stacks successfully, and end-users love the high performance and solid information outcomes they get from workload-tuned platforms.

A unified data warehouse architecture helps IT cope with the growing complexity of their multi-platform environments.  Some organizations are simplifying the data warehouse environment by acquiring vendor-built data platforms that have a unifying architecture that is easily deployed and has expandable appliance configurations, such as the Microsoft modern data warehouse appliance (see Parallel Data Warehouse (PDW) benefits made simple).

An integrated RDBMS/HDFS combo is an emerging architecture for the modern data warehouse.  The trick is integrating RDBMS and HDFS so they work together optimally.  For example, an emerging best practice among data warehouse professionals with Hadoop experience is to manage non-relational data in HDFS (i.e. creating a “data lake”) but process it and move the results (via queries, ETL, or PolyBase) to RDBMSs (elsewhere in the data warehouse architecture) that are more conducive to SQL-based analytics, providing ease-of-access and speed (since Hadoop is batch oriented and not real-time – see Hadoop and Data Warehouses).  So HDFS serves as a massive data staging area for the data warehouse.  This exposes the big benefit of Hadoop in that it allows you to store and explore raw data where the actionable insights are not yet discovered, and it is not practical to do up-front data modeling.

This requires new interfaces and interoperability between HDFS and RDBMSs, and it requires integration at the semantic layer, in which all data—even multi-structured, file-based data in Hadoop—looks relational.  The secret sauce that unifies the RDBMS/HDFS architecture is a single query model which enables distributed queries based on standard SQL to simultaneously access data in the warehouse, HDFS, and elsewhere without preprocessing data to remodel or relocate it.  Ideally it should also push down processing of the queries to the remote HDFS clusters.  This is exactly the problem that newer technologies such as PolyBase solve.
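As a toy illustration of that single query model (emphatically not PolyBase itself), here is a sketch in which one query function spans a relational-style table and file-based records parsed into the same shape, so the caller never cares where a row lives; all names and data are made up:

```python
# Rows from a relational-style table (think the RDBMS side of the warehouse).
warehouse_rows = [
    {"product": "bike", "sales": 100},
    {"product": "helmet", "sales": 40},
]

# Raw delimited lines (think files sitting in HDFS) parsed into the same shape,
# so that "even file-based data looks relational" at the semantic layer.
hdfs_lines = ["bike|25", "skateboard|60"]
hdfs_rows = [{"product": p, "sales": int(s)}
             for p, s in (line.split("|") for line in hdfs_lines)]

def query(predicate):
    """One 'distributed query' over both stores, with no preprocessing or relocation."""
    return [row for row in warehouse_rows + hdfs_rows if predicate(row)]

big_sellers = query(lambda r: r["sales"] >= 50)
print([r["product"] for r in big_sellers])  # ['bike', 'skateboard']
```

The real systems add pushdown (shipping the predicate to the remote Hadoop cluster instead of pulling all rows back), which is the part the toy skips.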


A data warehouse appliance gives you the pre-integration and optimization of the components that make up the multi-platform data warehouse.  An appliance includes hardware, software, storage, and networking components, pre-integrated and optimized for warehousing and analytics.  Appliances have always been designed and optimized for complex queries against very large data sets; now they must also be optimized for the access and query of diverse types of big data.  As mentioned before, Microsoft has such an appliance (see Parallel Data Warehouse (PDW) benefits made simple) that has the added benefit of being an MPP scale-out technology.  The performance of an MPP appliance allows you to use one appliance for queries, as opposed to creating “work-arounds” to achieve acceptable performance: multiple copies of data on multiple servers, OLAP servers, aggregate tables, data marts, temp tables, etc.

Clouds are emerging as platforms and architectural components for modern data warehouses.  One way of simplifying the modern data warehouse environment is to outsource some or all of it, typically to a cloud-based DBMS, data warehouse, or analytics platform.  User organizations are adopting a mix of cloud types (both public and private) and freely mixing them with traditional on-premises platforms.  For many, the cloud is an important data management strategy due to its fluid allocation and reapportionment of virtualized system resources, which can immediately enhance the performance and scalability of a data warehouse (see Should you move your data to the cloud?).  However, a cloud can also be an enhancement strategy that uses a hybrid architecture to future-proof data warehouse capabilities.  To pursue this strategy, look for cloud-ready, on-premises data warehouse platforms that can integrate with cloud-based data and analytic functionality to extend data warehouse capabilities incrementally over time.

More info:

THE MODERN DATA WAREHOUSE: What Enterprises Must Have Today and What They’ll Need in the Future

Evolving Data Warehouse Architectures: From EDW to DWE

Modern Data Warehousing

A Modern Data Warehouse Architecture: Part 1 – Add a Data Lake

Hadoop and a Modern Data Architecture

Video Hadoop: Beyond the Hype

Posted in Big Data, Data warehouse, Hadoop, PDW/APS, SQLServerPedia Syndication | 3 Comments

What is Azure Data Factory?

The Azure Data Factory is a service designed to allow developers to integrate disparate data sources.  It is a platform somewhat like SSIS in the cloud to manage the data you have both on-prem and in the cloud.

It provides access to on-premises data in SQL Server and cloud data in Azure Storage (Blob and Tables) and Azure SQL Database.  Access to on-premises data is provided through a data management gateway that connects to on-premises SQL Server databases.

It is not a drag-and-drop interface like SSIS.  Instead, data processing is enabled initially through Hive, Pig and custom C# activities.  Such activities can be used to clean data, mask data fields, and transform data in a wide variety of complex ways.

You author your activities, combine them into a pipeline, set an execution schedule, and you’re done.  Data Factory also provides an up-to-the-moment monitoring dashboard, which means you can deploy your data pipelines and immediately begin to monitor them.
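Conceptually, a pipeline is just an ordered set of activities, each consuming the output of the previous one.  A hedged Python sketch of that idea (not the actual Data Factory API, which is authored declaratively; the activity names and data here are invented), with a cleaning activity and a masking activity chained into one pipeline:

```python
def clean(rows):
    """Activity 1: drop malformed rows (here, rows missing a user_id)."""
    return [r for r in rows if r.get("user_id") is not None]

def mask(rows):
    """Activity 2: mask a sensitive field."""
    return [{**r, "user_id": "***"} for r in rows]

def pipeline(rows, activities):
    """Run each activity over the output of the previous one."""
    for activity in activities:
        rows = activity(rows)
    return rows

raw = [{"user_id": 7, "page": "/home"}, {"user_id": None, "page": "/404"}]
print(pipeline(raw, [clean, mask]))  # [{'user_id': '***', 'page': '/home'}]
```

In Data Factory the activities would be Hive, Pig, or custom C# steps and the scheduling and monitoring come from the service, but the dataflow shape is the same.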

Within the Azure Preview Portal, you get a visual layout of all of your pipelines and data inputs and outputs.  You can see all the relationships and dependencies of your data pipelines across all of your sources so you always know where data is coming from and where it is going.  You get a historical accounting of job execution, data production status, and system health in a single monitoring dashboard.

Data Factory provides customers with a central place to manage their processing of web log analytics, click stream analysis, social sentiment, sensor data analysis, geo-location analysis, etc.

Microsoft views Data Factory as a key tool for customers who are looking to have a hybrid story with SQL Server or who currently use Azure HDInsight, Azure SQL Database, Azure Blobs, and Power BI for Office 365.

In short, developers can use Data Factory to transform semi-structured, unstructured and structured data from on-premises and cloud sources into trusted information.


More info:

Microsoft takes wraps off preview of its Azure Data Factory service

Data Factory Public Preview – build and manage information production pipelines

Video Azure Data Factory Overview

Introduction to Azure Data Factory Service

The Ins and Outs of Azure Data Factory – Orchestration and Management of Diverse Data

Posted in Cloud, SQLServerPedia Syndication | Leave a comment

Fast Track Data Warehouse Reference Architecture for SQL Server 2014

I have previously blogged about the Fast Track Data Warehouse, a reference configuration optimized for data warehousing (see Microsoft SQL Server Reference Architecture and Appliances).  That was version 3.0 of the reference configuration, which was for SQL Server 2008 R2.  Released a couple of years ago was version 4.0 for SQL Server 2012 (download).  Now appearing are Fast Track Data Warehouse solutions for SQL Server 2014.  While a new reference architecture whitepaper is not available yet, these solutions can be ordered now.  These are solutions validated by Microsoft for deploying an SMP data warehouse on SQL Server 2014 EE:

3 solutions from HP (in collaboration with SanDisk):

HP DL380 Gen8 and HP LE PCIe Workload Accelerator 28TB/45 TB Data Warehouse Fast Track Reference Architecture

HP DL580 Gen 8 and HP LE PCIe Workload Accelerator 90TB Microsoft SQL Server Data Warehouse Fast Track Reference Architecture

One from Lenovo:


One from EMC:

EMC Solutions

More partners are expected soon.

Posted in Data warehouse, Fast Track, SQLServerPedia Syndication | 3 Comments

Should you move your data to the cloud?

Should you move your data to the cloud?  That is the question.  The answer is not simple.  While moving data to the cloud is all the rage, the fact is that a large majority of Fortune 500 companies are not keeping any data in the cloud at all.  At least not yet.  Why is that?  Well, some of those reasons include:

  • Security concerns (potential for compromised information, issues of privacy when data is stored on a public facility, might be more prone to outside security threats because it is high-profile, and some providers might not implement the same layers of protection you can achieve in-house)
  • Lack of operational control: Lack of access to servers (i.e. say you are hacked and want to get to security and system log files; if something goes wrong you have no way of controlling how and when a response is carried out; the provider can update software, change configuration settings, and allocate resources without your input or your blessing; you must conform to the environment and standards implemented by the provider)
  • Lack of ownership (an outside agency can get to data in a cloud data center that you don’t own more easily than to data in an onsite location that you own)
  • Compliance restrictions
  • Regulations (health, financial)
  • Legal restrictions (i.e. data can’t leave your country)
  • Company policies
  • You may be sharing resources on your server, as well as competing for system and network resources

If you can get past most or all of those reasons, the cloud offers many benefits:

  • Fire up a server quickly (abbreviated infrastructure implementation build-out times)
  • Grow as demand increases (unlimited elastic scale).  Change hardware instantly
  • Reduce as demand lessens (pay for only what you need)
  • Don’t need co-location space, so cost savings (space, power, etc)
  • No hardware costs
  • No commitment or long-term vendor lock
  • Allows companies to benefit from changes in the technology impacting the latest storage solutions
  • High availability and disaster recovery done for you
  • More frequent updates to the OS, SQL Server, etc.
  • Automatic updates
  • Automatic geography redundancy
  • Really helpful for proof-of-concept (POC) or development projects with a known lifespan

Also, there are some constraints of on-premise data:

  • Scale constrained to on-premise procurement
  • CapEx up-front costs, although some companies may prefer this over a yearly operating expense (OpEx)
  • A staff of employees or consultants must be retained to administer and support the hardware and software in place
  • Expertise needed for tuning and deployment

These are just some quick bullet points to get you thinking about the pros and cons of moving to the cloud.  And the answer can be further muddied by adding into the discussion public cloud (Azure) versus private cloud (Cloud Platform System) – a discussion for later!

More info:

The Hybrid Cloud: Having your Cake

To Cloud or Not to Cloud, That is the Question

Posted in Cloud, SQLServerPedia Syndication | 3 Comments

Book: Microsoft SQL Server 2014 Business Intelligence Development Beginner’s Guide

I was fortunate to be a technical reviewer for a new book, SQL Server 2014 Business Intelligence Development by Reza Rad.  Below is the abstract.  It’s a great book for those of you trying to learn about all the various tools in the Microsoft BI stack.  Click here to check it out!


Take advantage of the real power behind the BI components of SQL Server 2014, Excel 2013, and SharePoint 2013 with this hands-on book.  This book will give you a competitive advantage by helping you to quickly learn how to design and build a BI system with Microsoft BI tools.  This book starts with designing a data warehouse with dimensional modeling, and then looks at creating data models based on SSAS multidimensional and Tabular technologies.  It will illustrate how to use SSIS for ETL, and MDS and DQS for data governance and data quality.  The book is packed with real-world examples that will give you a good understanding of the BI and DW components of SQL Server 2014, Excel 2013, and SharePoint 2013.

Posted in Business Intelligence, SQLServerPedia Syndication | Leave a comment

Analytics Platform System (APS) AU2 released

The Analytics Platform System (APS), which is a renaming of the Parallel Data Warehouse (PDW), has recently released an appliance update (AU2), which is sort of like a service pack, except that it includes many new features.  These appliance updates are made available frequently, about every 3-4 months.  Below is what is new in this release:

TSQL Compatibility improvements to reduce migration friction from SQL SMP

  • Adds full support for user defined schemas. This is one of the most requested features from customers migrating from SQL Server (CREATE SCHEMA, ALTER USER, GRANT/DENY, etc.)
  • Improves statistics collection capability and supportability by introducing filtered statistics, ability to control the sample size for sampled statistics, and support for DBCC SHOW_STATISTICS and STATS_DATE intrinsics
  • Adds support for a set of 51 mathematical, string, ODBC, date/time, data type, and analytic built-in functions
  • Enables broader set of 3rd party ISV tools such as SAS and Tableau, as well as unblocking scenarios around managing data and analytical functions within the appliance

HDI Region Support for HDP 2.0 and Yarn

  • Enables users to write custom apps based on new data processing models (beyond MapReduce), enabling new scenarios such as interactive/ad hoc querying or search
  • Improves processing on the same hardware with predictable performance and better quality of service
  • Onboards the SCOM management pack for HDI, enabling System Center management of PDW+HDI
  • Updates the Microsoft .NET SDK for Hadoop to add value for both APS and Azure HDInsight clusters

Polybase HDP 2.0 and Query Hint Support

  • Enables all Polybase scenarios to work against both HDP 2.0 on Windows and Linux, pushing computation through the Yarn framework
  • Enables query hint support for Polybase push-down, enabling ability to force or disable push-down on a per-query basis

Install, Upgrade, and Servicing Improvements

  • Reduces end-to-end install time from 12 to under 9 hours (single scale unit)
  • Reduces end-to-end install time for virtual appliances from 7.5 to 4.5 hours
  • Reduces end-to-end servicing and patching time from 2 to 1 hour
  • Improves supportability with fast-fail and simplified troubleshooting steps
  • Improves performance by performing parallel and group patching.

Appliance Heartbeat Monitoring

  • Collects critical alerts and appliance heartbeats, storing appliance health data in Azure

Self-host Appliance and Customer Workload

  • Onboards key first-party workloads by creating an APS-based internal data warehouse

Engineering Efficiency Improvements

  • Introduces a train model, reducing the release cycle to 120 days end-to-end
  • Enables more frequent test passes to reduce bug escapes
  • Completely automates virtual appliance allocation

TFS and MSBuild Migration

  • Migrates codebase to Team Foundation Server (TFS) under PDW_Main
  • Converts codebase to MSBuild

Posted in PDW/APS, SQLServerPedia Syndication | 1 Comment

Non-obvious APS/PDW benefits

The Analytics Platform System (APS), which is a renaming of the Parallel Data Warehouse (PDW), has a lot of obvious benefits, which I discuss here.  For those of you who find your database is getting too big, or becoming too slow, or you need to integrate non-relational data, check out APS, Microsoft’s MPP solution.

But there are a lot of non-obvious benefits to using APS that I have listed below:

  • APS is a hardware platform that removes roadblocks for future needs. Be proactive instead of reactive!
  • Numerous capabilities (i.e. PolyBase, mixed workload support, scalability, HDInsight) to get you thinking about better ways to use your data
  • Much quicker development time due to speed of execution
  • Allows removal of ETL complexity (temp tables, aggregation tables, data marts, and other band-aids and workarounds)
  • Permits elimination of SSAS cubes, or a switch to ROLAP mode (so real-time data and no cube processing time)
  • Reduction or elimination of nightly maintenance windows. Instead use intra-day batch cycles
  • Tuning, redesigning ETL, upgrading hardware (more memory, Fusion-io cards), etc., on SMP gets you a 20-50% improvement, versus a 20-50x improvement with MPP
  • SMP is optimized for OLTP while APS is optimized for data warehouses
  • Allows a clear path to do predictive analytics via tools like Azure ML by having the disk space and processing power
  • Don’t think of it as just a solution for a large data warehouse but rather for any size warehouse that needs faster query performance
  • Faster query performance allows for adding more parameters to reports that also can be run real-time (instead of at night with fixed parameters) so business users can ask more sophisticated questions and execute ad-hoc queries.  This will result in a lot more self-service reporting
  • Have the space and performance to consolidate all your various data warehouses and data marts to one place
  • Decrease infrastructure costs through consolidation: reduced server count, reduction of duplicate data, fewer licenses, fewer ETL pieces, and lower infrastructure management costs (hardware, SQL Server, OS)

Posted in PDW/APS, SQLServerPedia Syndication | 3 Comments