Microsoft Build event announcements

Another Microsoft event and another bunch of exciting announcements.  At the Microsoft Build event this week, the major announcements in the data platform space were:

Azure Cosmos DB

Azure Cosmos DB is the next big leap in the evolution of DocumentDB.  Cosmos DB is Microsoft’s globally-distributed, horizontally scalable, multi-model database service.  Its mission is to enable you to write highly scalable, globally distributed apps, easily.  With its turnkey support for global distribution, Azure Cosmos DB seamlessly makes your data available close to where your users are, anywhere around the world; it offers guaranteed low latency, well-defined consistency and high availability around the globe.  It allows you to elastically scale throughput and storage anywhere in the world, based on your needs, and offers a multitude of well-defined consistency models, data models and APIs – so you can select the right ones for your app.

To clear things up, it’s not a “new” product, but rather a renaming of DocumentDB with some additional new features.  Microsoft has transitioned all existing DocumentDB customers and their data to Azure Cosmos DB at no additional charge.  It now natively supports four data models: key-value (new), documents, graphs (new), and columnar.  It also supports many APIs for accessing data, including MongoDB and DocumentDB SQL for the document model, Gremlin (preview) for the graph model, and Azure Tables (preview) for the key-value model.  Since it now supports more than just the document model, it would not have made sense to keep the name DocumentDB, hence the new name.
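
For example, the DocumentDB SQL API lets you query JSON documents with familiar SQL grammar.  A minimal sketch (the collection and its properties are hypothetical):

    -- DocumentDB SQL API: query JSON documents in a hypothetical collection.
    -- "c" is an alias for each document in the collection being queried.
    SELECT c.id, c.address.city
    FROM c
    WHERE c.address.state = "WA"
    ORDER BY c.address.city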

Microsoft also announced a new consistency level, Consistent Prefix, so that replicas can only move forward in time, as opposed to converging forward in time.  This brings it to a total of five consistency levels developers can use to help unblock programming challenges and binary tradeoffs to better navigate the CAP theorem.  Also introduced are some major improvements to the query engine, which manifest themselves as a 50-400% reduction in Request Units (RU) consumed per query.

All Azure Table storage accounts will be automatically upgraded to Azure Cosmos DB accounts too, and gain these great new capabilities – including global distribution, automatic indexing, dedicated throughput, and low latency.

For more info, see Azure Cosmos DB: The industry’s first globally-distributed, multi-model database service, A technical overview of Azure Cosmos DB, and Welcome to Azure Cosmos DB.

Azure Database Migration Service (DMS)

Microsoft announced (See Azure Database Migration Service announcement at //build) a limited preview of the Azure Database Migration Service which will streamline the process for migrating on-premises databases to Azure.

This new database migration service simplifies the migration of existing on-premises SQL Server, Oracle, and MySQL databases to Azure, whether your target database is Azure SQL Database, Azure SQL Database Managed Instance or Microsoft SQL Server in an Azure virtual machine.

The automated workflow, with assessment reporting, guides you through the necessary changes prior to performing the migration.  When you are ready, the service will migrate the source database to Azure.  For an opportunity to participate in the limited preview of this service, sign up.

Think of this as similar to the SQL Server Migration Assistant (SSMA), except this is an Azure PaaS so there are no VMs to create or software to install.

Azure Database for MySQL and PostgreSQL

Microsoft announced (See Microsoft extends Azure managed database services with introduction of MySQL and PostgreSQL) the preview of managed database services with Azure Database for MySQL and Azure Database for PostgreSQL.

These services are built on the intelligent, trusted and flexible Azure relational database platform. This platform extends similar managed services benefits, global Azure region reach, and innovations that currently power Azure SQL Database and Azure SQL Data Warehouse services to the MySQL and PostgreSQL database engines. Starting with the preview, customers can use the service to build and deploy their applications using MySQL version 5.6/5.7 and PostgreSQL version 9.5/9.6 in 11 regions across the US, Europe, Asia and Japan.

To get started, see Azure Database for MySQL and Azure Database for PostgreSQL.

More info:

Microsoft’s New Azure Database Offerings Challenge (and Maybe Surpass) AWS Cloud

Inside Microsoft’s Cosmos DB


Power BI Premium, Report Server, Apps and API

Announced today are some really cool new Power BI features:

Power BI Premium

Previously available were two tiers, Power BI Free and Power BI Pro ($10/user/month).  The problem with Power BI Pro is that for large organizations, this can add up.  In addition, their performance needs might not be met.  Power BI Premium, which is an add-on to Power BI Pro, addresses the concern about cost and scale.

For costs, it allows an unlimited number of users since it is priced by aggregate capacity (see Power BI Premium calculator).  Users who need to create content in Power BI will still require a $10/month Power BI Pro seat, but there is no per-seat charge for consumption.

For scale, it runs on dedicated hardware giving capacity exclusively allocated to an organization for increased performance (no noisy neighbors).  Organizations can choose to apply their dedicated capacity broadly, or allocate it to assigned workspaces based on the number of users, workload needs or other factors—and scale up or down as requirements change.

There will be changes to Power BI’s free tier.  Users of the free tier will now be able to connect to all of the data sources that Pro users can connect to, including those available through the on-premises data gateway, and their storage quota will increase from 1GB to 10GB.  The data refresh maximum increases from once daily to eight per day (hourly-based schedule), and streaming data rates increase from ten thousand rows per hour to one million rows per hour.

For Power BI Premium, you get 100TB of storage and a data refresh maximum of 48 per day (minute-based schedule that can be refreshed every 30 minutes).

Upcoming features for Power BI Premium include the ability to incrementally refresh the data so that only the newest data from the last day (or hour) is loaded into Power BI, pinning datasets to memory, dedicated data refresh nodes, read-only replicas, and geographic distribution (see Microsoft Power BI Premium Whitepaper for more info).  Also, the cached dataset size limit will eventually be removed (it is 1GB in Power BI Pro), so you will be able to build models as large as the Power BI Premium dedicated capacity memory can hold (currently 50GB).

Users of the free tier will no longer be able to share their reports and dashboards with other users.  Peer-to-peer dashboard sharing, group workspaces (now called app workspaces), export to PowerPoint, export to CSV/Excel, and analyze in Excel with Power BI apps are capabilities limited to Power BI Pro.  The rationale for this is that if the scope of a user’s needs is limited to personal use, then no fees should apply, but if the user wishes to share or collaborate with others, those are capabilities that need to be paid for.  For existing users of the free service who have been active within the past year, Microsoft is offering a free, 12-month extended trial of Power BI Pro (more info).

Changes to the Power BI free tier become effective on June 1st.  Power BI Premium will be available sometime in the second quarter of calendar 2017, meaning by the end of June.

Power BI Report Server

This is an on-premises deployment option that is part of Power BI Premium.  You use Power BI Desktop to author reports that you can then deploy to Power BI Report Server, keeping everything on-prem.  Power BI Report Server is actually just a superset of SQL Server Reporting Services (SSRS) and includes all Reporting Services capabilities, including operational (RDL) reports.  It delivers the capabilities made available in January 2017 as the Technical Preview for Power BI Reports in SSRS.  If you are a per-core licensee of SQL Server Enterprise Edition and have Software Assurance you will be able to host Power BI reports on-premises without a Power BI Premium subscription.  So organizations can choose Power BI in the cloud, or elect to keep reports on-premises with Power BI Report Server and move to the cloud at their pace.

Power BI Apps

Microsoft is evolving content packs into Power BI apps to improve how users discover and explore insights at enterprise scale.  Available today in preview as part of Power BI Premium, Power BI apps offer a simplified way of deploying and managing a collection of purpose-built dashboards and reports to specific people, groups or an entire organization.  Business users can easily install these apps and navigate them, with content centralized in one place and updated automatically.  This differs from content packs, as once installed, content packs lose their grouped identity; the end users just see a list of dashboards and reports.  Apps, on the other hand, maintain their grouping and identity even after installation.  This makes it very easy for end users to navigate content over time.  For more info, check out Distribute to large audiences with Power BI apps.

Power BI API

Going away will be Power BI Embedded, which has its own API, somewhat distinct from the Power BI service.  Replacing it is a Power BI API that converges Power BI Embedded with the Power BI service to deliver one API surface.  Existing apps built on Power BI Embedded will continue to be supported.  See How to migrate Power BI Embedded workspace collection content to Power BI.

Conclusion

These changes converge on three scenarios: personal (with Power BI Desktop and Power BI Free), departmental (with Power BI Pro) and enterprise (with Power BI Premium).  And the beauty is that the same two products, Power BI Desktop and the Power BI service, are used for all three scenarios.

More info:

Microsoft’s Power BI Premium delivers enterprise-grade features and bulk discounts

Introducing Power BI Report Server for on-premises Power BI report publishing

May 3 announcement FAQ

What Does Power BI Premium Mean for You?

On-Premise Power BI VOL. 2

To PowerBI Premium, Or Not To PowerBI Premium…

Power BI Licences Changes–The Good, The Bad and The Why

Power BI Premium. Is It For You or Not?

A closer look at Power BI Report Server


Microsoft Data Amp event announcements

Yesterday was the Microsoft Data Amp event where a bunch of very exciting announcements were made:

  • SQL Server vNext CTP 2.0 is now available and the product will be officially called SQL Server 2017.
  • SQL Server R Services in SQL Server 2017 is renamed to Machine Learning Services since both R and Python will be supported.  More info
  • Three new features for Cognitive Services are now Generally Available (GA): Face API, Content Moderator, Computer Vision API.  More info
  • Microsoft R Server 9.1 released: Real time scoring and performance enhancements, Microsoft ML libraries for Linux, Hadoop/Spark and Teradata.  More info
  • Azure Analysis Services is now Generally Available (GA).  More info
  • Microsoft has incorporated the technology that sits behind the Cognitive Services inside U-SQL directly as functions.  U-SQL is part of Azure Data Lake Analytics (ADLA)
  • More Cortana Intelligence solution templates: Demand forecasting, Personalized offers, Quality assurance.  More info
  • A new database migration service will help you migrate existing on-premises SQL Server, Oracle, and MySQL databases to Azure SQL Database or SQL Server on Azure virtual machines.  Sign up for limited preview
  • A new Azure SQL Database offering, currently being called Azure SQL Managed Instance (final name to be determined):
    • Migrate SQL Server to SQL as a Service with no changes
    • Support SQL Agent, 3-part names, DBMail, CDC, Service Broker
    • Cross-database + cross-instance querying (see the sketch below this list)
    • Extensibility: CLR + R Services
    • SQL profiler, additional DMVs support, Xevents
    • Native backup/restore, log shipping, transactional replication
    • More info
    • Sign up for limited preview
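
To make the cross-database support concrete, here is a minimal sketch (database, table, and column names are hypothetical) of the kind of 3-part-name query Managed Instance will support but a singleton Azure SQL Database will not:

    -- Cross-database join using 3-part names, just like on a traditional
    -- SQL Server instance (not possible in singleton Azure SQL Database).
    SELECT o.OrderID, c.CustomerName
    FROM SalesDb.dbo.Orders AS o
    INNER JOIN CrmDb.dbo.Customers AS c
        ON o.CustomerID = c.CustomerID;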

More info:

Delivering AI with data: the next generation of Microsoft’s data platform


Using Azure for free

There are a number of options for using Azure for free.  This is very useful for those of you who are not familiar with Azure and want to “play” with it.


Artificial Intelligence defined

The new buzzword in the industry is “Artificial Intelligence” (AI).  But exactly what is AI and how does it compare to “machine learning” and “deep learning”?  The best definitions I have seen come from the excellent article “Why Deep Learning is Suddenly Changing Your Life”:

(Image: AI terms diagram)

So, AI is really an umbrella term, and underneath it are categories like machine learning and deep learning.  In the Microsoft world, products that fit into the machine learning category include Azure ML and Microsoft R Server.  In the deep learning category are capabilities like speech recognition, face recognition, and language translation, which are part of Microsoft Cognitive Services.  Also available is the Microsoft Cognitive Toolkit (aka CNTK), which is a free, easy-to-use, open-source, commercial-grade toolkit that trains deep learning algorithms to learn like the human brain.

Be aware this is just one of the industry taxonomies, as others would group machine learning under advanced analytics, and deep learning under AI, with the idea that technically, deep learning is simply an improved convolutional neural network method rather than something beyond machine learning (despite what the hype train suggests).

More info:

What’s the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning?


Microsoft database migration tools

There are various Microsoft tools that you can use to help you migrate your database (updated 8/9/17):

Data Migration Assistant (DMA) – enables you to upgrade to a modern data platform by detecting compatibility issues that can impact database functionality on your new version of SQL Server and Azure SQL Database.  It recommends performance and reliability improvements for your target environment.  It allows you to not only move your schema and data, but also uncontained objects from your source server to your target server.  Download version 3.1

Database Experimentation Assistant (DEA) – is a new A/B testing solution for SQL Server upgrades.  It will assist in evaluating a targeted version of SQL Server for a given workload.  Customers upgrading from previous SQL Server versions (SQL Server 2005 and above) to any new version of SQL Server can use the analysis metrics it provides, such as queries that have compatibility errors, degraded queries, query plans, and other workload comparison data, to build confidence in a successful upgrade.  Download version 2.1

In short, customers will be able to assess and upgrade their databases using DMA, and validate the target database’s performance using DEA.

SQL Server Migration Assistant (SSMA) – version 7.5 for Oracle, MySQL, SAP ASE (formerly SAP Sybase ASE), DB2 and Access lets users convert database schema to Microsoft SQL Server schema, upload the schema, and migrate data to the target SQL Server or Azure SQL Database or Azure SQL Data Warehouse (Oracle only).

Azure Database Migration Service (DMS) – this new database migration service simplifies the migration of existing on-premises SQL Server, Oracle, and MySQL databases to Azure, whether your target database is Azure SQL Database, Azure SQL Database Managed Instance or Microsoft SQL Server in an Azure virtual machine.  The automated workflow, with assessment reporting, guides you through the necessary changes prior to performing the migration.  When you are ready, the service will migrate the source database to Azure.  Think of this as similar to the SQL Server Migration Assistant (SSMA) and the Data Migration Assistant (DMA), except this is an Azure PaaS so there are no VMs to create or software to install.  For an opportunity to participate in the limited preview of this service, sign up.  For more info see Azure Database Migration Service now available for preview.

Also worth mentioning is Azure Data Factory, where you can use the Copy Activity to copy data (not schema) of different shapes from various on-premises and cloud data sources to Azure.  See Move data by using Copy Activity.

And for those of you looking to migrate data to Azure Cosmos DB (formerly called DocumentDB), we have a data migration tool for that.  It can migrate from JSON files, MongoDB, SQL Server, CSV files, Azure Table storage, Blob, Amazon DynamoDB, HBase, and Azure Cosmos DB collections.  See How to import data into Azure Cosmos DB for the DocumentDB API?.

And a few calculators to help you move from on-prem to the cloud: the Total Cost of Ownership (TCO) Calculator, the Azure SQL Database DTU Calculator, and the Azure pricing calculator.

Finally, one of our partners, Attunity, has a product called Attunity Replicate for Microsoft Migrations that is a special offering for Microsoft customers to facilitate the migration from a variety of popular commercial and open-source databases to the Microsoft data platform.

Make sure to check out the Microsoft Database Migration Guide.

More info:

PASS Summit Announcements: DMA/DEA

Migrate from on-prem SQL server to Azure VM IaaS

Migrate from on-prem SQL server to Azure SQL Database

Getting data into Azure Blob Storage

Azure SQL Database pricing


U-SQL Defined

Azure Data Lake Analytics (ADLA) is a distributed analytics service built on Apache YARN that allows developers to be productive immediately on big data.  This is accomplished by submitting a job to the service where the service will automatically run it in parallel in the cloud and scale to process data of any size.  Scaling is achieved by simply moving a slider, being careful to make sure the data and job are large and complex enough to provide parallelism so you don’t overprovision and pay too much.  When the job completes, it winds down resources automatically, and you only pay for the processing power used.  This makes it easy to get started quickly and be productive with the SQL or .NET skills you already have, whether you’re a DBA, data engineer, data architect, or data scientist.  Because the analytics service works over both structured and unstructured data, you can quickly analyze all of your data – social sentiment, web clickstreams, server logs, devices, sensors, and more.  There’s no infrastructure setup, configuration, or management.

Included with ADLA is a new language called U-SQL, which is a big data language that seamlessly unifies the ease of use of SQL with the expressive power of C#.  U-SQL’s scalable distributed query capability enables you to efficiently analyze data in the Azure Data Lake Store and across Azure Blob Storage, SQL Servers in Azure, Azure SQL Database and Azure SQL Data Warehouse.  U-SQL is built on the learnings from Microsoft’s internal experience with SCOPE and existing languages such as T-SQL, ANSI SQL, and Hive.  See Introducing U-SQL – A Language that makes Big Data Processing Easy and Tutorial: develop U-SQL scripts using Data Lake Tools for Visual Studio.  Note that U-SQL differs in some ways from ANSI SQL or T-SQL (see Common SQL Expressions in U-SQL).

Perhaps the best value proposition of U-SQL is that it allows you to query data where it lives instead of having to copy all the data to one location.  For external systems, such as Azure SQL DB/DW and SQL Server in a VM, this is achieved using federated queries against those data sources where the query is “pushed down” to the data source and executed on that data source, with only the results being returned.

(Image: U-SQL)

Some of the main benefits of U-SQL:

  • Avoid moving large amounts of data across the network between stores (federated query/logical data warehouse)
  • Single view of data irrespective of physical location
  • Minimize data proliferation issues caused by maintaining multiple copies
  • Single query language for all data
  • Each data store maintains its own sovereignty
  • Design choices based on the need
  • Push SQL expressions with filters and joins to remote SQL sources.  There are two approaches (a fuller sketch follows this list):
    • SELECT * FROM EXTERNAL MyDataSource EXECUTE @"SELECT CustName FROM Customers WHERE ID = 1"; Use this approach when you want exact T-SQL semantics and just want to get the result back.  Note that we are not federating any subsequent U-SQL against the result of this into the remote data source.  Thus, these are called remote queries
    • SELECT CustName FROM EXTERNAL MyDataSource LOCATION "dbo.Customers" WHERE ID == 1; Use this approach when you want to write it all in U-SQL and are fine with the possible slight semantic differences.  In that case we will accumulate all U-SQL predicates against that location source and translate them into T-SQL based on REMOTABLE_TYPES and the U-SQL to T-SQL translation.  These are called federated queries
  • Can access JSON in Blob/ADLS (via JSON extractor), text, CSV, TSV, and images (jpeg).  Customers can also write their own custom extractors
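
Putting the two approaches together, here is a hedged sketch of a complete federated U-SQL script.  All names and connection details are hypothetical, and the credential (SalesDbCred) is assumed to have already been registered in the U-SQL catalog via PowerShell:

    // One-time setup: a data source over an Azure SQL Database.
    CREATE DATA SOURCE IF NOT EXISTS MyDataSource
    FROM AZURESQLDB
    WITH
    (
        PROVIDER_STRING = "Database=SalesDb;Trusted_Connection=False;Encrypt=True",
        CREDENTIAL = SalesDbCred,
        REMOTABLE_TYPES = (bool, byte, short, int, long, decimal, float, double, string, DateTime)
    );

    // Remote query: the embedded T-SQL string runs as-is on the remote source.
    @remote =
        SELECT *
        FROM EXTERNAL MyDataSource
        EXECUTE @"SELECT CustName FROM dbo.Customers WHERE ID = 1";

    // Federated query: written in U-SQL (note the C#-style == operator);
    // predicates on remotable types are translated to T-SQL and pushed down.
    @federated =
        SELECT CustName
        FROM EXTERNAL MyDataSource LOCATION "dbo.Customers"
        WHERE ID == 1;

    OUTPUT @federated
    TO "/output/customers.csv"
    USING Outputters.Csv();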

You may have noticed that U-SQL is similar to PolyBase (see PolyBase use cases clarified).  The main difference between the two is that PolyBase extends T-SQL onto unstructured data (files) via a schematized view that allows writing T-SQL against these files, while U-SQL natively operates on unstructured data and virtualizes access to other SQL data sources via a built-in EXTRACT expression that allows you to schematize unstructured data on the fly without having to create a metadata object for it.  Also, PolyBase runs interactively while U-SQL runs in batch, meaning you can use PolyBase with reporting tools such as Power BI, but currently cannot with U-SQL.  Finally, U-SQL supports more formats (e.g. JSON) and allows you to use inline C# functions, User-Defined Functions (UDF), User-Defined Operators (UDO), and User-Defined Aggregators (UDAGG), which are ways to add user-specific code written in C#.
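
To illustrate the PolyBase side of that comparison, here is a minimal T-SQL sketch for SQL Server 2016 against a Hadoop cluster (the address, path, and columns are hypothetical, and PolyBase is assumed to be installed and configured):

    -- External data source pointing at the Hadoop cluster
    -- (add RESOURCE_MANAGER_LOCATION to enable predicate pushdown).
    CREATE EXTERNAL DATA SOURCE MyHadoop
    WITH (TYPE = HADOOP, LOCATION = 'hdfs://10.10.10.10:8020');

    -- Describe how the delimited files are laid out.
    CREATE EXTERNAL FILE FORMAT CsvFormat
    WITH (FORMAT_TYPE = DELIMITEDTEXT,
          FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

    -- The schematized view over the files; from here on, it is plain T-SQL.
    CREATE EXTERNAL TABLE dbo.WebLogs
    (
        LogDate DATETIME2,
        Url     NVARCHAR(400),
        Hits    INT
    )
    WITH (LOCATION = '/data/weblogs/',
          DATA_SOURCE = MyHadoop,
          FILE_FORMAT = CsvFormat);

    SELECT TOP 10 Url, SUM(Hits) AS TotalHits
    FROM dbo.WebLogs
    GROUP BY Url
    ORDER BY TotalHits DESC;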

More info:

U-SQL Tables

CREATE DATA SOURCE (U-SQL)

U-SQL SELECT Selecting from an External Rowset

Setup Azure Data Lake Analytics federated U-SQL queries to Azure SQL Database

Tutorial: Get started with Azure Data Lake Analytics U-SQL language

Video Bring Big Data to the masses with U-SQL

Video U-SQL – A new language to process big data using C#/SQL


PolyBase use cases clarified

I previously talked about PolyBase and its enhancements (see PASS Summit Announcements: PolyBase enhancements).  There is some confusion on PolyBase use cases, as they differ depending on whether you are using PolyBase with Azure SQL Data Warehouse (SQL DW) or SQL Server 2016, as well as the sources you are using it against.  The three main use cases for PolyBase are loading data, federated querying, and aging out data.  Here is the support for those three use cases in SQL DW and SQL Server 2016:

PolyBase in:      | Parallelize data load (Blob and ADLS)      | Federated query (push down) HDInsight | Federated query (push down) HDP/Cloudera (local or blob) | Federated query (push down) five new sources* | Age out data
SQL DW            | Yes                                        | N/A                                   | N/A                                                      | No support for on-prem sources                | Maybe
SQL Server 2016   | Yes, via scale-out groups (Blob, not ADLS) | N                                     | Y (creates MapReduce job)                                | Y                                             | Maybe

* = Teradata, Oracle, SQL Server, MongoDB, generic ODBC (Spark, Hive, Impala, DB2)

For federated queries: “N” requires all data from the source to be copied into SQL Server 2016 and then filtered.  For “Y”, the query is pushed down into the data source and only the results are returned, which can be much faster for large amounts of data.

I mention “Maybe” for age out data in SQL DW as you can use PolyBase to access the aged-out data in blob or Azure Data Lake Storage (ADLS), but it will have to import all the data, so it may have slower performance (which is usually OK for accessing data that is aged-out).  For SQL Server 2016, it will have to import the data unless you use HDP/Cloudera, in which case the creation of the MapReduce job will add overhead.

Here are details on what PolyBase supports for each product:

PolyBase (works with) | Azure Blob Store (WASB) | Push down | Azure Data Lake Store (ADLS) | Push down | HDI | Push down                            | Cloudera (CDH) | Push down | Hortonworks (HDP) | Push down
SQL 2016              | Yes                     | N/A       | No                           | N/A       | No  | No                                   | Yes            | Yes       | Yes               | Yes
Azure SQL DW          | Yes                     | N/A       | Yes                          | N/A       | No  | No                                   | No             | No        | No                | No
APS                   | Yes                     | N/A       | No                           | N/A       | Yes | Yes (internal region); No (external) | Yes            | Yes       | Yes               | Yes

Here are some important notes:

  • The file types that PolyBase supports: UTF-8 and UTF-16 encoded delimited text, RC File, ORC, Parquet, gzip, zlib, Snappy.  Not supported: extended ASCII, fixed-file format, WinZip, JSON, and XML
  • Azure SQL Database does not support PolyBase
  • SQL DW recently added PolyBase support for ADLS but does not support compute pushdown
  • ADLS is only in two regions (East US 2, Central US)
  • PolyBase supports row sizes up to 1MB
  • PolyBase can do writes to blob/ADLS and HDFS (using CETAS; see the sketch after these notes)
  • PolyBase requires the CREATE EXTERNAL TABLE command
  • PolyBase offers ability to create statistics on tables (but they are not auto-created or auto-updated)
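
As an example of the CETAS write path mentioned in the notes above, here is a hedged sketch of aging data out from SQL DW to blob storage (all names are hypothetical; the external data source and file format are assumed to already exist, and CETAS is not available in SQL Server 2016):

    -- Age out old rows to files in blob storage; the external table can
    -- still be queried through PolyBase afterwards.
    CREATE EXTERNAL TABLE dbo.FactSales_2014
    WITH (LOCATION = '/archive/factsales/2014/',
          DATA_SOURCE = MyBlobStore,
          FILE_FORMAT = ParquetFormat)
    AS
    SELECT *
    FROM dbo.FactSales
    WHERE SalesDate < '20150101';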

PolyBase parallelized reads for data loading:

  • Supported: in SQL using CTAS or INSERT INTO (see the CTAS load sketch after this list)
  • Not supported: BCP, Bulk Insert, SQLBulkCopy
  • Not supported: SSIS (unless used to call stored procedure containing CTAS or use the Azure SQL DW Upload Task)
  • Supported: ADF
    • If source compatible with PolyBase, will directly copy
    • If source not compatible with PolyBase, will stage to Blob
    • If source is ADLS, will still stage to Blob (having to stage to blob will be removed week of 8/20, so PolyBase will copy from ADLS directly to target)
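
As referenced in the list above, here is a minimal sketch of a PolyBase load into SQL DW using CTAS (the external table dbo.ext_Sales is hypothetical and assumed to already be defined over files in blob storage):

    -- PolyBase reads the underlying files in parallel and lands the rows
    -- in a distributed columnstore table.
    CREATE TABLE dbo.Sales
    WITH (DISTRIBUTION = HASH(CustomerID),
          CLUSTERED COLUMNSTORE INDEX)
    AS
    SELECT CustomerID, SalesDate, Amount
    FROM dbo.ext_Sales;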

The bottom line is, for SQL DW, think of PolyBase as a mechanism for data loading.  For SQL Server 2016, think of PolyBase for federated querying.

More info:

Azure SQL Data Warehouse loading patterns and strategies

PolyBase scale-out groups

PolyBase Queries


SQL DB now supports 4TB max database size

Azure SQL Database (SQL DB) has increased its max database size from 1TB to 4TB at no additional cost.

Customers using P11 and P15 premium performance levels can use up to 4 TB of included storage at no additional charge.  This 4 TB option is currently in public preview in the following regions: US East 2, West US, West Europe, South East Asia, Japan East, Australia East, Canada Central, and Canada East.

When creating a P11/P15, the default storage option of 1TB is pre-selected.  For databases located in one of the supported regions, you can increase the storage maximum to 4TB.  For all other regions, the storage slider cannot be changed.  The price does not change when you select 4 TB of included storage.

For existing P11 and P15 databases located in one of the supported regions, you can increase the maxsize storage to 4 TB.  This can be done in the Azure Portal, in PowerShell or with Transact-SQL.
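
For the Transact-SQL route, a minimal sketch (the database name is hypothetical; run it while connected to the server’s master database):

    -- Increase the included storage maximum from 1 TB to 4 TB.
    ALTER DATABASE MyP11Db MODIFY (MAXSIZE = 4096 GB);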

The 4 TB database maxsize cannot be changed to 1 TB even if the actual storage used is below 1 TB.

Also available now in preview is a new service tier for Azure SQL Database, called Premium RS, which is designed for IO-intensive workloads where a limited durability guarantee and lower SLA are acceptable in exchange for a cheaper cost.  Target scenarios for Premium RS are development using in-memory technology, testing high-performance workloads, and production analytic workloads.  Premium RS is available for singleton databases and elastic database pools.

More info:

Announcing Azure SQL Database Premium RS, 4TB storage options, and enhanced portal experience

SQL Database options and performance: Understand what’s available in each service tier


Power BI and Excel options for Hadoop

Below I have attempted to list the various options for reporting off of Hadoop (HDInsight, HDP, Cloudera) using Power BI Desktop and Excel.  Some of the data sources prompt you to choose the Data Connectivity mode of either Import or DirectQuery.  For those that don’t, you are limited to Import mode:

    • Power BI via Microsoft Azure HDInsight Spark (Import or DirectQuery).  This is in Beta.  This works if HDInsight is using Blob storage or Azure Data Lake Storage (ADLS)
    • Power BI via Microsoft Azure HDInsight (Import).  This works if HDInsight is connected to Blob storage.  If HDInsight is connected to ADLS, use the “Power BI via HDFS” connector instead
    • Power BI via Spark (Import or DirectQuery).  This is in Beta.  This is for HDP/Cloudera and uses a different authentication level than the HDInsight Spark connector.  NOTE: Cloudera recommends against using SparkSQL with CDH.  SparkSQL uses a component called thrift server which is not supported by Cloudera (see here).  For interactive applications they recommend Impala.  Their strategy is described here.
    • Power BI via Impala (Import or DirectQuery).  This is in Beta
    • Power BI via SQL DW using PolyBase (Import).  Note that DirectQuery is an option, but it is not supported with SQL DW so it will import the data instead, which could greatly slow query performance
    • Power BI via SQL Server 2016 using PolyBase (Import for HDInsight, DirectQuery creates MapReduce job in cluster for Cloudera/HDP, other sources will import the data instead)
    • Power BI via HDFS (Import).  This uses WebHDFS.
    • Power BI via ODBC (if Hive ODBC, creates Hive job in cluster, so can be slow.  For very fast querying, use Hive LLAP ODBC)
    • Power BI via Azure Data Lake Store

“Import” mode requires all data from Hadoop to be copied into Power BI/Excel and then filtered.  So import will first load your data (“snapshot”) into a local Power BI model and then run reports against that data, which means that interactive visualizations are not live but rather using data from the last snapshot.

With “DirectQuery” mode, the query is passed down into the data source and only the results are returned back (see Use DirectQuery in Power BI Desktop).  So when using DirectQuery you are using live data.  Note that Power BI Q&A and Quick Insights are not available for DirectQuery datasets (see here for other limitations).

EXCEPTION: If in Power BI you use the “Edit Queries” option, it will actually try to push down the query if you choose “Import” mode.  So the difference in this case from DirectQuery mode is that Import will fallback to running locally if it can’t execute part of the query on the server, while DirectQuery will tell you the query isn’t supported.  So an Import can take hours instead of failing fast like DirectQuery will do.

By using PolyBase, you use a CREATE EXTERNAL TABLE to point to the data stored in a Hadoop cluster.  Thereafter, the Hadoop data is seen as a regular table, hiding complexity from the end user.

I also wanted to mention that Power BI can connect to Azure Blob Storage and Azure Data Lake Store (beta), and Excel can connect to Azure Blob Storage.

There are many more data sources that Power BI Desktop can connect to.
