Microsoft database migration tools

There are various Microsoft tools that you can use to help you migrate your database:

Data Migration Assistant (DMA) – enables you to upgrade to a modern data platform by detecting compatibility issues that can impact database functionality on your new version of SQL Server or Azure SQL Database.  It recommends performance and reliability improvements for your target environment, and it allows you to move not only your schema and data, but also uncontained objects, from your source server to your target server.  Download version 3.1

Database Experimentation Assistant (DEA) – is a new A/B testing solution for SQL Server upgrades.  It assists in evaluating a targeted version of SQL Server for a given workload.  Customers upgrading from previous SQL Server versions (SQL Server 2005 and above) to any newer version can use the analysis metrics it provides, such as queries that have compatibility errors, degraded queries, query plans, and other workload comparison data, to build confidence in a successful upgrade.  Download technical preview 2

In short, customers can assess and upgrade their databases using DMA, and validate the target database’s performance using DEA.

SQL Server Migration Assistant (SSMA) – for Oracle, MySQL, SAP ASE (formerly SAP Sybase ASE), DB2, and Access lets users convert database schemas to Microsoft SQL Server schemas, upload the schema, and migrate data to the target SQL Server.  Download version 7.3

Also worth mentioning is Azure Data Factory, where you can use the Copy Activity to copy data (not schema) of different shapes from various on-premises and cloud data sources to Azure.  See Move data by using Copy Activity.

More info:

PASS Summit Announcements: DMA/DEA

Migrate from on-prem SQL server to Azure VM IaaS

Migrate from on-prem SQL server to Azure SQL Database

Getting data into Azure Blob Storage

Posted in SQL Server, SQLServerPedia Syndication | 3 Comments

U-SQL Defined

Azure Data Lake Analytics (ADLA) is a distributed analytics service built on Apache YARN that allows developers to be productive immediately on big data.  This is accomplished by submitting a job to the service, which automatically runs it in parallel in the cloud and scales to process data of any size.  Scaling is achieved by simply moving a slider, taking care that the data and job are large and complex enough to benefit from parallelism so you don’t overprovision and pay too much.  When the job completes, it winds down resources automatically, and you only pay for the processing power used.  This makes it easy to get started quickly and be productive with the SQL or .NET skills you already have, whether you’re a DBA, data engineer, data architect, or data scientist.  Because the analytics service works over both structured and unstructured data, you can quickly analyze all of your data – social sentiment, web clickstreams, server logs, devices, sensors, and more.  There’s no infrastructure setup, configuration, or management.

Included with ADLA is a new language called U-SQL, which is a big data language that seamlessly unifies the ease of use of SQL with the expressive power of C#.  U-SQL’s scalable distributed query capability enables you to efficiently analyze data in the Azure Data Lake Store and across Azure Blob Storage, SQL Servers in Azure, Azure SQL Database and Azure SQL Data Warehouse.  U-SQL is built on the learnings from Microsoft’s internal experience with SCOPE and existing languages such as T-SQL, ANSI SQL, and Hive.  See Introducing U-SQL – A Language that makes Big Data Processing Easy and Tutorial: develop U-SQL scripts using Data Lake Tools for Visual Studio.  Note that U-SQL differs in some ways from ANSI SQL or T-SQL (see Common SQL Expressions in U-SQL).
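
To give a feel for the language, here is a minimal U-SQL sketch modeled on the public tutorial sample (the file path and column names come from that sample and are otherwise arbitrary): it schematizes a TSV file on read with EXTRACT, filters it using C#-style expressions, and writes the result with OUTPUT.

```
// Schematize the file on read; no metadata object needs to exist first.
@searchlog =
    EXTRACT UserId   int,
            Start    DateTime,
            Region   string,
            Query    string,
            Duration int?
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

// Filter with C# expression semantics (note == rather than =).
@results =
    SELECT UserId, Region, Query, Duration
    FROM @searchlog
    WHERE Region == "en-gb";

// Write the result set back to the Data Lake Store.
OUTPUT @results
    TO "/output/SearchLog-filtered.tsv"
    USING Outputters.Tsv();
```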

Perhaps the best value proposition of U-SQL is that it allows you to query data where it lives instead of having to copy all the data to one location.  For external systems, such as Azure SQL DB/DW and SQL Server in a VM, this is achieved using federated queries against those data sources where the query is “pushed down” to the data source and executed on that data source, with only the results being returned.


Some of the main benefits of U-SQL:

  • Avoid moving large amounts of data across the network between stores (federated query/logical data warehouse)
  • Single view of data irrespective of physical location
  • Minimize data proliferation issues caused by maintaining multiple copies
  • Single query language for all data
  • Each data store maintains its own sovereignty
  • Design choices based on the need
  • Push SQL expressions with filters and joins to remote SQL sources.  There are two approaches (a fuller sketch follows this list):
    • SELECT * FROM EXTERNAL MyDataSource EXECUTE @"SELECT CustName FROM Customers WHERE ID = 1"; Use this approach when you want exact T-SQL semantics and just want to get the result back.  Note that any subsequent U-SQL against this result is not federated into the remote data source.  This is called a remote query
    • SELECT CustName FROM EXTERNAL MyDataSource LOCATION "dbo.Customers" WHERE ID == 1; Use this approach when you want to write everything in U-SQL and are fine with possible slight semantic differences.  In that case all U-SQL predicates against that location source are accumulated and translated into T-SQL based on REMOTABLE_TYPES and the U-SQL to T-SQL translation.  This is called a federated query
  • Can access JSON in Blob/ADLS (via JSON extractor), text, CSV, TSV, and images (jpeg).  Customers can also write their own custom extractors
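
As a rough end-to-end illustration of the federated approach, here is a minimal sketch (the data source name, connection string, credential, and table are hypothetical; the exact options are covered in CREATE DATA SOURCE (U-SQL) and the federated query setup article linked below) that registers an Azure SQL Database and queries it so the predicate can be pushed down:

```
// Register a hypothetical Azure SQL Database as a federated data source.
// The credential (MyDb.MyCredential) is assumed to already exist in the U-SQL catalog.
CREATE DATA SOURCE IF NOT EXISTS MyDataSource
FROM AZURESQLDB
WITH
(
    PROVIDER_STRING = "Database=CustomersDb;Trusted_Connection=False;Encrypt=True",
    CREDENTIAL = MyDb.MyCredential,
    REMOTABLE_TYPES = (bool, byte, short, int, long, decimal, float, double, string, DateTime)
);

// Federated query: the WHERE predicate is translated to T-SQL and pushed down to the remote source.
@rows =
    SELECT CustName
    FROM EXTERNAL MyDataSource LOCATION "dbo.Customers"
    WHERE ID == 1;

OUTPUT @rows
    TO "/output/customer1.csv"
    USING Outputters.Csv();
```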

You may have noticed that U-SQL is similar to PolyBase (see PolyBase use cases clarified).  The main difference between the two is that PolyBase extends T-SQL onto unstructured data (files) via a schematized view that allows writing T-SQL against those files, while U-SQL natively operates on unstructured data and virtualizes access to other SQL data sources via a built-in EXTRACT expression that lets you schematize unstructured data on the fly without having to create a metadata object for it.  Also, PolyBase runs interactively while U-SQL runs in batch, meaning you can use PolyBase with reporting tools such as Power BI, but currently cannot with U-SQL.  Finally, U-SQL supports more formats (e.g. JSON) and allows you to use inline C# functions, User-Defined Functions (UDF), User-Defined Operators (UDO), and User-Defined Aggregators (UDAGG), which are ways to add user-specific code written in C#.

More info:

U-SQL Tables

CREATE DATA SOURCE (U-SQL)

U-SQL SELECT Selecting from an External Rowset

Setup Azure Data Lake Analytics federated U-SQL queries to Azure SQL Database

Tutorial: Get started with Azure Data Lake Analytics U-SQL language

Video Bring Big Data to the masses with U-SQL

Video U-SQL – A new language to process big data using C#/SQL

Posted in Azure Data Lake, PolyBase, SQLServerPedia Syndication | 2 Comments

PolyBase use cases clarified

I previously talked about PolyBase and its enhancements (see PASS Summit Announcements: PolyBase enhancements).  There is some confusion on PolyBase use cases as they differ depending on whether you are using PolyBase with Azure SQL Data Warehouse (SQL DW) or SQL Server 2016, as well as the sources you are using it against.  The three main use cases for PolyBase are: loading data, federated querying, and aging out data.  Here is the support for those three use cases in SQL DW and SQL Server 2016:

| PolyBase in | Parallelize data load (Blob and ADLS) | Federated query (push down) to HDInsight | Federated query (push down) to HDP/Cloudera (local or blob) | Federated query (push down) to five new sources* | Age out data |
| --- | --- | --- | --- | --- | --- |
| SQL DW | Yes | N/A | N/A | No support for on-prem sources | Maybe |
| SQL Server 2016 | Yes, via scale-out groups (Blob, not ADLS) | No | Yes (creates a MapReduce job) | Yes | Maybe |

* = Teradata, Oracle, SQL Server, MongoDB, generic ODBC (Spark, Hive, Impala, DB2)

For federated queries: “No” means all data from the source must be copied into SQL Server 2016 and then filtered.  For “Yes”, the query is pushed down into the data source and only the results are returned, which can be much faster for large amounts of data.

I mention “Maybe” for age out data in SQL DW as you can use PolyBase to access the aged-out data in blob or Azure Data Lake Storage (ADLS), but it will have to import all the data so may have slower performance (which is usually ok for accessing data that is aged-out).  For SQL Server 2016, it will have to import the data unless you use HDP/Cloudera, in which case the creation of the MapReduce job will add overhead.

Here are details on what PolyBase supports for each product:

| PolyBase works with | Azure Blob Store (WASB) | Push down | Azure Data Lake Store (ADLS) | Push down | HDI | Push down | Cloudera (CDH) | Push down | Hortonworks (HDP) | Push down |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SQL Server 2016 | Yes | N/A | No | N/A | No | No | Yes | Yes | Yes | Yes |
| Azure SQL DW | Yes | N/A | Yes | N/A | No | No | No | No | No | No |
| APS | Yes | N/A | No | N/A | Yes | Yes (internal region), No (external) | Yes | Yes | Yes | Yes |

Here are some important notes:

  • File formats that PolyBase supports: UTF-8 and UTF-16 encoded delimited text, RCFile, ORC, and Parquet, with gzip, zlib, and Snappy compression.  Not supported: extended ASCII, fixed-width file formats, WinZip, JSON, and XML
  • Azure SQL Database does not support PolyBase
  • SQL DW recently added PolyBase support for ADLS but does not support compute pushdown
  • ADLS is only in two regions (East US 2, Central US)
  • PolyBase supports row sizes up to 1MB
  • PolyBase can write to blob/ADLS and HDFS (using CETAS)
  • PolyBase requires the CREATE EXTERNAL TABLE command (a sketch of the external-object setup follows this list)
  • PolyBase offers the ability to create statistics on external tables (but they are not auto-created or auto-updated)
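
To make the external-object model concrete, here is a minimal T-SQL sketch (the object names, blob container, and credential are all hypothetical; CETAS is only available in SQL DW and APS) covering an external data source, file format, external table, manually created statistics, and a CETAS write:

```sql
-- Hypothetical names throughout; the external data source points at an Azure Blob Storage container.
CREATE EXTERNAL DATA SOURCE AzureBlobStore
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = AzureStorageCredential  -- database scoped credential created beforehand
);

CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|')
);

-- Schematize the files under /sales/ as a table; the data stays in blob storage.
CREATE EXTERNAL TABLE dbo.SalesExternal (
    SaleId   INT,
    SaleDate DATE,
    Amount   DECIMAL(18, 2)
)
WITH (
    DATA_SOURCE = AzureBlobStore,
    LOCATION = '/sales/',
    FILE_FORMAT = TextFileFormat
);

-- Statistics must be created manually; they are not auto-created or auto-updated.
CREATE STATISTICS stat_SalesExternal_SaleDate ON dbo.SalesExternal (SaleDate);

-- CETAS (SQL DW/APS): write query results back out as a new external table, e.g. to age out data.
CREATE EXTERNAL TABLE dbo.SalesAgedOut
WITH (
    DATA_SOURCE = AzureBlobStore,
    LOCATION = '/sales-archive/',
    FILE_FORMAT = TextFileFormat
)
AS
SELECT * FROM dbo.Sales WHERE SaleDate < '2015-01-01';
```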

PolyBase parallelized reads for data loading:

  • Supported: in SQL using CTAS or INSERT INTO (a sketch follows this list)
  • Not supported: BCP, Bulk Insert, SQLBulkCopy
  • Not supported: SSIS (unless used to call stored procedure containing CTAS or use the Azure SQL DW Upload Task)
  • Supported: ADF
    • If source compatible with PolyBase, will directly copy
    • If source not compatible with PolyBase, will stage to Blob
    • If source is ADLS, will still stage to Blob (having to stage to blob will be removed week of 8/20, so PolyBase will copy from ADLS directly to target)
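
As a rough sketch of the CTAS loading pattern in SQL DW (table names are hypothetical; the external table is the one defined in the earlier sketch), PolyBase reads the external files in parallel and materializes them as a distributed internal table:

```sql
-- Parallel load from an external table (over blob/ADLS) into a distributed SQL DW table.
CREATE TABLE dbo.Sales
WITH (
    DISTRIBUTION = HASH(SaleId),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT SaleId, SaleDate, Amount
FROM dbo.SalesExternal;

-- Or append into an existing table.
INSERT INTO dbo.Sales
SELECT SaleId, SaleDate, Amount
FROM dbo.SalesExternal;
```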

The bottom line is, for SQL DW, think of PolyBase as a mechanism for data loading.  For SQL Server 2016, think of PolyBase for federated querying.

More info:

Azure SQL Data Warehouse loading patterns and strategies

PolyBase scale-out groups

PolyBase Queries

Posted in PolyBase, SQLServerPedia Syndication | 3 Comments

SQL DB now supports 4TB max database size

Azure SQL Database (SQL DB) has increased its max database size from 1TB to 4TB at no additional cost.

Customers using P11 and P15 premium performance levels can use up to 4 TB of included storage at no additional charge.  This 4 TB option is currently in public preview in the following regions: US East 2, West US, West Europe, South East Asia, Japan East, Australia East, Canada Central, and Canada East.

When creating a P11/P15, the default storage option of 1TB is pre-selected.  For databases located in one of the supported regions, you can increase the storage maximum to 4TB.  For all other regions, the storage slider cannot be changed.  The price does not change when you select 4 TB of included storage.

For existing P11 and P15 databases located in one of the supported regions, you can increase the maxsize storage to 4 TB.  This can be done in the Azure Portal, in PowerShell or with Transact-SQL.
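
As an example, here is a minimal T-SQL sketch (the database name is hypothetical; the Azure Portal and PowerShell expose the same setting):

```sql
-- Increase the maxsize of an existing P11/P15 database to 4 TB (supported regions only).
ALTER DATABASE [MyDatabase] MODIFY (MAXSIZE = 4096 GB);
```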

Note that once the maxsize is set to 4 TB, it cannot be changed back to 1 TB, even if the actual storage used is below 1 TB.

Also available now in preview is a new service tier for Azure SQL Database, called Premium RS, which is designed for IO-intensive workloads where a limited durability guarantee and a lower SLA are acceptable in exchange for a cheaper cost.  Target scenarios for Premium RS are development using in-memory technology, testing high-performance workloads, and production analytic workloads.  Premium RS is available for singleton databases and elastic database pools.

More info:

Announcing Azure SQL Database Premium RS, 4TB storage options, and enhanced portal experience

SQL Database options and performance: Understand what’s available in each service tier

Posted in Azure SQL Database, SQLServerPedia Syndication | Comments Off on SQL DB now supports 4TB max database size

Power BI and Excel options for Hadoop

Below I have attempted to list the various options for reporting off of Hadoop (HDInsight, HDP, Cloudera) using Power BI Desktop and Excel.  Some of the data sources prompt you to choose the Data Connectivity mode of either Import or DirectQuery.  For those that don’t, you are limited to Import mode:

    • Power BI via Microsoft Azure HDInsight Spark (Import or DirectQuery)  This is in Beta.  This works if HDInsight is using Blob storage or Azure Data Lake Storage (ADLS)
    • Power BI via Microsoft Azure HDInsight (Import).  This works if HDInsight is connected to Blob storage.  If HDInsight is connected to ADLS, use the “Power BI via HDFS” connector instead
    • Power BI via Spark (Import or DirectQuery).  This is in Beta.  This is for HDP/Cloudera and uses a different authentication level than the HDInsight Spark connector.  NOTE: Cloudera recommends against using SparkSQL with CDH.  SparkSQL uses a component called Thrift Server which is not supported by Cloudera (see here).  For interactive applications they recommend Impala.  Their strategy is described here.
    • Power BI via Impala (Import or DirectQuery).  This is in Beta
    • Power BI via SQL DW using PolyBase (Import).  Note that DirectQuery is an option, but it is not supported with SQL DW, so the data will be imported instead, which could greatly slow query performance
    • Power BI via SQL Server 2016 using PolyBase (Import for HDInsight, DirectQuery creates MapReduce job in cluster for Cloudera/HDP, other sources will import the data instead)
    • Power BI via HDFS (Import).  This uses WebHDFS.
    • Power BI via ODBC (if Hive ODBC, creates Hive job in cluster, so can be slow.  For very fast querying, use Hive LLAP ODBC)

“Import” mode requires all data from Hadoop to be copied into Power BI/Excel and then filtered.  So Import will first load your data (a “snapshot”) into a local Power BI model and then run reports against that data, which means interactive visualizations are not live but rather use data from the last snapshot.

With “DirectQuery” mode, the query is passed down into the data source and only the results are returned (see Use DirectQuery in Power BI Desktop).  So when using DirectQuery you are using live data.  Note that Power BI Q&A and Quick Insights are not available for DirectQuery datasets (see here for other limitations).

EXCEPTION: If in Power BI you use the “Edit Queries” option, it will actually try to push down the query even if you choose “Import” mode.  The difference from DirectQuery mode in this case is that Import will fall back to running locally if it can’t execute part of the query on the server, while DirectQuery will tell you the query isn’t supported.  So an Import can take hours instead of failing fast like DirectQuery will do.

With PolyBase, you use CREATE EXTERNAL TABLE to point to the data stored in a Hadoop cluster.  Thereafter, the Hadoop data is seen as a regular table, hiding the complexity from the end user.

I also wanted to mention that Power BI can connect to Azure Blob Storage and Azure Data Lake Store (beta), and Excel can connect to Azure Blob Storage.

There are many more data sources that Power BI Desktop can connect to.

Posted in Excel, Hadoop, PolyBase, Power BI, SQLServerPedia Syndication | Comments Off on Power BI and Excel options for Hadoop

Azure Data Factory and SSIS compared

I see a lot of confusion when it comes to Azure Data Factory (ADF) and how it compares to SSIS.  It is not simply “SSIS in the cloud”.  See What is Azure Data Factory? for an overview of ADF, and I’ll assume you know SSIS.  So how are they different?

SSIS is an Extract-Transform-Load (ETL) tool, but ADF is an Extract-and-Load tool: it does not do any transformations within the tool itself.  Instead, those are done by ADF calling, for example, a stored procedure on a SQL Server that does the transformation, a Hive job, or a U-SQL job in Azure Data Lake Analytics.  Think of it more as an orchestration tool.  SSIS has the added benefit of doing transformations, but keep in mind that the performance of any transformation depends on the power of the server SSIS is installed on, as the data to be transformed will be pushed to that SSIS server.  Other major differences:

  • ADF is a cloud-based service (via the ADF editor in the Azure portal) and, since it is a PaaS tool, does not require hardware or any installation.  SSIS is a desktop tool (via SSDT) and requires a good-sized server that you have to manage, and you have to install SQL Server with SSIS
  • ADF uses JSON scripts for its orchestration (coding), while SSIS uses drag-and-drop tasks (no coding)
  • ADF is pay-as-you-go via an Azure subscription, SSIS is a license cost as part of SQL Server
  • ADF can fire up HDInsight clusters and run Pig and Hive scripts.  SSIS can also, via the Azure Feature Pack for Integration Services (SSIS)
  • SSIS has a powerful GUI, intellisense, and debugging.  ADF has a basic editor and no intellisense or debugging
  • SSIS is administered via SSMS, while ADF is administered via the Azure portal
  • SSIS has a wider range of supported data sources and destinations
  • SSIS has a programming SDK, automation via BIML, and third-party components.  ADF does not have a programming SDK, has automation via PowerShell, and no third-party components
  • SSIS has error handling.  ADF does not
  • ADF has “data lineage“, tagging and tracking the data from different sources.  SSIS does not have this

Think of ADF as a complementary service to SSIS, with its main use case confined to inexpensively dealing with big data in the cloud.

Note that moving to the cloud requires you to think differently when it comes to loading a large amount of data, especially when using a product like SQL Data Warehouse (see Azure SQL Data Warehouse loading patterns and strategies).

More info:

Azure Data Factory vs SSIS

Posted in Azure Data Factory, SQLServerPedia Syndication, SSIS | 4 Comments

IoT Hub vs. Event Hub

There can sometimes be confusion, in IoT scenarios, between IoT Hub and Event Hub, as they can both be valid choices for streaming data ingestion.  My Microsoft colleague, Steve Busby, an IoT Technology Specialist, is here to clear up the confusion:

The majority of the time, if the data is coming from the devices, either directly or via a field-based gateway, IoT Hub will be the more appropriate choice.  Event Hub will generally be the more appropriate choice if the data will not be coming to Azure directly from the devices, but rather either cloud-to-cloud through another provider, intra-cloud, or if the data is already landing on-premises and needs to be streamed to the cloud from a small number of endpoints internally.  There are exceptions to both conditions, of course.

Both solutions offer very high throughput data ingestion and can handle tremendous streaming data volumes.  In fact, today, IoT Hub is primarily a set of additional services that wrap an underlying Event Hub.

Event Hub, however, has some limitations that may make it inappropriate for an IoT scenario when data is being pulled from devices in the field, specifically:

  • Event Hub is limited to 5000 concurrent AMQP connections.  At “IoT scale”, there would likely be a lot more devices than that
  • Event Hub does not authenticate individual device connections, but rather all the devices essentially share the same (or few) access keys.  You also cannot enable/disable communication from a single device.
  • Event Hub is ingestion only and has no facility for sending traffic back to the devices (command and control or device management scenarios)
  • Event Hub does not support MQTT.  For better or worse, MQTT is a very popular IoT protocol and is supported by competitors, particularly AWS

In addition to the features of Event Hub, IoT Hub also adds:

  • Scale out connections.  A single IoT Hub has been tested to more than a million concurrent connections
  • IoT Hub does individual device authentication (via either a device-specific key, or x.509 certificate) and you can enable/disable individual devices
  • Bi-directional communication, allowing you to send either asynchronous or synchronous commands to the device
  • Supports AMQP and HTTP, like Event Hub, but also adds support for MQTT as a transport protocol
  • Device Management – the ability to orchestrate firmware updates and desired and reported configuration management with devices
  • Device Twins – A queryable cloud-side representation of the state of a device, allowing customers to group and manage their devices “at IoT scale”
  • File update – orchestrate file uploads from device through the hub

As you can see, both Event Hub and IoT Hub are valuable Azure services with distinct use cases.  Event Hub is great for high-throughput ingestion (only) of data from a limited number of sources (like cloud-to-cloud, on-prem to cloud, or intra-cloud).  In those scenarios, Event Hub will generally be appropriate and significantly cheaper than IoT Hub.  However, for IoT scenarios that involve interacting with the individual devices directly, IoT Hub will generally be the better technology, and it adds a lot of features needed to be competitive.

More info:

Comparison of Azure IoT Hub and Azure Event Hubs

Posted in Azure, Cortana Intelligence Suite, SQLServerPedia Syndication | 1 Comment

Microsoft Products vs Hadoop/OSS Products

Microsoft’s end goal is for Azure to become the best cloud platform for customers to run their data workloads.  This means Microsoft will provide customers the best environment to run their big data/Hadoop workloads as well as a place where Microsoft can offer services with our unique point of view.  A specific decision point on using Hadoop is whether the customer wants to use open source technologies or not.  Some of the benefits of running open source software (OSS) on Azure include:

  • Quick installs
  • Support
  • Easy scale
  • Products work together
  • Don’t need to get your own hardware

To determine the cost savings by moving your OSS to Azure, see the Total Cost of Ownership (TCO) Calculator.

Of course there are many benefits of using Microsoft products over OSS, such as ease of use, support, better security, easier-to-find skills, less frequent version updates, more stability (fewer bugs), more compatibility and integration between products, etc.  But there are still reasons to use OSS (e.g. cost, faster performance in some cases, more product selection and features), so I created a list that shows many of the Microsoft products and their equivalent, or close equivalent, Hadoop/OSS product.

I tried to list only Apache products unless there was no equivalent Apache product or there is a really popular Open Source Software (OSS) product.

| Microsoft Product | Hadoop/Open Source Software Product |
| --- | --- |
| Office365/Excel | OpenOffice/Calc |
| DocumentDB | MongoDB, MarkLogic, HBase, Cassandra |
| SQL Database | SQLite, MySQL, PostgreSQL, MariaDB |
| Azure Data Lake Analytics/YARN | None |
| Azure VM/IaaS | OpenStack |
| Blob Storage | HDFS, Ceph (Note: these are distributed file systems and Blob storage is not distributed) |
| Azure HBase | Apache HBase (Azure HBase is a service wrapped around Apache HBase), Apache Trafodion |
| Event Hub | Apache Kafka |
| Azure Stream Analytics | Apache Storm, Apache Spark, Twitter Heron |
| Power BI | Apache Zeppelin, Jupyter, Airbnb Caravel, Kibana |
| HDInsight | Hortonworks (pay), Cloudera (pay), MapR (pay) |
| Azure ML | Apache Mahout, Apache Spark MLlib |
| Microsoft R Open | R |
| SQL Data Warehouse | Apache Hive, Apache Drill, Presto |
| IoT Hub | Apache NiFi |
| Azure Data Factory | Apache Falcon, Airbnb Airflow |
| Azure Data Lake Storage/WebHDFS | HDFS Ozone |
| Azure Analysis Services/SSAS | Apache Kylin, AtScale (pay) |
| SQL Server Reporting Services | None |
| Hadoop Indexes | Jethro Data (pay) |
| Azure Data Catalog | Apache Atlas |
| PolyBase | Apache Drill |
| Azure Search | Apache Solr, Elasticsearch (Azure Search is built on ES) |
| Others | Apache Flink, Apache Ambari, Apache Ranger, Apache Knox |

Many of the Hadoop/OSS products are available in Azure.  If you feel I’m missing some products from this list, please let me know as this is very subjective and comments are always welcome!

Posted in Hadoop, SQLServerPedia Syndication | Comments Off on Microsoft Products vs Hadoop/OSS Products

Create quick queries quickly with SSAS in Azure

In a perfect world a data warehouse would always return all queries very quickly, no matter what the size or complexity of the data warehouse.  But the reality is that sometimes we won’t get the performance we need from the data warehouse, and the solution is to use a SQL Server Analysis Services (SSAS) cube, which also has other benefits (see Why use a SSAS cube?).  Thanks to Azure, creating a cube can be done in a matter of hours, giving you quick results for just about any query no matter how big the data warehouse gets.

If you want to create a cube quickly you can do so by creating a VM in Azure and installing SQL Server and SSAS (or choosing a VM from the marketplace with SQL Server and SSAS already installed) and assuming the following restrictions:

  • No high availability needed (this can be accomplished in the future by creating another VM in an availability set, syncing cubes, and using an Azure load balancer)
  • No disaster recovery needed (this can be accomplished in the future by creating another VM in another region and syncing cubes).  The solution for now would be to back up the cube and restore if needed, or just rebuild and reprocess the cube
  • Number of concurrent connections under a hundred.  Handling more connections in the future can be accomplished by scaling up the VM and/or creating multiple VMs, syncing cubes, and adding a load balancer (keeping in mind VMs can be shut down off-hours to save cost).  The number of concurrent connections is highly variable based on the type of querying and caching.  See SQL Server Best Practices Article
  • Using the SSAS tabular model, and the size of the cube is less than 180GB.  Cube compression is typically between 10x and 100x, and the 180GB figure assumes a conservative average of 10x compression and using 60% of the server’s memory, so it corresponds to roughly 1.8TB of uncompressed source data.  A larger cube can be accomplished via an SSAS multidimensional model, as it does not require the cube to be in memory like Tabular does.  The multidimensional SSAS cube can use HOLAP storage mode to support more source data, as the cube will not have a copy of the detail records from the source system, greatly reducing cube size and processing time.  Drilling down to the detail records can still be done by the cube, which accomplishes this by querying the source system
  • Available maintenance window at night to process the tabular cube (in the future can add a “cube processing VM” to process cube and then sync the cubes to avoid queries hitting a cube when it is being processed and slowing the queries down)
  • Only need to update the cube once a day (in the future can use VM to process cube and sync as well as use partitioning)
  • Future expansion can be via Azure Analysis Services (in preview) as well as creating multiple cubes by subject areas and using a true scale-out architecture
  • It’s a balance between using extra developer hours to save cost and choosing more hardware up-front

To choose the appropriate size VM, see Hardware Sizing a Tabular Solution (SQL Server Analysis Services), Sizes for virtual machines in Azure, Largest VM in the Cloud, and How Much RAM Do I Need For My SSAS Tabular Server?

More info:

SQL Server 2008 R2 Analysis Services Operations Guide

Why a Semantic Layer Like Azure Analysis Services is Relevant (Part 1)

Posted in SQLServerPedia Syndication, SSAS | 6 Comments

Data gateway confusion

Microsoft has created data gateways as a way for cloud products such as Power BI and Azure Machine Learning to access on-prem data, often called a hybrid model.  This is most useful for companies that do not want to put their data in the cloud, or that simply have too much data to upload to the cloud.

The confusion is that there are a number of different gateways, and some of those have been renamed multiple times.  I’ll try to provide a quick summary to help you understand all the current options:

On-premises data gateway: Formerly called the enterprise version.  Multiple users can share and reuse a gateway in this mode.  This gateway can be used by Power BI, PowerApps, Microsoft Flow, or Azure Logic Apps.  For Power BI, this includes support for both scheduled refresh and DirectQuery.  To add a data source such as SQL Server that can be used by the gateway, check out Manage your data source – SQL Server.  To connect the gateway to your Power BI, you sign in to Power BI after you install it (see On-premises data gateway in-depth).

Personal gateway: This is for Power BI only and can be used as an individual without any administrator configuration.  This can only be used for on-demand refresh and scheduled refresh (not DirectQuery).

To install either of these gateways, see On-premises data gateway.  The differences between the two:

[image: comparison of the on-premises data gateway and the personal gateway]

Data Management Gateway: This is the old solution for Power BI but still used for Azure Data Factory and Azure Machine Learning.  You can download it here.

Here are the differences between the Data Management Gateway and the On-premises data gateway:

|  | Data Management Gateway | Power BI On-premises Data Gateway |
| --- | --- | --- |
| Target scenarios | Use the Data Management Gateway for building advanced analytics solutions.  With the Data Management Gateway you can securely and efficiently transfer data between on-prem and cloud and integrate on-prem transformation with the cloud. | Use the on-premises data gateway to keep Power BI dashboards and reports up to date with on-premises data sources. |
| Key capabilities | Batch data transfer to/from a set of supported sources and destinations; execute T-SQL against on-prem SQL Server | DirectQuery to SQL Server; live connection to Analysis Services; scheduled refresh against a set of supported data sources |
| Cloud services it works with | Azure Data Factory, Azure Machine Learning | Power BI, PowerApps (Preview), Azure Logic Apps (Preview), Microsoft Flow (Preview) |

Microsoft has another different gateway that is specific to the Azure Analysis Services (Preview) service, at least for now: On-premises data gateway.

Posted in SQLServerPedia Syndication | 2 Comments