SQL Server 2016 available June 1st!

Woo hoo!  Microsoft has announced that SQL Server 2016 will be generally available on June 1st.  On that date all four editions of SQL Server 2016 will be available to all users, including brand new ones, MSDN subscribers, and existing customers.

Here is a quick overview of the tons of new features, broken out by edition:

[Image: SQL Server 2016 new features by edition]

and here is another view on the features available for each edition:

[Image: SQL Server 2016 feature availability by edition]

In addition to the on-premises release, Microsoft will also have a virtual machine available on June 1st through its Azure cloud platform to make it really easy for companies to deploy SQL Server 2016 in the cloud.  So start planning today!

More info:

Get ready, SQL Server 2016 coming on June 1st

Microsoft SQL Server 2016 will be generally available June 1

Posted in SQL Server 2016, SQLServerPedia Syndication

Big Data architectures

Over the last few years I have been involved in reviewing the architectures of various companies that are building or have built big data solutions.  When I say “big data”, I’m referring to the incorporation of semi-structured data such as sensor data, device data, web logs, and social media.  The architectures generally fall into four scenarios:

Enterprise data warehouse augmentation

This scenario uses an enterprise data warehouse (EDW) built on an RDBMS, but extracts data from the EDW and loads it into a big data hub along with data from other sources that are deemed not cost-effective to move into the EDW (usually high-volume data or cold data).  Some data enrichment is usually done in the data hub.  This data hub can then be queried, but primary analytics remain with the EDW.  The data hub is usually built on Hadoop or NoSQL.  This can save costs since storage using Hadoop or NoSQL is much cheaper than an EDW.  Plus, this can speed up the development of reports since the data in Hadoop or NoSQL can be used right away instead of waiting for an IT person to write the ETL and create the schemas to ingest the data into the EDW.  Another benefit is that it can support data growth faster, as it is easier to expand storage on a Hadoop/NoSQL solution than on a SAN with an EDW solution.  Finally, it can help by reducing the number of queries on the EDW.

This scenario is most common when an EDW has been in existence for a while and users are requesting data that the EDW cannot handle because of space, performance, and data loading times.

The challenges of this approach are that you might not be able to use your existing tools to query the data hub, and that the data in the hub can be difficult to understand and join and may not be completely clean.


Data hub plus EDW

The data hub is used as a data staging and extreme-scale data transformation platform, but long-term persistence and analytics are performed in the EDW.  Hadoop or NoSQL is used to refine the data in the data hub.  Once refined, the data is copied to the EDW and then deleted from the data hub.

This will lower the cost of data capture, provide scalable data refinement, and provide fast queries via the EDW.  It also offloads the data refinement from the EDW.
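The refine, copy, and delete flow described above can be sketched in a few lines of Python.  This is only a toy illustration under stated assumptions: the DataHub and Warehouse classes are in-memory stand-ins (not a real Hadoop or EDW API), and the refine rule and sample records are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DataHub:
    """Hypothetical stand-in for a Hadoop/NoSQL staging area holding raw records."""
    raw: list = field(default_factory=list)


@dataclass
class Warehouse:
    """Hypothetical stand-in for the EDW, which keeps only refined rows."""
    rows: list = field(default_factory=list)


def refine(record):
    """Made-up refinement rule: reject records without a device id, normalize the rest."""
    if not record.get("device"):
        return None
    return {"device": record["device"].strip().lower(),
            "reading": float(record["reading"])}


def refine_and_offload(hub, edw):
    """Refine in the hub, copy the result to the EDW, then delete it from the hub."""
    refined = [r for r in map(refine, hub.raw) if r is not None]
    edw.rows.extend(refined)   # copy refined data to the EDW
    hub.raw.clear()            # the hub is staging only, so empty it afterwards
    return len(refined)


hub = DataHub(raw=[
    {"device": " Sensor-A ", "reading": "21.5"},
    {"device": "", "reading": "3.0"},          # rejected: no device id
    {"device": "sensor-b", "reading": "19"},
])
edw = Warehouse()
loaded = refine_and_offload(hub, edw)
print(loaded, edw.rows, hub.raw)
```

The point of the sketch is the division of labor: the refinement work happens outside the EDW, and the EDW only ever receives clean rows.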



A distributed data system is implemented for long-term, high-detail big data persistence in the data hub and analytics without employing an EDW.  Low-level code is written or big data packages are added that integrate directly with the distributed data store for extreme-scale operations and analytics.

The distributed data hub is usually created with Hadoop, HBase, Cassandra, or MongoDB.  BI tools specifically integrated with or designed for distributed data access and manipulation are needed.  Data operations either use BI tools that provide NoSQL capability or require low-level code (e.g., MapReduce or Pig script).

The disadvantages of this scenario are that reports and queries can have longer latency, new reporting tools require training (which could lead to lower adoption), and it is difficult to provide governance and structure on top of a non-RDBMS solution.


Modern Data Warehouse

This scenario is an evolution of the three previous scenarios that provides multiple options for the various technologies.  Data may be harmonized and analyzed in the data lake or moved out to an EDW when more quality and performance are needed, or when users simply want control.  ELT is usually used instead of ETL (see Difference between ETL and ELT).  The goal of this scenario is to support any future data needs no matter what the variety, volume, or velocity of the data.

Hub-and-spoke should be your ultimate goal.  See Why use a data lake? for more details on the various tools and technologies that can be used for the modern data warehouse.


More info:

Forrester’s Patterns of Big Data

Most Excellent Big Data Questions: Top Down or Bottom Up Use Cases?

Posted in Big Data, Data warehouse, SQLServerPedia Syndication

Azure SQL Database pricing

Pricing Azure SQL Database is difficult because pricing is determined by database service tier options such as database transaction units (DTUs), max database size, disaster recovery options, and backup retention days, rather than by hardware (CPU/RAM/HD).  Generally I recommend starting out with a low service tier and scaling up as your needs increase, as it only takes a few minutes to scale with no downtime (see Change the service tier and performance level (pricing tier) of a SQL database).

DTUs are explained here.  To help, there is an Azure SQL Database DTU Calculator.  This calculator will help you determine the number of DTUs being used by your existing on-prem SQL Server database(s) and will recommend the minimum performance level and service tier that you need before you migrate to Azure SQL Database.  It does this by using performance monitor counters.

After you use a SQL Database for a while, you can use a pricing tier recommendation tool to determine the best service tier to switch to.  It does this by assessing historical resource usage for a SQL database.

Note this pricing is per database, so if you have many databases on each on-prem SQL Server you will have to price each one.  But there are many “built-in” features of SQL Database, such as high availability and disaster recovery, that you don’t have to build or manage in the cloud, thus saving you costs and administration time.

You pay for each database based on up-time.  You don’t pay for storage (but there is a database size limit for each tier).

Below is the pricing for all database service tiers in the East US region for the single database model (pricing is different for elastic database pools, which are an option to share resources).  The pricing comes from the Azure pricing calculator:

Tier      Level  DTUs  Max database size  Price per month
Basic     -         5  2 GB               $5
Standard  S0       10  250 GB             $15
Standard  S1       20  250 GB             $30
Standard  S2       50  250 GB             $75
Standard  S3      100  250 GB             $150
Premium   P1      125  500 GB             $465
Premium   P2      250  500 GB             $930
Premium   P4      500  500 GB             $1,860
Premium   P6     1000  500 GB             $3,720
Premium   P11    1750  1 TB               $7,001


For more details on pricing, see SQL Database Pricing.  Data going into a SQL Database is free (“Inbound data transfers”).  Data going out of a SQL Database is free (“Outbound data transfers”) if within the same Azure data center (“region”), otherwise there is a small cost – see Data Transfers Pricing Details.

An option to get around the 1TB database size limit is to split the data into multiple databases and use elastic database queries.
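Since pricing is per database, it can help to script the lookup.  Here is a minimal Python sketch that picks the cheapest tier for a given DTU and size requirement using the prices listed in this post.  The function names are my own, 1TB is assumed to mean 1024GB, and the numbers are a snapshot, so check the Azure pricing calculator for current values.

```python
# Tier data copied from the list in this post (East US, single database model).
# Each tuple is (name, DTUs, max size in GB, USD per month).
# Assumption: the 1TB limit on P11 is treated as 1024 GB.
TIERS = [
    ("Basic", 5, 2, 5),
    ("S0", 10, 250, 15),
    ("S1", 20, 250, 30),
    ("S2", 50, 250, 75),
    ("S3", 100, 250, 150),
    ("P1", 125, 500, 465),
    ("P2", 250, 500, 930),
    ("P4", 500, 500, 1860),
    ("P6", 1000, 500, 3720),
    ("P11", 1750, 1024, 7001),
]


def cheapest_tier(required_dtus, size_gb):
    """Return the lowest-priced tier meeting both requirements, or None if none fits."""
    candidates = [t for t in TIERS if t[1] >= required_dtus and t[2] >= size_gb]
    return min(candidates, key=lambda t: t[3]) if candidates else None


def monthly_total(databases):
    """Sum the monthly price for a list of (required_dtus, size_gb) pairs."""
    total = 0
    for dtus, size_gb in databases:
        tier = cheapest_tier(dtus, size_gb)
        if tier is None:
            raise ValueError(f"no single tier fits {dtus} DTUs / {size_gb} GB")
        total += tier[3]
    return total


print(cheapest_tier(40, 100))
print(monthly_total([(5, 1), (40, 100), (100, 300)]))
```

A result of None from cheapest_tier means no single database tier fits (for example, more than 1TB of data), which is the case where splitting the data across databases comes in.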

More info:

Understanding Windows Azure Storage Billing – Bandwidth, Transactions, and Capacity

Understand your bill for Microsoft Azure

Posted in Azure SQL Database, SQLServerPedia Syndication

Analytics Platform System (APS) AU5 released

The Analytics Platform System (APS), which is a renaming of the Parallel Data Warehouse (PDW), has just released an appliance update (AU5), which is sort of like a service pack, except that it includes many new features.  Below is what is new in this release:

The AU5 release offers customers greater Transact-SQL (T-SQL) compatibility to aid in migrations from SQL Server and other platforms as well as improved connectivity and integration with Hadoop.  The AU5 release also includes support for faster data loading scenarios through both first party Microsoft and 3rd party tools.  These features continue to provide greater alignment with SQL Server and bring significant value to customers.

APS hardware partners Dell, HPE and Quanta Cloud Technology will ship APS with AU5 starting this month.  Specific shipping dates will vary depending on the hardware partner’s factory process.

This update delivers several new capabilities and features, including:

PolyBase/Hadoop Enhancements

  • Support for Hortonworks (HDP 2.3) and Cloudera (CDH 5.5)
  • String predicate pushdown to Hadoop for improved performance
  • Support for National/Government cloud storage and public Azure data sets
  • Apache Parquet file format support

Data Loading

  • BCP support for additional data loading scenarios
    • Supports the bcp.exe command line interface for simple data import/export scenarios
    • .NET SqlBulkCopy class support for custom application integration
  • Support for SQL Server Native Client OLE DB and the Bulk Copy API unlocking access by many 3rd party ETL, reporting and analytic tools

T-SQL compatibility improvements to reduce migration friction from SQL SMP

  • sp_prepexec – A common dynamic query preprocessor model that allows customers to simplify migrations
  • SET NOCOUNT and SET PARSEONLY set statements used across a variety of customer scenarios
  • IS_MEMBER() and IS_ROLEMEMBER() in support of Windows Authentication
  • CREATE TABLE as HEAP option allows customers to explicitly define heap, in addition to clustered index, or clustered columnstore index tables which aligns DDL with the SQL Data Warehouse service

Based on early testing and feedback, there is a performance improvement of up to 30% for short running queries.  In addition to the above, Microsoft is also offering an early preview of Adaptive Query Processing which can automatically re-optimize query execution mid-flight.  Please note that Adaptive Query Processing is currently in beta.  Customers wishing to participate in the beta should contact their support representative.

More info:

Microsoft releases the latest update to Analytics Platform System (APS)

Posted in PDW/APS, SQLServerPedia Syndication

Scaling Azure VMs

There are so many benefits to the cloud, but one of the major ones is how easy it is to scale a virtual machine (VM).  A common scenario is when you are building an application that needs SQL Server.  Simply create a VM on the Azure portal that has SQL Server already installed (or choose an OS-only VM and install SQL Server on your own if you will be bringing a SQL Server license over).  When choosing the initial VM, choose a smaller VM size to save costs.  Then as your application goes live, scale the VM up a bit to handle more users.  Then watch to see the performance of SQL Server.  If you need more resources, scale the VM up again.  If you scale up too much and the VM is underutilized, just scale it back down.

All this scaling can be done in a few mouse clicks with the resizing taking just a few minutes (or even just a few seconds!).  Compare this to scaling on-prem: review hardware, order hardware, wait for delivery, rack and stack it, install OS, install SQL Server, then hope you did not order too much or too little hardware.  It can take weeks or months to get up and running!  Then think of the pain if you have to upgrade the hardware: repeat the same process above, then back up the databases, logins, SQL Agent jobs, etc., restore them on the new server, and repoint all the users to the new server.  Ugh!

Let me quickly cover the process of scaling a VM in Azure to show you how easy it is.  First, select your VM in the Azure portal and choose “Size” under Settings.

Under “Choose a size” will be a list of all the available VM sizes you can scale to.  Some VM sizes may not appear in the list if you are in a region that does not support them, so keep this in mind when choosing the region for your initial VM.

Some of the VM sizes in the “Choose a size” list will be “active”, meaning you can select them, and resizing requires just a VM reboot.  Which sizes are active depends on whether the new size is in the same family as the current VM size, or whether the Azure hardware cluster that the current VM resides in supports the new size (which you are not able to tell ahead of time).

If you see VM sizes in the “Choose a size” list that are grayed out and not selectable, it means the size is not in the same family and the hardware cluster does not support it.  No problem!  If you are using the Azure Resource Manager (ARM) deployment model you can still resize to any VM size; you just need to stop your VM first.  Then go back to the “Choose a size” list and you will see all the VM sizes are now active and selectable.  Just remember to restart the VM when the scaling is complete.

Resizing a VM deployed using the Classic (ASM) deployment model is more difficult if the new size is not supported by the hardware cluster where the VM is currently deployed.  Unlike VMs deployed through the ARM deployment model, a Classic VM cannot be resized while it is in a stopped state.  So for VMs using the ASM deployment model you should delete the virtual machine but select the option to keep the attached storage (OS and data disks), then create a new virtual machine in the new size and reattach the disks from the old virtual machine.  To simplify this process, there is a PowerShell script to aid in the delete and redeployment process.

Once you choose the VM size to scale to, the resize begins, and it completes in a few minutes, or even seconds if the VM is stopped.


If you needed to stop your VM, the next step is to restart it.  If you did not need to stop it, you are ready to go!
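To recap the rules above, here is a small Python sketch of the decision logic.  It does not call Azure at all; the function name and the step descriptions are my own, and the size_is_active flag is simply what you observe in the “Choose a size” list.

```python
def resize_plan(size_is_active, deployment_model):
    """Return the resize steps described in this post.

    size_is_active: True if the target size is selectable in the portal
        (same family, or the hardware cluster supports the new size).
    deployment_model: "ARM" (Resource Manager) or "ASM" (Classic).
    """
    if size_is_active:
        # Selectable sizes only need a reboot of the VM.
        return ["resize (VM reboots)"]
    if deployment_model == "ARM":
        # Grayed-out sizes become selectable once the VM is stopped.
        return ["stop VM", "resize", "start VM"]
    if deployment_model == "ASM":
        # Classic VMs cannot be resized while stopped, so redeploy instead.
        return ["delete VM but keep OS and data disks",
                "create new VM in the target size",
                "reattach disks"]
    raise ValueError(f"unknown deployment model: {deployment_model}")


print(resize_plan(True, "ARM"))
print(resize_plan(False, "ARM"))
print(resize_plan(False, "ASM"))
```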

More info:

Anatomy of a Microsoft Azure Virtual Machine

Posted in Azure, SQLServerPedia Syndication

My latest presentations

I frequently present at user groups, and always try to create a brand new presentation to keep things interesting.  We all know technology changes so quickly there is no shortage of topics!  There is a list of all my presentations, with slide decks and, in some cases, videos.  Here are the new presentations I created over the past few months:

Implement SQL Server on an Azure VM

This presentation is for those of you who are interested in moving your on-prem SQL Server databases and servers to Azure virtual machines (VMs) in the cloud so you can take advantage of all the benefits of being in the cloud.  This is commonly referred to as a “lift and shift” as part of an Infrastructure-as-a-service (IaaS) solution.  I will discuss the various Azure VM sizes and options, migration strategies, storage options, high availability (HA) and disaster recovery (DR) solutions, and best practices. (slides)

Relational databases vs Non-relational databases

There is a lot of confusion about the place and purpose of the many recent non-relational database solutions (“NoSQL databases”) compared to the relational database solutions that have been around for so many years.  In this presentation I will first clarify what exactly these database solutions are, how they compare to Hadoop, and discuss the best use cases for each.  I’ll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem.  We will even touch on a new type of database solution called NewSQL.  If you are building a new solution it is important to understand all your options so you take the right path to success. (slides)

Big Data: It’s all about the Use Cases

Big Data, IoT, data lake, unstructured data, Hadoop, cloud, and massively parallel processing (MPP) are all just fancy words unless you can find use cases for all this technology. Join me as I talk about the many use cases I have seen, from streaming data to advanced analytics, broken down by industry. I’ll show you how all this technology fits together by discussing various architectures and the most common approaches to solving data problems, and hopefully set off light bulbs in your head on how big data can help your organization make better business decisions. (slides) (video)

Cortana Analytics Suite

Cortana Analytics Suite is a fully managed big data and advanced analytics suite that transforms your data into intelligent action.  It is comprised of data storage, information management, machine learning, and business intelligence software in a single convenient monthly subscription.  This presentation will cover all the products involved, how they work together, and use cases. (slides)

Posted in Presentation, SQLServerPedia Syndication

Data loading into Azure SQL Data Warehouse

Azure SQL Data Warehouse (SQL DW) is a new platform-as-a-service (PaaS) offering that distributes workloads across multiple compute resources, an approach called massively parallel processing (MPP).  Loading data into an MPP data warehouse requires a different approach, or mindset, than traditional methods of loading data into an SMP data warehouse.

To help you understand how best to load data into SQL DW, Microsoft has released an excellent white paper by Martin Lee, John Hoang, and Joe Sack.  It describes the SQL DW architecture and explores several loading techniques to help you reach maximum data-loading throughput and identify the scenarios that best suit each of these techniques.

Check it out: Azure SQL Data Warehouse loading patterns and strategies

Posted in Azure SQL DW, Data warehouse, SQLServerPedia Syndication

Microsoft Azure Government

I’m sure you are aware of Microsoft Azure, but are you aware there is a special version of Azure for U.S. governments?

Microsoft Azure Government is a cloud computing service for federal, state, local and tribal U.S. governments.  It became generally available in December 2014 after a year in preview.  To see the Azure services available for the government, see the services available by region.

By default, Azure Government ensures that all data stays within the U.S. and within data centers and networks that are physically isolated from the rest of Microsoft’s cloud computing solution, operated by screened U.S. persons.  It’s in compliance with FedRAMP, a mandatory government-wide program that prescribes a standardized way to carry out security assessments for cloud services.  It also supports a wide range of other compliance standards, including Health Insurance Portability and Accountability Act (HIPAA), Department of Defense Enterprise Cloud Service Broker (ECSB), and the FBI Criminal Justice Information Services (CJIS), which is meant to keep safe fingerprint and background-check data that has to be shared with other agencies.

Microsoft also offers a government version of Office 365, which is hosted in a dedicated “cloud community” reserved only for government customers.  There is also a Microsoft Dynamics CRM Online Government.

Also just announced:
Two new physically isolated regions, which will become available later this year, are part of Azure Government and are meant to host Department of Defense (DoD) data.  These regions will meet the Pentagon’s Defense Information Systems Agency (DISA) Impact level 5 restrictions and are, according to Microsoft, “architected to meet stringent DoD security controls and compliance requirements.”

Level 5 data includes controlled unclassified information.  Classified information (up to ‘secret’) can only be stored on systems that fall under the level 6 classification.  To gain level 5 authorization, cloud providers have to ensure that all workloads run (and all data is stored) on dedicated hardware that is physically separated from non-DoD users.

In addition to its new work with the DoD, Microsoft is also expanding its support for FedRAMP, the standard that governs which cloud services federal agencies are able to use.  The company today announced that Azure Government has been selected to participate in a new pilot that will allow agencies to process high-impact data — that is, data that could have a negative impact on organizational operations, assets or individuals.  Until now, FedRAMP only authorized the use of moderate impact workloads.  Microsoft says it expects all the necessary papers for this higher authorization will be in place by the end of this month.

Azure Government is also on track to receive DISA Level 4 authorization soon.

More info:

Microsoft Cloud for Government

Posted in Azure, SQLServerPedia Syndication

SQL Server on Linux!

Looks outside: pigs are flying!

Microsoft announced yesterday that SQL Server will be made available on Linux.  The private preview of SQL Server on Linux is available now, and Microsoft is targeting availability in mid-2017.  Microsoft will offer both on-premises and cloud versions of the product (via Linux VMs).  It will include the Stretch Database capabilities that Microsoft is building into SQL Server 2016.  Right now, SQL Server on Linux is available on Ubuntu or as a Docker image, and Microsoft intends to support Red Hat Enterprise Linux as well as other platforms over time.  The private preview is based on SQL Server 2016.

Considering how anti-Linux Microsoft was a few years ago, this is very surprising, but not so surprising if you have followed the changes over the past two years as Microsoft has come to embrace Linux and other open source technologies and tools (see Microsoft Loves Linux).

To find out more about SQL Server on Linux, you can sign up to get regular updates and provide input to the team, as well as apply to the private preview.

More info:

Microsoft is porting SQL Server to Linux

8 no-bull reasons why SQL Server on Linux is huge for Microsoft

Posted in SQL Server, SQLServerPedia Syndication

Cross-database queries in Azure SQL Database

A limitation of Azure SQL Database has been its inability to do cross-database SQL queries.  This has changed with the introduction of elastic database queries, now in preview.  However, it’s not as easy as on-prem SQL Server, where you can just use the three-part name syntax DatabaseName.SchemaName.TableName.  Instead, you have to define remote tables (tables outside your current database), which is similar to how PolyBase works for those of you familiar with PolyBase.

Here is sample code that, from within database AdventureWorksDB, selects data from table Customers in database Northwind:

--Within database AdventureWorksDB, will select data from table Customers in database Northwind

--Create database scoped master key and credentials

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>';

--Needs to be username and password to access SQL database

CREATE DATABASE SCOPED CREDENTIAL jscredential WITH IDENTITY = '<username>', SECRET = '<password>';

--Define external data source

CREATE EXTERNAL DATA SOURCE RemoteNorthwindDB WITH
           (TYPE = RDBMS,
            LOCATION = '<servername>.database.windows.net',
            DATABASE_NAME = 'Northwind',
            CREDENTIAL = jscredential
           );

--Show created external data sources

select * from sys.external_data_sources; 

--Create external (remote) table.  The schema provided in your external table definition needs to match the schema of the tables in the remote database where the actual data is stored. 

CREATE EXTERNAL TABLE [NorthwindCustomers]( --what we want to call this table locally
	[CustomerID] [nchar](5) NOT NULL,
	[CompanyName] [nvarchar](40) NOT NULL,
	[ContactName] [nvarchar](30) NULL,
	[ContactTitle] [nvarchar](30) NULL,
	[Address] [nvarchar](60) NULL,
	[City] [nvarchar](15) NULL,
	[Region] [nvarchar](15) NULL,
	[PostalCode] [nvarchar](10) NULL,
	[Country] [nvarchar](15) NULL,
	[Phone] [nvarchar](24) NULL,
	[Fax] [nvarchar](24) NULL
)
WITH (
  DATA_SOURCE = RemoteNorthwindDB,
  SCHEMA_NAME = 'dbo', --schema name of remote table
  OBJECT_NAME = 'Customers' --table name of remote table
);

--Show created external tables

select * from sys.external_tables; 

--You can now select data from this external/remote table, including joining it to local tables

select * from NorthwindCustomers;

--Drop the external table when it is no longer needed

DROP EXTERNAL TABLE NorthwindCustomers;
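
Because the external table definition has to mirror the remote table column for column, typing it by hand gets tedious when you have many tables.  Below is a rough Python sketch that builds the CREATE EXTERNAL TABLE statement from a column list.  The helper name and its parameters are my own invention (this is not part of any Microsoft tool), and it does no escaping, so only feed it names you trust.

```python
def external_table_ddl(local_name, data_source, remote_schema, remote_table, columns):
    """Build a CREATE EXTERNAL TABLE statement as a string.

    columns is a list of (name, sql_type, nullable) tuples that should mirror
    the remote table's definition.  No identifier escaping is performed.
    """
    col_lines = ",\n".join(
        f"    [{name}] {sql_type} {'NULL' if nullable else 'NOT NULL'}"
        for name, sql_type, nullable in columns
    )
    return (
        f"CREATE EXTERNAL TABLE [{local_name}] (\n"
        f"{col_lines}\n"
        f") WITH (\n"
        f"    DATA_SOURCE = {data_source},\n"
        f"    SCHEMA_NAME = '{remote_schema}',\n"
        f"    OBJECT_NAME = '{remote_table}'\n"
        f");"
    )


# Example using the first three columns of the Customers table from this post.
ddl = external_table_ddl(
    "NorthwindCustomers", "RemoteNorthwindDB", "dbo", "Customers",
    [("CustomerID", "nchar(5)", False),
     ("CompanyName", "nvarchar(40)", False),
     ("ContactName", "nvarchar(30)", True)],
)
print(ddl)
```

You would still run the generated statement yourself in the database that needs the remote table, after the credential and external data source exist.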




More info:

Elastic database query for cross-database queries (vertical partitioning)

Posted in Azure SQL Database