Azure SQL Data Warehouse is now GA

The Azure SQL Data Warehouse (SQL DW), which I blogged about here, is now generally available.  Here is the official announcement.

In brief, SQL DW is a fully managed data-warehouse-as-a-service that you can provision in minutes and scale up in seconds.  With SQL DW, storage and compute scale independently.  You can dynamically deploy, grow, shrink, and even pause compute, allowing for cost savings.  Also, SQL DW uses the power and familiarity of T-SQL, so you can integrate query results across relational data in your data warehouse and non-relational data in Azure blob storage or Hadoop using PolyBase.  SQL DW offers an availability SLA of 99.9%, making it the only public cloud data warehouse service that offers an availability SLA to customers.

SQL DW uses an elastic massively parallel processing (MPP) architecture built on top of the SQL Server 2016 database engine.  It allows you to interactively query and analyze data using existing SQL-based tools and business intelligence applications.  It uses column stores for high-performance analytics and storage compression, SQL Server’s rich collection of aggregation capabilities, and state-of-the-art query optimization.
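
To get an intuition for what "massively parallel processing" means, here is a tiny conceptual Python sketch (my own illustration, not SQL DW internals) of hash-distributing rows across compute nodes so each node can aggregate its slice in parallel:

```python
# Conceptual sketch (my illustration, not SQL DW internals) of how an MPP
# system hash-distributes rows so each compute node aggregates its own
# slice in parallel, with partial results combined at the end.
from collections import defaultdict

def distribute_rows(rows, key, node_count):
    """Assign each row to a node by hashing its distribution key."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[hash(row[key]) % node_count].append(row)  # same key -> same node
    return nodes

rows = [{"customer_id": i, "amount": i * 10} for i in range(1000)]
nodes = distribute_rows(rows, "customer_id", 4)

# Each node aggregates independently; a final step combines the partials.
partials = {n: sum(r["amount"] for r in part) for n, part in nodes.items()}
print(sum(partials.values()))   # 4995000, same as a single-node sum
```

Because rows with the same key always land on the same node, joins and aggregations on the distribution key can run without shuffling data between nodes.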

Two customer case studies using SQL DW in production were just published: AGOOP and P:Cubed.

Also note that until recently, you had to use SSDT to connect to SQL DW.  But with the July 2016 update of SSMS, you can now connect to SQL DW using SSMS (see Finally, SSMS will talk to Azure SQL DW).

More info:

Azure SQL Data Warehouse Hits General Availability

Introduction to Azure SQL Data Warehouse (video)

Microsoft Azure SQL Data Warehouse Overview (video)

Azure SQL Data Warehouse Overview with Jason Strate (video)

A Developers Guide to Azure SQL Data Warehouse (video)

Posted in Azure SQL DW, SQLServerPedia Syndication | 2 Comments

Virtualization does not equal a private cloud

I see a lot of confusion when it comes to creating a private cloud.  Many seem to think that by virtualizing the servers in your local on-premises data center, you have created a private cloud.  But by itself, virtualization falls far short of the characteristics of a private cloud and is really just server consolidation and data center automation.  The original five cloud characteristics, as stated by the National Institute of Standards and Technology (NIST), are what define a cloud:

  1. On-demand self-service: A user can provision a virtual machine (or other resources such as storage or networks) as needed automatically without intervention from IT
  2. Broad network access: Virtualization is available for all internal customers that are on the network through various client platforms (e.g., mobile phones, tablets, laptops, and workstations)
  3. Resource pooling (multitenancy): Computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand.  There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources (storage, processing, memory, and network bandwidth) but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter)
  4. Rapid elasticity (hyper-scaling): Resources can be scaled up or down manually or in some cases automatically, commensurate with demand.  To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.  There is no need for IT to provision VMs individually and install the OS and software
  5. Measured service (operations): As opposed to IT charging costs back to other departments based on traditional budgeting, costs are based on actual usage.  Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts).  Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service

Since virtualization only addresses #3, much more must be done to create a private cloud.  A cloud should also support Platform-as-a-Service (PaaS) to allow for application innovation.  Fortunately, there are products that add the other characteristics to give you a private cloud, such as Microsoft’s Azure Stack.  And of course you can always use a public cloud.

The main difference between a private cloud and a public cloud is in scaling.  A public cloud like Azure has no “ceiling”, so resources can be added without limits (i.e., hyper-scaling).  A private cloud has a ceiling, and you may have to wait for more hardware to be installed before you can scale.  Comparing a public cloud vs. a private cloud vs. a hybrid approach will be the topic of another blog post.

More info:

Private cloud vs. virtualization

Which Types of Workloads to Run in Public, Private Clouds

Posted in Azure, SQLServerPedia Syndication | 1 Comment

Azure SQL Data Warehouse pricing

The pricing for Azure SQL Data Warehouse (SQL DW) consists of a compute charge and a storage charge.  For compute, pricing is not based on hardware configuration but rather on data warehouse units (DWUs).  DWUs can be scaled up or down via a sliding bar in just a couple of minutes with no downtime.  You pay for DWU blocks based on uptime (you can pause your SQL DW database and not pay for compute while paused).  When paused, storage is still available and can be used by other resources.

You must pay for storage, even when paused, but there is no limit to the amount of data you can put into storage.

Below are some examples of compute pricing for the East US region.  The pricing comes from the Azure pricing calculator:

SQL Data Warehouse, 100 DWU, $1,125/month
SQL Data Warehouse, 500 DWU, $5,625/month
SQL Data Warehouse, 1000 DWU, $11,250/month
SQL Data Warehouse, 1500 DWU, $16,875/month
SQL Data Warehouse, 2000 DWU, $22,500/month
SQL Data Warehouse, 3000 DWU, $33,750/month
SQL Data Warehouse, 6000 DWU, $67,500/month
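
Since the prices above scale linearly with DWU ($1,125 per 100 DWU per month in East US), you can sketch a rough cost estimator.  Note the assumption that pausing pro-rates the bill by hours out of a ~730-hour month is mine, so check your actual bill:

```python
# Rough compute cost estimator for the East US prices above.  Pricing is
# linear in DWU ($1,125 per 100 DWU per month); pro-rating a paused
# database by hours out of a ~730-hour month is my assumption.
RATE_PER_100_DWU_MONTH = 1125.00
HOURS_PER_MONTH = 730

def monthly_compute_cost(dwu, hours_running=HOURS_PER_MONTH):
    """Monthly compute charge, pro-rated for time the database is not paused."""
    full_month = dwu / 100 * RATE_PER_100_DWU_MONTH
    return round(full_month * hours_running / HOURS_PER_MONTH, 2)

print(monthly_compute_cost(1000))           # 11250.0, matching the list above
print(monthly_compute_cost(1000, 8 * 22))   # ~2712: paused outside business hours
```

This is where the pause feature pays off: running 1000 DWU only during business hours cuts the compute bill to roughly a quarter of the 24x7 cost.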

Data warehouses will use Premium Disk storage (P30).  Below are some examples of storage pricing for the East US region:

SQL Data Warehouse, RA-GRS Page Blob, 1TB = $135/month
SQL Data Warehouse, RA-GRS Page Blob, 10TB = $1,351/month
SQL Data Warehouse, RA-GRS Page Blob, 100TB = $13,517/month
SQL Data Warehouse, RA-GRS Page Blob, 1PB = $135,170/month

Storage transactions are not billed; customers pay only for data stored.  Inbound data transfers are free.  Outbound data transfers are charged at regular data transfer rates.  Note you could store less frequently accessed data in Azure Blob Storage and access it via PolyBase to save on storage costs.

More info:

SQL Data Warehouse Pricing

Understanding Windows Azure Storage Billing – Bandwidth, Transactions, and Capacity

Understand your bill for Microsoft Azure

Posted in Azure SQL DW, SQLServerPedia Syndication | 1 Comment

SQL Server 2016 is here!

Today is the day: SQL Server 2016 is available for download!  You can download all the editions (enterprise, standard, web, express with advanced services, express, developer) of SQL Server 2016 now if you have an MSDN subscription, and you can also create an Azure VM right now that includes SQL Server pre-installed with one of the editions (enterprise, standard, web, express).  Lastly, you can experience the full feature set through the free evaluation edition (180 days) or the developer edition (you have to sign in to Visual Studio Dev Essentials, a free developer program, before you can download the developer edition).

Here is a quick overview of the tons of new features, broken out by edition (click for larger view):


and here is another view on the features available for each edition (click for larger view):


More info:

SQL Server 2016 is generally available today

SQL Server 2016 e-book

Posted in SQL Server 2016, SQLServerPedia Syndication | 1 Comment

Azure Storage pricing

Azure storage is a great, inexpensive solution for storing your data in the cloud.  There are many Azure storage options, but I wanted to list here the pricing of the most common type: block blob.

The pricing below is based on the East US region (most other regions have very similar pricing).  The pricing comes from the Azure pricing calculator.  The data redundancy options LRS, ZRS, GRS, and RA-GRS are explained at Redundancy Options in Azure Blob Storage, but here is a review:

  • LRS: Makes multiple synchronous copies of your data within a single datacenter
  • ZRS: Stores three copies of data across multiple datacenters within or across regions (for block blobs only)
  • GRS: Same as LRS, plus multiple asynchronous copies to a second datacenter hundreds of miles away
  • RA-GRS: Same as GRS, plus read access to the secondary datacenter

Here is the pricing:

Storage, LRS, 1GB = $.02/month
Storage, LRS, 10GB = $.24/month
Storage, LRS, 100GB = $2.40/month
Storage, LRS, 1TB = $24/month
Storage, LRS, 10TB = $242/month
Storage, LRS, 100TB = $2,396/month
Storage, LRS, 1PB = $23,572/month

Storage, ZRS, 1GB = $.03/month
Storage, ZRS, 10GB = $.30/month
Storage, ZRS, 100GB = $3.00/month
Storage, ZRS, 1TB = $30/month
Storage, ZRS, 10TB = $302/month
Storage, ZRS, 100TB = $2,995/month
Storage, ZRS, 1PB = $29,466/month

Storage, GRS, 1GB = $.05/month
Storage, GRS, 10GB = $.48/month
Storage, GRS, 100GB = $4.80/month
Storage, GRS, 1TB = $49/month
Storage, GRS, 10TB = $484/month
Storage, GRS, 100TB = $4,793/month
Storage, GRS, 1PB = $47,145/month

Storage, RA-GRS, 1GB = $.06/month
Storage, RA-GRS, 10GB = $.61/month
Storage, RA-GRS, 100GB = $6.10/month
Storage, RA-GRS, 1TB = $62/month
Storage, RA-GRS, 10TB = $614/month
Storage, RA-GRS, 100TB = $6,083/month
Storage, RA-GRS, 1PB = $59,852/month
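
The per-GB rates implied by the 100GB prices above (roughly $0.024 LRS, $0.030 ZRS, $0.048 GRS, $0.061 RA-GRS) make for a quick first-tier estimator.  Note that, as the 1PB rows show, Azure discounts the per-GB rate at large volumes, so treat this sketch as an upper bound:

```python
# Quick first-tier estimate of block blob capacity cost per month.  The
# per-GB rates are implied by the 100 GB prices above; Azure discounts
# the rate at higher volumes, so this is an upper bound.
RATE_PER_GB = {"LRS": 0.024, "ZRS": 0.030, "GRS": 0.048, "RA-GRS": 0.061}

def capacity_cost(gb, redundancy="LRS"):
    """Approximate monthly cost in dollars for gb gigabytes stored."""
    return round(gb * RATE_PER_GB[redundancy], 2)

print(capacity_cost(100, "LRS"))      # 2.4, matching the list above
print(capacity_cost(1024, "GRS"))     # about $49 for ~1 TB of GRS
```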

The other cost to note is that you are charged for the number of storage transactions, but these are extremely cheap (transactions include both read and write operations to storage):

100K transactions = $0.01/month
1M transactions = $0.04/month
10M transactions = $0.36/month
100M transactions = $3.60/month

Each and every REST call to Windows Azure Blobs counts as one transaction (see Understanding Windows Azure Storage Billing – Bandwidth, Transactions, and Capacity).
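
Using the rate implied by the list above ($3.60 per 100 million transactions, or $0.0036 per 100,000), a quick sanity check:

```python
# Storage transaction cost using the rate implied by the list above:
# $3.60 per 100 million transactions = $0.0036 per 100,000.
RATE_PER_100K = 0.0036

def transaction_cost(transactions):
    """Approximate monthly transaction charge in dollars."""
    return round(transactions / 100_000 * RATE_PER_100K, 2)

print(transaction_cost(10_000_000))    # 0.36, matching the list
print(transaction_cost(100_000_000))   # 3.6
```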

Data going into Azure storage (“inbound data transfers”) is free.  Data going out of Azure storage (“outbound data transfers”) is free if it stays within the same Azure data center (“region”); otherwise there is a small cost (see Data Transfers Pricing Details):

First 5 GB/month: free in all zones
5 GB – 10 TB/month: $0.087 per GB (Zone 1), $0.138 per GB (Zone 2), $0.181 per GB (Zone 3)
Next 40 TB (10 – 50 TB)/month: $0.083 per GB (Zone 1), $0.135 per GB (Zone 2), $0.175 per GB (Zone 3)
Next 100 TB (50 – 150 TB)/month: $0.07 per GB (Zone 1), $0.13 per GB (Zone 2), $0.17 per GB (Zone 3)
Next 350 TB (150 – 500 TB)/month: $0.05 per GB (Zone 1), $0.12 per GB (Zone 2), $0.16 per GB (Zone 3)
  • Zone 1: US West, US East, US North Central, US South Central, US East 2, US Central, Europe West, Europe North
  • Zone 2: Asia Pacific East, Asia Pacific Southeast, Japan East, Japan West, Australia East, Australia Southeast
  • Zone 3: Brazil South
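
Here is a small sketch that applies the Zone 1 rates above to estimate an outbound transfer bill.  I am assuming the tiers are cumulative and that 1 TB = 1024 GB for billing purposes:

```python
# Outbound transfer cost for Zone 1 using the tiers above.  Assumes the
# tiers are cumulative and that 1 TB = 1024 GB for billing purposes.
ZONE1_TIERS = [                 # (tier size in GB, price per GB)
    (5, 0.0),                   # first 5 GB free
    (10 * 1024 - 5, 0.087),     # remainder up to 10 TB
    (40 * 1024, 0.083),         # next 40 TB
    (100 * 1024, 0.07),         # next 100 TB
    (350 * 1024, 0.05),         # next 350 TB
]

def egress_cost(gb):
    """Approximate monthly Zone 1 outbound transfer charge in dollars."""
    cost, remaining = 0.0, gb
    for size, rate in ZONE1_TIERS:
        used = min(remaining, size)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return round(cost, 2)

print(egress_cost(5))     # 0.0 -- still within the free allowance
print(egress_cost(105))   # 8.7 -- 100 billable GB at $0.087
```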

So in summary, your total cost depends on how much you store, the volume and type of storage transactions and outbound data transfers, and which data redundancy option you choose.

Note that there is a new feature for Azure blob storage: a hot access tier and a cool access tier (see New Azure storage account type).  The hot access tier pricing is as listed above.  The cool access tier has greatly reduced pricing (ZRS is not yet supported).  For example:

Storage, LRS, 100GB = $2.40/month (hot), $1.00/month (cool)
Storage, GRS, 100GB = $4.80/month (hot), $2.00/month (cool)
Storage, RA-GRS, 100GB = $6.10/month (hot), $2.50/month (cool)

Finally, there is an Azure speed testing site that you can use to determine which data center is closest to your location and provides the lowest network latency.

More info:

Azure Storage Pricing

Windows Azure Storage Transaction | Unveiling the Unforeseen Cost and Tips to Cost Effective Usage

Understand your bill for Microsoft Azure

Posted in Azure, SQLServerPedia Syndication | 1 Comment

Migrate from on-prem SQL server to Azure VM IaaS

Hopefully you went through my presentations Should I move my database to the cloud?, Introducing Azure SQL Database, and Implement SQL Server on an Azure VM, and are convinced to move your databases to the cloud.  If you are going to migrate to Azure SQL Database and want more info on how to copy your SQL Server databases from on-prem to Azure SQL Database (PaaS), check out Migrate from on-prem SQL server to Azure SQL Database.  If you want to copy data from on-prem to SQL Server in an Azure VM (IaaS), check out the migration methods below:

  • Use the Deploy a SQL Server Database to a Microsoft Azure VM wizard.  This is the recommended method for migrating an on-premises user database when the compressed database backup file is less than 1 TB.  It is the fastest and simplest method, so use it whenever possible to migrate to a new or existing SQL Server instance in an Azure virtual machine.  Use on SQL Server 2005 or greater to SQL Server 2014 or greater.  But note this method supports only the classic deployment model
  • Perform on-premises backup using compression and manually copy the backup file into the Azure virtual machine and then do a restore (only if you cannot use the above wizard or the database backup size is larger than 1 TB).  Use on SQL Server 2005 or greater to SQL Server 2005 or greater
  • Perform a backup to URL and restore into the Azure virtual machine from the URL.  Use on SQL Server 2012 SP1 CU2 or greater to SQL Server 2012 SP1 CU2 or greater
  • Detach and then copy the data and log files to Azure blob storage and then attach to SQL Server in Azure VM from URL.  Use on SQL Server 2005 or greater to SQL Server 2014 or greater
  • Convert on-premises physical machine to Hyper-V VHD, upload to Azure Blob storage, and then deploy as new VM using uploaded VHD.  Use when bringing your own SQL Server license, when migrating a database that you will run on an older version of SQL Server, or when migrating system and user databases together as part of the migration of database dependent on other user databases and/or system databases.  Use on SQL Server 2005 or greater to SQL Server 2005 or greater
  • Ship hard drive using Windows Import/Export Service.  Use when manual copy method is too slow, such as with very large databases.  Use on SQL Server 2005 or greater to SQL Server 2005 or greater
  • If you have an AlwaysOn deployment on-premises and want to minimize downtime, use the Add Azure Replica Wizard to create a replica in Azure and then failover, pointing users to the Azure database instance.  Use on SQL Server 2012 or greater to SQL Server 2012 or greater
  • If you do not have an AlwaysOn deployment on-premises and want to minimize downtime, use SQL Server transactional replication to configure the Azure SQL Server instance as a subscriber and then disable replication, pointing users to the Azure database instance.  Use on SQL Server 2005 or greater to SQL Server 2005 or greater
  • Others: Data-tier Application, T-SQL scripts, SQL Server Import and Export Wizard, SSIS, Copy Database Wizard
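
On the "ship a hard drive" option: a back-of-the-envelope calculation makes the decision concrete.  The bandwidth figure and backup size below are assumptions you would plug in for your own environment:

```python
# Back-of-the-envelope helper for the "ship a hard drive" bullet above:
# how long would the upload take at a given sustained bandwidth?  The
# database size and bandwidth below are assumptions you would plug in.
def upload_hours(db_size_gb, mbps):
    """Hours to move db_size_gb at a sustained rate of mbps megabits/second."""
    megabits = db_size_gb * 8 * 1000     # decimal GB -> megabits
    return megabits / mbps / 3600

hours = upload_hours(5 * 1000, 100)      # 5 TB compressed backup, 100 Mbps line
print(round(hours, 1))                   # ~111 hours of transfer; at multiple
                                         # days, shipping a drive may well win
```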

More info:

Migrate a SQL Server database to SQL Server in an Azure VM

SQL Server 2014 Hybrid Cloud Scenarios: Migrating On-Premises SQL Server to Windows Azure Virtual Machines

How To Move or Migrate SQL Server Workload to Azure SQL Database Cloud Services or Azure VM – All version of SQL Server – Step-By-Step

Free ebook: Microsoft Azure Essentials Migrating SQL Server Databases to Azure

Posted in Azure, SQL Server, SQLServerPedia Syndication | 1 Comment

Azure Speed Testing

Have you ever wondered what Azure region would be the fastest to host your applications when those applications are accessed from your office?  Obviously the closest data center would probably be the best choice (each region has multiple data centers), but how do you know what data center is the closest?  And how can you easily confirm the closest data center is the fastest?  Well, I recently discovered a site that can help you with that.  It’s called AzureSpeed, and it has a number of tests to help you:

  • Azure Storage Latency Test – Test network latency to Azure Storage in worldwide data centers from your location to determine the best region for your application and users
  • Azure Storage Blob Upload Speed Test – Test upload speeds to Azure Blob Storage in worldwide data centers.  A small predetermined file is uploaded
  • Azure Storage Large File Upload Speed Test – Test uploading of a large file to Azure Blob Storage in worldwide data centers, with different upload options (block size, threads).  You choose a file on your computer to upload
  • Azure Storage Blob Download Speed Test – Test downloading of a 100MB file from any data center

There is also a Cloud Region Finder that enables you to quickly look up the cloud and region information of an application deployment by entering the URL.  For example, enter "" and it will return "Azure – North Europe".

Posted in Azure, SQLServerPedia Syndication | Leave a comment

New Azure storage account type

In Azure, there has previously been only one kind of storage account, which is now called a “General purpose” storage account.  Just introduced is a new kind of storage account, called a “Blob storage” account (announced as “Cool Blob Storage”).  When creating a storage account, the “Account kind” setting will now list two options: the existing type “General purpose” as well as the new type “Blob storage”.

“General purpose” storage accounts provide storage for blobs, files, tables, queues, and Azure virtual machine disks under a single account.  “Blob storage” accounts are specialized for storing blob data and support choosing an access tier, which allows you to specify how frequently data in the account is accessed.  Choose an access tier that matches your storage needs and optimizes costs.

The access tiers available for Blob storage accounts are “hot” and “cool”.  In general, hot data is classified as data that is accessed very frequently and needs to be highly durable and available.  On the other hand, cool data is data that is infrequently accessed and long-lived.  Cool data can tolerate a slightly lower availability, but still requires high durability and similar time-to-access and throughput characteristics as hot data.  For cool data, a slightly lower availability SLA and higher access costs are acceptable tradeoffs for much lower storage costs.  Azure Blob storage now addresses this need for differentiated storage tiers for data with different access patterns and pricing models.  So you can now choose between the Cool and Hot access tiers to store your less frequently accessed cool data at a lower storage cost, and store your more frequently accessed hot data at a lower access cost.  The Access Tier attribute of hot or cool is set at the account level and applies to all objects in that account.  So if you want to have both a hot access tier and a cool access tier, you will need two accounts.  If there is a change in the usage pattern of your data, you can also switch between these access tiers at any time.

Hot access tier: higher storage costs, but lower access and transaction costs; availability SLA of 99.9% (99.99% for RA-GRS reads)
Cool access tier: lower storage costs, but higher access and transaction costs; availability SLA of 99% (99.9% for RA-GRS reads)
Both tiers: no minimum object size, no minimum storage duration, millisecond latency (time to first byte), and the same performance and scalability as general purpose storage accounts
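
The storage-versus-access tradeoff can be made concrete with a small break-even sketch.  The storage rates below are the LRS hot/cool numbers from my storage pricing post ($0.024/GB hot, $0.01/GB cool); the per-GB cool access charge is an assumed placeholder, so check the current Azure pricing page before relying on any of these figures:

```python
# Illustrative hot-vs-cool monthly cost comparison.  Storage rates are the
# LRS hot/cool numbers from the storage pricing post ($0.024/GB hot,
# $0.01/GB cool); the cool access charge is an ASSUMED placeholder.
HOT_STORAGE_PER_GB = 0.024
COOL_STORAGE_PER_GB = 0.01
COOL_ACCESS_PER_GB = 0.01    # assumption: retrieval surcharge on cool reads

def monthly_cost(stored_gb, read_gb, tier):
    if tier == "hot":
        return stored_gb * HOT_STORAGE_PER_GB   # hot reads assumed free here
    return stored_gb * COOL_STORAGE_PER_GB + read_gb * COOL_ACCESS_PER_GB

def cheaper_tier(stored_gb, read_gb):
    return min(("hot", "cool"), key=lambda t: monthly_cost(stored_gb, read_gb, t))

print(cheaper_tier(1000, 10))     # rarely-read backups -> cool
print(cheaper_tier(1000, 5000))   # read-heavy data -> hot
```

The pattern matches the scenarios below: data you keep but rarely touch favors cool, while frequently read data favors hot.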

Some common scenarios with the Cool tier include: backup data, disaster recovery, media content, social media photos and videos, scientific data, and so on.


Going forward, Blob storage accounts are the recommended way for storing blobs, as future capabilities such as hierarchical storage and tiering will be introduced based on this account type.  Note that Blob storage accounts do not yet support page blobs or Zone-redundant storage (ZRS).

More info:

Introducing Azure Cool Blob Storage

Azure Blob Storage: Cool and Hot tiers

About Azure storage accounts

Microsoft launches ‘cool blob’ Azure storage – at 1¢ per GB monthly

Microsoft Unveils Cheaper Azure Cool Blob Storage Option

Posted in Azure, SQLServerPedia Syndication | 4 Comments

SQL Server 2016 available June 1st!

Woo hoo!  Microsoft has announced that SQL Server 2016 will be generally available on June 1st.  On that date, all four editions of SQL Server 2016 will be available to all users, including brand new ones, MSDN subscribers, and existing customers.

Here is a quick overview of the tons of new features, broken out by edition (click for larger view):


and here is another view on the features available for each edition (click for larger view):


In addition to the on-premises release, Microsoft will also have a virtual machine available on June 1st through its Azure cloud platform to make it really easy for companies to deploy SQL Server 2016 in the cloud.  So start planning today!

More info:

Get ready, SQL Server 2016 coming on June 1st

Microsoft SQL Server 2016 will be generally available June 1

Posted in SQL Server 2016, SQLServerPedia Syndication | Leave a comment

Big Data architectures

Over the last few years I have been involved in reviewing the architectures of various companies that are building or have built big data solutions.  When I say “big data”, I’m referring to the incorporation of semi-structured data such as sensor data, device data, web logs, and social media.  The architectures generally fall into four scenarios:

Enterprise data warehouse augmentation

This scenario uses an enterprise data warehouse (EDW) built on an RDBMS, but extracts data from the EDW and loads it into a big data hub along with data from other sources that are deemed not cost-effective to move into the EDW (usually high-volume or cold data).  Some data enrichment is usually done in the data hub.  This data hub can then be queried, but primary analytics remain with the EDW.  The data hub is usually built on Hadoop or NoSQL.  This can save costs, since storage using Hadoop or NoSQL is much cheaper than an EDW.  Plus, it can speed up the development of reports, since the data in Hadoop or NoSQL can be used right away instead of waiting for an IT person to write the ETL and create the schemas to ingest the data into the EDW.  Another benefit is that it can support data growth faster, as it is easier to expand storage on a Hadoop/NoSQL solution than on a SAN with an EDW solution.  Finally, it can help by reducing the number of queries on the EDW.

This scenario is most common when an EDW has been in existence for a while and users are requesting data that the EDW cannot handle because of space, performance, and data loading times.

The challenges to this approach are that you might not be able to use your existing tools to query the data hub, and that the data in the hub can be difficult to understand and join and may not be completely clean.

Big Data Architectures

Data hub plus EDW

The data hub is used as a data staging and extreme-scale data transformation platform, but long-term persistence and analytics is performed in the EDW.  Hadoop or NoSQL is used to refine the data in the data hub.  Once refined, the data is copied to the EDW and then deleted from the data hub.

This will lower the cost of data capture, provide scalable data refinement, and provide fast queries via the EDW.  It also offloads the data refinement from the EDW.

Big Data Architectures 2


A distributed data system is implemented for long-term, high-detail big data persistence in the data hub and analytics without employing an EDW.  Low-level code is written, or big data packages are added, that integrate directly with the distributed data store for extreme-scale operations and analytics.

The distributed data hub is usually created with Hadoop, HBase, Cassandra, or MongoDB.  BI tools specifically integrated with or designed for distributed data access and manipulation are needed.  Data operations either use BI tools that provide NoSQL capability or require low-level code (e.g., MapReduce or Pig scripts).

The disadvantages of this scenario are that reports and queries can have longer latency, that new reporting tools require training (which could lead to lower adoption), and that it is difficult to provide governance and structure on top of a non-RDBMS solution.

Big Data Architectures-3

Modern Data Warehouse

An evolution of the three previous scenarios that provides multiple options for the various technologies.  Data may be harmonized and analyzed in the data lake, or moved out to an EDW when more quality and performance are needed or when users simply want control.  ELT is usually used instead of ETL (see Difference between ETL and ELT).  The goal of this scenario is to support any future data needs, no matter the variety, volume, or velocity of the data.
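
The ETL-versus-ELT distinction can be shown with a toy Python sketch (the names and data are illustrative).  ETL cleans data in flight before loading; ELT lands the raw data first, as in a data lake, and transforms it inside the target platform:

```python
# Toy contrast between ETL and ELT (names and data are illustrative).
# ETL transforms before loading; ELT lands raw data first and transforms
# it later inside the target platform, keeping the raw copy around.
raw = [{"name": " Alice ", "amount": "10"}, {"name": "Bob", "amount": "32"}]

def transform(row):
    """Trim whitespace and cast the amount -- a stand-in for real cleansing."""
    return {"name": row["name"].strip(), "amount": int(row["amount"])}

# ETL: transform in flight, then load only the cleaned shape.
warehouse_etl = [transform(r) for r in raw]

# ELT: load raw first (preserved for future, unanticipated uses),
# then transform in place on the target platform.
lake = list(raw)
warehouse_elt = [transform(r) for r in lake]

print(warehouse_etl == warehouse_elt)            # True: same end state
print(sum(r["amount"] for r in warehouse_elt))   # 42
```

The end state is the same either way; the difference is that ELT keeps the raw data available for future needs, which is exactly the data lake argument above.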

Hub-and-spoke should be your ultimate goal.  See Why use a data lake? for more details on the various tools and technologies that can be used for the modern data warehouse.

Big Data Architectures4

More info:

Forrester’s Patterns of Big Data

Most Excellent Big Data Questions: Top Down or Bottom Up Use Cases?

Posted in Big Data, Data warehouse, SQLServerPedia Syndication | 4 Comments