SQL Server 2016 real-time operational analytics

SQL Server 2016 introduces a very cool new feature called real-time operational analytics: the ability to run both analytics (OLAP) and OLTP workloads on the same database tables at the same time.  In some cases this eliminates the need for ETL and a data warehouse (using one system for OLAP and OLTP instead of creating two separate systems), reducing complexity, cost, and data latency.

Real-time operational analytics targets the scenario of a single data source such as an enterprise resource planning (ERP) application on which you can run both the operational and the analytics workload.  This does not replace the need for a separate data warehouse when you need to integrate data from multiple sources before running the analytics workload or when you require extreme analytics performance using pre-aggregated data such as cubes.

Real-time operational analytics uses an updatable nonclustered columnstore index (NCCI).  The columnstore index maintains a copy of the data, so the OLTP and OLAP workloads run against separate copies of the data.  This minimizes the performance impact of both workloads running at the same time.  SQL Server automatically maintains the index as the OLTP workload changes data, so analytics always sees current data.  This makes it possible and practical to run analytics in real time.  This works for both disk-based and memory-optimized tables.

To accomplish this, all you need to do is create an NCCI on one or more of the tables needed for analytics.  The SQL Server query optimizer automatically chooses the NCCI for analytics queries while your OLTP workload continues to run using the same b-tree indexes as before.

[Figure: basic architecture of a nonclustered columnstore index (NCCI) serving analytics alongside OLTP]
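As a minimal sketch (the table and column names are hypothetical), creating the index is a single statement, and SQL Server 2016 even supports a filtered NCCI so that hot OLTP rows stay out of the columnstore entirely:

    -- Analytics index over a hypothetical OLTP table
    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Sales
    ON dbo.Sales (OrderDate, ProductID, Quantity, UnitPrice);

    -- Optional: cover only "cold" rows so changes to hot rows skip index maintenance
    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders
    ON dbo.Orders (OrderDate, OrderStatus, Amount)
    WHERE OrderStatus = 5;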

The analytics query performance with real-time operational analytics will not be as fast as you can get with a dedicated data warehouse, but the key benefit is the ability to do analytics in real-time.  Some businesses may choose to do real-time operational analytics while still maintaining a dedicated data warehouse for extreme analytics as well as incorporating data from other sources.

More info:

Get started with Columnstore for real time operational analytics

Real-Time Operational Analytics: DML operations and nonclustered columnstore index (NCCI) in SQL Server 2016

Real-Time Operational Analytics – Overview nonclustered columnstore index (NCCI)

Real-Time Operational Analytics Using In-Memory Technology

SQL Server 2016 Operational Analytics (video)

Real Time Operational Analytics in SQL Server 2016 (video)


Azure SQL Database new performance level

A new performance level for Azure SQL Database was recently announced, called P15.  This new offering is more than twice as powerful as the next best offering, P11: P15 offers 4,000 database transaction units (DTUs) where P11 offered 1,750.  Also, the max concurrent workers and concurrent logins increased from 2,400 to 6,400; the max concurrent sessions stayed the same at 32,000; and the max In-Memory OLTP storage increased from 14GB to 32GB.
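Moving an existing premium database to the new level is a one-line change; a sketch, assuming a database named MyAppDb:

    -- Scale an existing database up to the P15 performance level
    ALTER DATABASE MyAppDb MODIFY (SERVICE_OBJECTIVE = 'P15');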

More info:

Azure SQL Database new premium performance level P15 generally available


Azure SQL Database vs SQL Data Warehouse

I am sometimes asked to compare Azure SQL Database (SQL DB) to Azure SQL Data Warehouse (SQL DW).  The most important thing to remember is that SQL DB is for OLTP (i.e. applications with individual updates, inserts, and deletes) while SQL DW is strictly for OLAP (i.e. data warehouses).  So if you’re going to build an OLTP solution, you would choose SQL DB.  However, both products can be used for building a data warehouse (OLAP).  With that in mind, here is a list of the key differences:

  • Architecture: SQL DW uses a massively parallel processing (MPP) architecture that spreads queries over many compute nodes, while SQL DB is a single-node (SMP) database
  • Scaling: SQL DW scales compute and storage independently and lets you pause compute entirely; SQL DB scales by changing service tiers
  • Size limits: SQL DB has a max database size of 1TB, while SQL DW has no limit on database size
  • Concurrency: SQL DB supports thousands of concurrent sessions, while SQL DW is limited to 32 concurrent queries and 1,024 concurrent connections
  • Pricing: SQL DB is billed by DTUs, while SQL DW bills compute (DWUs) and storage separately

I have other blogs that cover SQL DB and SQL DW.


What is the Lambda Architecture?

Lambda architecture is a data-processing architecture designed to handle massive quantities of data (i.e. “Big Data”) by using both batch-processing and stream-processing methods.  The idea is to balance latency, throughput, scaling, and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data.  The two view outputs may be joined before presentation.

This provides a way to bridge the gap between the historical single version of the truth and the highly sought-after “I want it now” real-time solution.  By combining traditional batch-processing systems with stream-consumption tools, the needs of both can be met with one solution.

A high-level overview of the Lambda architecture is shown here:

[Figure: high-level overview of the Lambda architecture]

A brief explanation of each layer:

Data Consumption: This is where you will import the data from all the various source systems, some of which may be streaming the data.  Others may only provide data once a day.

Stream Layer: It provides for incremental updating, making it the more complex layer.  It trades accuracy for low latency, looking at only recent data.  Data in here may be only seconds behind, but the trade-off is the data may not be clean.

Batch Layer: It looks at all the data at once and eventually corrects the data in the stream layer.  It is the single version of the truth, the trusted layer, where there is usually lots of ETL and a traditional data warehouse.  This layer is built using a predefined schedule, usually once or twice a day, including importing the data currently stored in the stream layer.

Presentation Layer: Think of it as the mediator, as it accepts queries and decides when to use the batch layer and when to use the stream layer.  Its preference would be the batch layer as that has the trusted data, but if you ask it for up-to-the-second data, it will pull from the stream layer.  So it’s a balance of retrieving what we trust versus what we want right now.

A lambda architecture solution using Azure tools might look like this, using a vehicle with IoT sensors as an example:

[Figure: Lambda architecture built with Azure services, using a vehicle with IoT sensors as the example]

In the above diagram, Event Hubs is used to ingest millions of events in real-time.  Stream Analytics is used to 1) perform real-time aggregations on the data and 2) spool data into long-term storage (SQL Data Warehouse) for batch processing.  Machine Learning is used in real-time for anomaly detection on tire pressure, oil level, engine temp, etc., to predict vehicles requiring maintenance.  The data in Azure Data Lake Storage is used for rich analytics using HDInsight and Machine Learning, orchestrated by Azure Data Factory (e.g. analysis of aggressive driving over the past year).  Power BI and Cortana are used for the presentation layer, and Azure Data Catalog is the metadata repository for all the data sets.
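As a sketch of the speed-layer piece, the Stream Analytics aggregation might look like this (the input/output aliases and column names are hypothetical):

    -- Per-vehicle average tire pressure over 60-second tumbling windows,
    -- spooled to SQL Data Warehouse for the batch side
    SELECT VehicleId, AVG(TirePressure) AS AvgTirePressure
    INTO [sqldw-output]
    FROM [eventhub-input] TIMESTAMP BY EventTime
    GROUP BY VehicleId, TumblingWindow(second, 60)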

Using Hadoop technologies might provide a solution that looks like this:

[Figure: Lambda architecture built with Hadoop technologies]

Be aware this is a complicated architecture.  It will need a number of hardware resources and different code bases for each layer, with each possibly using different technologies/tools.  The complexity of the code can be 3-4 times that of a traditional data warehouse architecture.  So you will have to weigh the costs against the benefit of being able to use data slightly newer than a standard data warehouse solution would provide.

More info:

The Lambda architecture: principles for architecting realtime Big Data systems

How to beat the CAP theorem

Questioning the Lambda Architecture

Lambda Architecture: Low Latency Data in a Batch Processing World

Lambda Architecture for the DWH

Lambda Architecture: Design Simpler, Resilient, Maintainable and Scalable Big Data Solutions

The Lambda Architecture and Big Data Quality


Multi-tenant databases in the cloud

For companies that sell an on-prem software solution and are looking to move that solution to the cloud, a challenge arises in how to architect that solution in the cloud.  For example, say you have a software solution that stores patient data for hospitals.  You sign up hospitals, install the hardware, software, and associated databases on-prem (at the hospital or a co-location facility), and load their patient data.  Think of each hospital as a “tenant”.  Now you want to move this solution to the cloud and get the many benefits that come with it, the biggest being the time to get a hospital up and running, which can go from months on-prem to hours in the cloud.  Now you have some choices: keep each hospital separate with its own VMs and databases (“single tenant”), or combine the data for all hospitals into one database (“multi-tenant”).  As another example, you may simply be creating a PaaS application similar to Salesforce.  Here I’ll describe the various cloud strategies using Azure SQL Database, which is for OLTP applications, and Azure SQL Data Warehouse, which is for OLAP applications (see Azure SQL Database vs SQL Data Warehouse):

Separate Servers\VMs

You create VMs for each tenant, essentially doing a “lift and shift” of the current on-premises solution.  This provides the best isolation possible and it’s regularly done on-premises, but it’s also the option that does the least to cut costs, since each tenant has its own server, SQL Server license, and so on.  Sometimes this is the only allowable option if your client contracts require that their data be virtual-machine-isolated from other clients.  Some cons: table updates must be replicated across all the servers (i.e. updating reference tables), there is no resource sharing, and you need multiple backup strategies across all the servers.

Separate Databases

A new database is created and assigned when a tenant is provisioned.  You can land a number of the databases on each VM (i.e. each VM handles ten tenants), or create a database using Azure SQL Database.  This is often used when you need to provide isolation for each customer, since you can associate different logins, permissions, and so on with each database.  If using Azure SQL Database, be aware the database size limit is 1TB.  If you have a client database that will exceed that, you can use sharding (via Elastic Database Tools) or use cross-database queries (see Scaling Azure SQL Database and Cross-database queries in Azure SQL Database) with row-level security (see Multi-tenant applications with elastic database tools and row-level security).  The lowest service tier for SQL Database has a max database size of 2GB, so you might be paying for storage that you don’t really use.  If using Azure SQL Data Warehouse, there is no limit on database size.  Some other cons: a different connection pool is required per database, updates must be replicated across all the databases, there is no resource sharing (unless using Elastic Database Pools), and you need multiple backup strategies across all the databases.
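For the cross-database query route, a sketch using elastic query (the server, database, credential, and table names are hypothetical):

    -- One-time setup in the database that issues the queries
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
    CREATE DATABASE SCOPED CREDENTIAL CrossDbCred
    WITH IDENTITY = 'queryuser', SECRET = '<password>';

    -- Point at the remote tenant database
    CREATE EXTERNAL DATA SOURCE TenantDb2
    WITH (TYPE = RDBMS,
          LOCATION = 'myserver.database.windows.net',
          DATABASE_NAME = 'TenantDb2',
          CREDENTIAL = CrossDbCred);

    -- Maps to a table of the same name in the remote database
    CREATE EXTERNAL TABLE dbo.Patients2 (PatientId int, TenantId int, Name nvarchar(100))
    WITH (DATA_SOURCE = TenantDb2);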

Separate Schemas

This is also a very good way to achieve multi-tenancy while still sharing some resources, since everything is inside the same database but each tenant has its own schema.  That allows you to customize a specific tenant without affecting others.  And you save costs by only paying for one database (which can fit in SQL Data Warehouse no matter what the size) or a handful of databases if using SQL Database (i.e. ten tenants per database).  Some of the cons: you need to replicate all the database objects in every schema, so the number of objects can increase indefinitely; updates must be replicated across all the schemas; the connection pool for the database must maintain a different connection per tenant (or set of credentials); a different user is required per tenant (which is stored at the server level); and you have to back up that user independently.
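A minimal sketch of the schema-per-tenant pattern (the tenant and table names are hypothetical):

    -- Each tenant gets its own schema and its own copy of every object
    CREATE SCHEMA Hospital001;
    GO
    CREATE TABLE Hospital001.Patients (PatientId int PRIMARY KEY, Name nvarchar(100));
    GO
    CREATE SCHEMA Hospital002;
    GO
    CREATE TABLE Hospital002.Patients (PatientId int PRIMARY KEY, Name nvarchar(100));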

A variation of this using SQL Database is to split the tenants over multiple databases, but not to use separate schemas, for performance reasons.  This is done by assigning a distinct set of tenants to each database using a partitioning strategy such as hash, range, or list partitioning.  This data distribution strategy is oftentimes referred to as sharding.

Row Isolation

Everything is shared in this option: server, database, and even schema.  All the data for the tenants is stored within the same tables in one database, differentiated only by a TenantId or some other column that exists at the table level.  Another big benefit is code changes: with this option you only have one spot to change code (i.e. table structure), where with the other options you have to roll out code changes to many spots.  You will need to use row-level security or something similar when you need to limit the results to an individual tenant, or you can create views or use stored procedures to filter tenants.  You also get ease-of-use and performance benefits when you need to aggregate results over multiple tenants.  Azure SQL Data Warehouse is a great solution for this, as there is no limit to the database size.
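A row-level security sketch for this model, assuming a hypothetical dbo.Patients table with a TenantId column and an application that sets the tenant id in SESSION_CONTEXT:

    -- Predicate function: a row is visible only to its own tenant
    CREATE FUNCTION dbo.fn_TenantFilter (@TenantId int)
    RETURNS TABLE
    WITH SCHEMABINDING
    AS
    RETURN SELECT 1 AS allowed
           WHERE @TenantId = CAST(SESSION_CONTEXT(N'TenantId') AS int);
    GO

    -- Bind the predicate to the shared table
    CREATE SECURITY POLICY dbo.TenantSecurityPolicy
    ADD FILTER PREDICATE dbo.fn_TenantFilter(TenantId) ON dbo.Patients
    WITH (STATE = ON);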

But be aware that there is a limit of 32 concurrent queries and 1,024 concurrent connections, so if you have thousands of users who will be hitting the database at the same time, you may want to create data marts in Azure SQL Database or create SSAS cubes.  This limit was imposed because there is no resource governor or CPU query scheduler like there is in SQL Server.  The benefit is that each query gets its own resources and won’t affect other queries (i.e. you don’t have to worry about one query taking all the resources and blocking everyone else).  There are also resource classes that allow more memory and CPU cycles to be allocated to queries run by a given user so they run faster, with the trade-off that this reduces the number of concurrent queries that can run.
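Resource classes are assigned by adding a user to one of the built-in database roles; for example (the user name is hypothetical):

    -- Queries run by LoadUser get more memory and CPU, at the cost of concurrency
    EXEC sp_addrolemember 'largerc', 'LoadUser';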

A great article that discusses the various multi-tenant models in detail and how multi-tenancy is supported with Azure SQL Database is Design Patterns for Multi-tenant SaaS Applications with Azure SQL Database.

As you can see, there are lots of options to consider!  It becomes a balance of cost, performance, ease-of-development, ease-of-use, and security.

More info:

Tips & Tricks to Build Multi-Tenant Databases with SQL Databases

Multitenancy in SQL Azure

Choosing a Multi-Tenant Data Architecture

Multi-Tenant Data Architecture

Multi-Tenant Data Isolation in SQL Azure

Multi Tenancy and Windows Azure


Azure SQL Data Warehouse is now GA

Azure SQL Data Warehouse (SQL DW), which I blogged about here, is now generally available.  Here is the official announcement.

In brief, SQL DW is a fully managed data-warehouse-as-a-service that you can provision in minutes and scale up in seconds.  With SQL DW, storage and compute scale independently.  You can dynamically deploy, grow, shrink, and even pause compute, allowing for cost savings.  Also, SQL DW uses the power and familiarity of T-SQL, so you can integrate query results across relational data in your data warehouse and non-relational data in Azure blob storage or Hadoop using PolyBase.  SQL DW offers an availability SLA of 99.9%, making it the only public cloud data warehouse service that offers an availability SLA to customers.
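As a sketch of the PolyBase piece (the storage account, container, and columns are hypothetical), you define an external table over files in blob storage and then query it like any other table:

    -- External data source over an Azure blob storage container
    CREATE EXTERNAL DATA SOURCE AzureBlob
    WITH (TYPE = HADOOP,
          LOCATION = 'wasbs://sensordata@mystorageacct.blob.core.windows.net');

    -- Describe the file layout
    CREATE EXTERNAL FILE FORMAT CsvFormat
    WITH (FORMAT_TYPE = DELIMITEDTEXT,
          FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

    -- External table over the files, queryable with plain T-SQL
    CREATE EXTERNAL TABLE dbo.SensorReadings (DeviceId int, Reading float, ReadingTime datetime2)
    WITH (LOCATION = '/readings/', DATA_SOURCE = AzureBlob, FILE_FORMAT = CsvFormat);

    SELECT TOP 10 * FROM dbo.SensorReadings;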

SQL DW uses an elastic massively parallel processing (MPP) architecture built on top of the SQL Server 2016 database engine.  It allows you to interactively query and analyze data using existing SQL-based tools and business intelligence applications.  It uses column stores for high-performance analytics and storage compression, the rich collection of aggregation capabilities of SQL Server, and state-of-the-art query optimization capabilities.

Two customer case studies using SQL DW in production were just published: AGOOP and P:Cubed.

Also note that until recently, you had to use SSDT to connect to SQL DW.  But with the July 2016 update of SSMS, you can now connect to SQL DW using SSMS (see Finally, SSMS will talk to Azure SQL DW).

More info:

Azure SQL Data Warehouse Hits General Availability

Introduction to Azure SQL Data Warehouse (video)

Microsoft Azure SQL Data Warehouse Overview (video)

Azure SQL Data Warehouse Overview with Jason Strate (video)

A Developers Guide to Azure SQL Data Warehouse (video)


Virtualization does not equal a private cloud

I see a lot of confusion when it comes to creating a private cloud.  Many seem to think that by virtualizing the servers in your local on-premises data center, you have created a private cloud.  But by itself, virtualization falls far short of the characteristics of a private cloud and is really just server consolidation and data center automation.  The original five cloud characteristics, as stated by the National Institute of Standards and Technology (NIST), are what define a cloud:

  1. On-demand self-service: A user can provision a virtual machine (or other resources such as storage or networks) as needed automatically without intervention from IT
  2. Broad network access: Virtualization is available for all internal customers that are on the network through various client platforms (e.g., mobile phones, tablets, laptops, and workstations)
  3. Resource pooling (multitenancy): Computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand.  There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources (storage, processing, memory, and network bandwidth) but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter)
  4. Rapid elasticity (hyper-scaling): Resources can be scaled up or down, manually or in some cases automatically, commensurate with demand.  To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.  There is no need for IT to provision VMs individually and install the OS and software
  5. Measured service (operations): As opposed to IT charging costs back to other departments based on traditional budgeting, costs are based on actual usage.  Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts).  Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service

Since virtualization only solves #3, a lot more must be done to create a private cloud.  A cloud should also support Platform-as-a-Service (PaaS) to allow for application innovation.  Fortunately there are products, such as Microsoft’s Azure Stack, that add the other characteristics to give you a private cloud.  And of course you can always use a public cloud.

The main difference between a private cloud and a public cloud is in scaling.  A public cloud like Azure has no “ceiling” so resources can be added with no limits (i.e. hyper-scaling).  A private cloud has a ceiling and you may have to wait for more hardware to be installed before you can scale.  Comparing a public cloud vs a private cloud vs a hybrid approach will be the topic of another blog.

More info:

Private cloud vs. virtualization

Which Types of Workloads to Run in Public, Private Clouds


Azure SQL Data Warehouse pricing

The pricing for Azure SQL Data Warehouse (SQL DW) consists of a compute charge and a storage charge.  Compute is priced not by hardware configuration but by data warehouse units (DWUs).  DWUs can be scaled up or down via a sliding bar in just a couple of minutes with no downtime.  You pay for DWU blocks based on uptime (you can pause your SQL DW database and not have to pay for compute while paused).  When paused, storage is still available and can be used by other resources.
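Scaling is a one-line T-SQL change, run against the master database (the warehouse name is hypothetical), while pausing is done through the portal or PowerShell:

    -- Scale the warehouse to 1000 DWU
    ALTER DATABASE MyDataWarehouse MODIFY (SERVICE_OBJECTIVE = 'DW1000');
    -- To pause compute, use the portal or the Suspend-AzureRmSqlDatabase PowerShell cmdlet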

You must pay for storage, even when paused, but there is no limit to the amount of data you can put into storage.

Below are some examples of the compute pricing, based on the East US region.  This is preview pricing and will increase when SQL DW becomes generally available.  The pricing comes from the Azure pricing calculator:

SQL Data Warehouse, 100 DWU, $521/month
SQL Data Warehouse, 500 DWU, $2,604/month
SQL Data Warehouse, 1000 DWU, $5,208/month
SQL Data Warehouse, 1500 DWU, $7,812/month
SQL Data Warehouse, 2000 DWU, $10,416/month

New data warehouses in most regions will use Premium Disk storage.  New data warehouses in Brazil South, US North Central, India West, Japan, Australia, and Europe North will continue to use Standard Disk storage.  While SQL DW remains in preview, all data warehouses (regardless of whether they use Premium or Standard Disk storage) will continue to be charged at Standard Disk RA-GRS rates.  Below are some examples of the storage pricing, based on the East US region:

SQL Data Warehouse, RA-GRS Page Blob, 1TB = $122/month
SQL Data Warehouse, RA-GRS Page Blob, 10TB = $1,044/month
SQL Data Warehouse, RA-GRS Page Blob, 100TB = $9,748/month
SQL Data Warehouse, RA-GRS Page Blob, 1PB = $87,572/month

Storage transactions are not billed; customers only pay for data stored, not storage transactions.  Inbound data transfers are free.  Outbound data transfers are charged at regular data transfer rates.

More info:

SQL Data Warehouse Pricing

Understanding Windows Azure Storage Billing – Bandwidth, Transactions, and Capacity

Understand your bill for Microsoft Azure


SQL Server 2016 is here!

Today is the day: SQL Server 2016 is available for download!  You can download all the editions (Enterprise, Standard, Web, Express with Advanced Services, Express, Developer) of SQL Server 2016 now if you have an MSDN subscription, and you can also create an Azure VM right now that includes SQL Server pre-installed with one of the editions (Enterprise, Standard, Web, Express).  Lastly, you can experience the full feature set through the free Evaluation edition (180 days) or the Developer edition (you have to sign in to Visual Studio Dev Essentials, a free developer program, before you can download the Developer edition).

Here is a quick overview of the tons of new features, broken out by edition (click for larger view):

[Image: SQL Server 2016 new features, broken out by edition]

and here is another view on the features available for each edition (click for larger view):

[Image: SQL Server 2016 feature availability by edition]

More info:

SQL Server 2016 is generally available today

SQL Server 2016 e-book


Azure Storage pricing

Azure storage is a great, inexpensive solution for storing your data in the cloud.  There are many Azure storage options, but I wanted to list here the pricing of the most common type: block blob.

The pricing below is based on the East US region (most other regions have very similar pricing).  The pricing comes from the Azure pricing calculator.  The data redundancy options LRS, ZRS, GRS, and RA-GRS are explained at Redundancy Options in Azure Blob Storage, but here is a review:

Locally Redundant Storage (LRS): Makes multiple synchronous copies of your data within a single datacenter

Zone Redundant Storage (ZRS): Stores three copies of data across multiple datacenters within or across regions; for block blobs only

Geographically Redundant Storage (GRS): Same as LRS, plus multiple asynchronous copies to a second datacenter hundreds of miles away

Read-Access Geographically Redundant Storage (RA-GRS): Same as GRS, plus read access to the secondary datacenter

Here is the pricing:

Storage, LRS, 1GB = $.02/month
Storage, LRS, 10GB = $.24/month
Storage, LRS, 100GB = $2.40/month
Storage, LRS, 1TB = $24/month
Storage, LRS, 10TB = $242/month
Storage, LRS, 100TB = $2,396/month
Storage, LRS, 1PB = $23,572/month

Storage, ZRS, 1GB = $.03/month
Storage, ZRS, 10GB = $.30/month
Storage, ZRS, 100GB = $3.00/month
Storage, ZRS, 1TB = $30/month
Storage, ZRS, 10TB = $302/month
Storage, ZRS, 100TB = $2,995/month
Storage, ZRS, 1PB = $29,466/month

Storage, GRS, 1GB = $.05/month
Storage, GRS, 10GB = $.48/month
Storage, GRS, 100GB = $4.80/month
Storage, GRS, 1TB = $49/month
Storage, GRS, 10TB = $484/month
Storage, GRS, 100TB = $4,793/month
Storage, GRS, 1PB = $47,145/month

Storage, RA-GRS, 1GB = $.06/month
Storage, RA-GRS, 10GB = $.61/month
Storage, RA-GRS, 100GB = $6.10/month
Storage, RA-GRS, 1TB = $62/month
Storage, RA-GRS, 10TB = $614/month
Storage, RA-GRS, 100TB = $6,083/month
Storage, RA-GRS, 1PB = $59,852/month

The other cost to note is you are charged for the number of storage transactions, but this is extremely cheap (transactions include both read and write operations to storage):

Storage transactions, 100K = $.01/month
Storage transactions, 1M = $.04/month
Storage transactions, 10M = $.36/month
Storage transactions, 100M = $3.60/month

Each and every REST call to Windows Azure Blobs counts as one transaction (see Understanding Windows Azure Storage Billing – Bandwidth, Transactions, and Capacity).

Data going into Azure storage is free (“Inbound data transfers”).  Data going out of Azure storage (“Outbound data transfers”) is free if it stays within the same Azure data center (“region”); otherwise there is a small cost (see Data Transfers Pricing Details):

OUTBOUND DATA TRANSFERS          ZONE 1         ZONE 2         ZONE 3
First 5 GB/month                 Free           Free           Free
5 GB – 10 TB/month               $0.087 per GB  $0.138 per GB  $0.181 per GB
Next 40 TB (10–50 TB)/month      $0.083 per GB  $0.135 per GB  $0.175 per GB
Next 100 TB (50–150 TB)/month    $0.07 per GB   $0.13 per GB   $0.17 per GB
Next 350 TB (150–500 TB)/month   $0.05 per GB   $0.12 per GB   $0.16 per GB
  • Zone 1: US West, US East, US North Central, US South Central, US East 2, US Central, Europe West, Europe North
  • Zone 2: Asia Pacific East, Asia Pacific Southeast, Japan East, Japan West, Australia East, Australia Southeast
  • Zone 3: Brazil South

So in summary, your total cost depends on how much you store, the volume and type of storage transactions and outbound data transfers, and which data redundancy option you choose.
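As an illustrative example using the East US rates above: 10TB of LRS block blob storage, 1M transactions, and 100GB of outbound transfers to another region in Zone 1 would come to roughly:

    Storage (LRS, 10TB):                         $242.00/month
    Transactions (1M):                             $0.04/month
    Outbound ((100GB - 5GB free) x $0.087/GB):     $8.27/month
    Total:                                      ~$250.31/month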

Note that there is a new feature for Azure blob storage: a hot access tier and a cool access tier (see New Azure storage account type).  The hot access tier pricing is as listed above.  The cool access tier has greatly reduced pricing (ZRS is not yet supported).  For example:

Storage, LRS, 100GB = $2.40/month (hot), $1.00/month (cool)
Storage, GRS, 100GB = $4.80/month (hot), $2.00/month (cool)
Storage, RA-GRS, 100GB = $6.10/month (hot), $2.50/month (cool)

Finally, there is an Azure Speed Test site that you can use to determine the closest data center to your location for the lowest network latency.

More info:

Azure Storage Pricing

Windows Azure Storage Transaction | Unveiling the Unforeseen Cost and Tips to Cost Effective Usage

Understand your bill for Microsoft Azure
