Multi-tenant databases in the cloud

For companies that sell an on-prem software solution and want to move it to the cloud, the challenge is deciding how to architect that solution there. For example, say you have a software solution that stores patient data for hospitals. You sign up hospitals, install the hardware, the software and the associated databases on-prem (at the hospital or a co-location facility), and load their patient data. Think of each hospital as a “tenant”. Now you want to move this solution to the cloud and get the many benefits that come with it, the biggest being the time to get a hospital up and running, which can drop from months on-prem to hours in the cloud. You now have a choice: keep each hospital separate, with its own VMs and databases (“single tenant”), or combine the data for all hospitals into one database (“multi-tenant”). Another example is building a new PaaS application similar to Salesforce. Here I’ll describe the various cloud strategies using Azure SQL Database, which is for OLTP applications, and Azure SQL Data Warehouse, which is for OLAP applications (see Azure SQL Database vs SQL Data Warehouse):

Separate Servers/VMs

You create VMs for each tenant, essentially doing a “lift and shift” of the current on-premises solution. This provides the best isolation possible and is common on-premises, but it is also the option that does nothing to cut costs, since each tenant has its own server, SQL Server license, and so on. Sometimes this is the only allowable option, if your client contract states that their data will be isolated from other clients at the virtual machine level. Some cons: table updates must be replicated across all the servers (e.g. updating reference tables), there is no resource sharing, and backups must be managed separately on every server.
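To make the reference-table con concrete, here is a minimal Python sketch, assuming each tenant's dedicated server is simulated by its own in-memory SQLite connection. The tenant names, the `Department` table and the `replicate_reference_update` helper are hypothetical; a real deployment would connect to separate SQL Server VMs.

```python
import sqlite3


def provision_tenant_server(tenant_id: str) -> sqlite3.Connection:
    """Hypothetical stand-in for a dedicated VM: one private database per tenant."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Department (Code TEXT PRIMARY KEY, Name TEXT)")
    return conn


def replicate_reference_update(servers, code, name):
    """Apply one reference-table change to every tenant server.

    There is no shared copy of the reference data, so the change is fanned
    out server by server. Returns the tenants whose update failed.
    """
    failed = []
    for tenant_id, conn in servers.items():
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT OR REPLACE INTO Department (Code, Name) VALUES (?, ?)",
                    (code, name),
                )
        except sqlite3.Error:
            failed.append(tenant_id)
    return failed


servers = {t: provision_tenant_server(t) for t in ("hospital_a", "hospital_b", "hospital_c")}
failed = replicate_reference_update(servers, "CARD", "Cardiology")
```

The list of failures is the point: with one server per tenant, a single logical update becomes many independent operations that can partially succeed.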

Separate Databases

A new database is created and assigned when a tenant is provisioned. You can place a number of databases on each VM (e.g. each VM handles ten tenants), or create each database in Azure SQL Database. This is often used when you need to isolate each customer, because you can associate different logins, permissions and so on with each database. If using Azure SQL Database, be aware that the database size limit is 4TB. If you have a client database that will exceed that, you can use sharding (via Elastic Database Tools) or cross-database queries (see Scaling Azure SQL Database and Cross-database queries in Azure SQL Database) with row-level security (see Multi-tenant applications with elastic database tools and row-level security). The lowest service tier for SQL Database has a max database size of 5GB, so you might be paying for storage that you don’t really use. If using Azure SQL Data Warehouse, you have no limit on database size. Some other cons: a different connection pool is required per database, updates must be replicated across all the databases, there is no resource sharing (unless using Elastic Database Pools), and backups must be managed separately for every database.
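A minimal sketch of database-per-tenant routing, assuming a hypothetical catalog that maps each tenant to its own database and keeps one connection per database (standing in for the separate connection pool per database). In-memory SQLite databases substitute for Azure SQL databases; `TenantCatalog` and the `Patient` table are illustrative names only.

```python
import sqlite3


class TenantCatalog:
    """Hypothetical catalog: tenant -> database name -> connection."""

    def __init__(self):
        self._database_for_tenant = {}  # tenant_id -> database name
        self._pool_for_database = {}    # database name -> connection ("pool" of one)

    def provision(self, tenant_id: str) -> str:
        """Create and register a new database when a tenant signs up."""
        db_name = f"tenant_{tenant_id}"
        conn = sqlite3.connect(":memory:")  # stand-in for a newly created database
        conn.execute("CREATE TABLE Patient (PatientId INTEGER PRIMARY KEY, Name TEXT)")
        self._database_for_tenant[tenant_id] = db_name
        self._pool_for_database[db_name] = conn
        return db_name

    def connection_for(self, tenant_id: str) -> sqlite3.Connection:
        """Route a request to the tenant's own database; unknown tenants raise KeyError."""
        return self._pool_for_database[self._database_for_tenant[tenant_id]]


catalog = TenantCatalog()
catalog.provision("hospital_a")
catalog.provision("hospital_b")
with catalog.connection_for("hospital_a") as conn:  # commits the insert
    conn.execute("INSERT INTO Patient (Name) VALUES ('Ann')")
```

Isolation here comes entirely from routing: a patient inserted for `hospital_a` is invisible to `hospital_b` because they never share a database. The price is one pool, and one copy of every schema change, per database.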

Separate Schemas

This is also a very good way to achieve multi-tenancy while still sharing some resources: everything is inside the same database, but each tenant has its own schema. That even lets you customize a specific tenant without affecting the others. You also save costs by paying for only one database (which fits on SQL Data Warehouse no matter what its size) or a handful of databases if using SQL Database (e.g. ten tenants per database). Some of the cons: you need to replicate all the database objects in every schema, so the number of objects grows with every tenant you add; updates must be replicated across all the schemas; the connection pool for the database must maintain a different connection per tenant (or set of credentials); and a different user is required per tenant, whose login is stored at the server level, so you have to back up that login independently.
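The object growth is easy to see if you generate the per-tenant DDL. This Python sketch only builds T-SQL strings and does not connect to anything; the table definitions, the `<tenant>_user`/`<tenant>_login` naming convention and `schema_ddl_for_tenant` are assumptions for illustration.

```python
from typing import List

# Hypothetical per-tenant objects; every tenant schema gets a full copy.
TENANT_TABLES = {
    "Patient": "PatientId INT PRIMARY KEY, Name NVARCHAR(100)",
    "Visit": "VisitId INT PRIMARY KEY, PatientId INT, VisitDate DATE",
}


def schema_ddl_for_tenant(tenant: str) -> List[str]:
    """Return T-SQL statements that create one tenant's schema, tables and user.

    Run each statement as its own batch (CREATE SCHEMA must start a batch).
    Assumes the server-level login [<tenant>_login] already exists.
    """
    if not tenant.isidentifier():
        raise ValueError(f"unsafe tenant name: {tenant!r}")
    statements = [f"CREATE SCHEMA [{tenant}];"]
    for table, columns in TENANT_TABLES.items():
        statements.append(f"CREATE TABLE [{tenant}].[{table}] ({columns});")
    statements.append(
        f"CREATE USER [{tenant}_user] FOR LOGIN [{tenant}_login] "
        f"WITH DEFAULT_SCHEMA = [{tenant}];"
    )
    statements.append(
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::[{tenant}] TO [{tenant}_user];"
    )
    return statements


tenants = [f"hospital_{i}" for i in range(10)]
all_statements = [s for t in tenants for s in schema_ddl_for_tenant(t)]
```

Ten tenants with only two tables each already produce 50 statements, and every later change to a table has to be repeated once per schema.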

A variation of this using SQL Database is to split the tenants over multiple databases, but without separate schemas, for performance reasons. This is done by assigning a distinct set of tenants to each database using a partitioning strategy such as hash, range or list partitioning. This data distribution strategy is often referred to as sharding.
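The three partitioning strategies can be sketched as functions that map a tenant to a shard database. This is a minimal Python illustration, not the Elastic Database Tools API; the shard names, range boundaries and region list are hypothetical.

```python
import bisect
import hashlib

SHARDS = ["tenants_db_0", "tenants_db_1", "tenants_db_2"]  # hypothetical database names


def hash_shard(tenant_id: str) -> str:
    """Hash partitioning: a stable hash of the tenant id picks the shard.

    Python's built-in hash() is salted per process, so SHA-256 is used to
    get the same answer on every application server.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Range partitioning: shard i holds tenant numbers from RANGE_STARTS[i]
# up to (but not including) RANGE_STARTS[i + 1]; the last shard is open-ended.
RANGE_STARTS = [0, 1000, 2000]


def range_shard(tenant_number: int) -> str:
    if tenant_number < RANGE_STARTS[0]:
        raise ValueError("tenant number below the first range")
    return SHARDS[bisect.bisect_right(RANGE_STARTS, tenant_number) - 1]


# List partitioning: an explicit value-to-shard mapping (here, by region).
LIST_MAP = {"east": "tenants_db_0", "central": "tenants_db_1", "west": "tenants_db_2"}


def list_shard(region: str) -> str:
    return LIST_MAP[region]
```

One design caveat: simple modulo hashing remaps most tenants when a shard is added, which is one reason a lookup-based shard map (such as the list and range shard maps in the Elastic Database client library) is often preferred for tenant placement.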

Row Isolation

Everything is shared in this option: server, database and even schema. All the data for the tenants is in the same tables in one database. The only thing that differentiates them is a TenantId or some other column at the table level. A big benefit is code changes: with this option you only have one spot to change code (e.g. table structure), whereas with the other options you have to roll out code changes to many spots. You will need to use row-level security or something similar when you need to limit the results to an individual tenant, or you can create views or use stored procedures that filter by tenant. You also gain ease of use and performance when you need to aggregate results over multiple tenants. Azure SQL Data Warehouse is a great solution for this, as there is no limit to the database size.
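A minimal sketch of row isolation, using SQLite in place of Azure SQL: one shared `Patient` table keyed by `TenantId`, a helper that always applies the tenant predicate, and a cross-tenant aggregate. The table and function names are hypothetical. In Azure SQL you would normally enforce the predicate with row-level security (a security policy with a filter predicate) rather than trusting every query to include it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One shared table for all tenants; TenantId is part of the key.
conn.execute(
    """CREATE TABLE Patient (
           TenantId  TEXT    NOT NULL,
           PatientId INTEGER NOT NULL,
           Name      TEXT,
           PRIMARY KEY (TenantId, PatientId))"""
)
conn.executemany(
    "INSERT INTO Patient VALUES (?, ?, ?)",
    [("hospital_a", 1, "Ann"), ("hospital_a", 2, "Bob"), ("hospital_b", 1, "Cara")],
)


def patients_for_tenant(conn, tenant_id):
    """Tenant-facing reads go through a helper that always filters on TenantId."""
    return conn.execute(
        "SELECT PatientId, Name FROM Patient WHERE TenantId = ? ORDER BY PatientId",
        (tenant_id,),
    ).fetchall()


def patient_counts_by_tenant(conn):
    """Cross-tenant aggregation is a single GROUP BY, with no fan-out."""
    return conn.execute(
        "SELECT TenantId, COUNT(*) FROM Patient GROUP BY TenantId ORDER BY TenantId"
    ).fetchall()


print(patients_for_tenant(conn, "hospital_a"))  # [(1, 'Ann'), (2, 'Bob')]
print(patient_counts_by_tenant(conn))           # [('hospital_a', 2), ('hospital_b', 1)]
```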

But be aware that there is a limit of 32 concurrent queries and 1,024 concurrent connections, so if you have thousands of users who will be hitting the database at the same time, you may want to create data marts in Azure SQL Database or create SSAS cubes. This limit exists because there is no resource governor or CPU query scheduler like there is in SQL Server. The benefit is that each query gets its own resources and won’t affect other queries (that is, you don’t have to worry about one query taking all the resources and blocking everyone else). There are also resource classes that allocate more memory and CPU cycles to queries run by a given user so they run faster, with the trade-off that fewer queries can run concurrently.
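One way to live with a fixed concurrency limit is to queue queries on the application side rather than at the warehouse. This is a minimal Python sketch, not an Azure feature: `QueryThrottle` and `fake_query` are hypothetical, and the limit of 32 is simply the figure quoted above (check the current limits for your service level).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_QUERIES = 32  # figure quoted in the text; verify for your service level


class QueryThrottle:
    """Caps in-flight queries; extra callers wait for a free slot."""

    def __init__(self, limit: int):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, for monitoring

    def run(self, query_fn, *args):
        with self._slots:  # blocks until one of the `limit` slots is free
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            try:
                return query_fn(*args)
            finally:
                with self._lock:
                    self.in_flight -= 1


def fake_query(n):
    time.sleep(0.01)  # hypothetical stand-in for a warehouse round trip
    return n * n


throttle = QueryThrottle(MAX_CONCURRENT_QUERIES)
# 100 application threads issue 200 queries, but at most 32 run at once.
with ThreadPoolExecutor(max_workers=100) as pool:
    results = list(pool.map(lambda n: throttle.run(fake_query, n), range(200)))
```

Queuing in the application keeps the limit predictable, but it does not add capacity; for thousands of simultaneous users the data mart or SSAS approach above is still the real fix.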

A great article that discusses the various multi-tenant models in detail and how multi-tenancy is supported with Azure SQL Database is Design Patterns for Multi-tenant SaaS Applications with Azure SQL Database.

As you can see, there are lots of options to consider! It becomes a balance of cost, performance, ease of development, ease of use, and security. Performance is easier to manage when there is one database per customer, but management becomes more difficult.
