In SQL Server, scaling out reads (i.e. using Active secondary replicas via AlwaysOn Availability Groups) is a lot easier than scaling out writes. So what are your options when you have a tremendous amount of writes that scaling up will not handle, no matter how big your server is? There are a number of options that allow you to write to many servers (instead of writing to one master server) that I’ll call distributed writes. Here are some ideas:
- Peer-to-Peer transactional replication (or Multi-master replication) with SQL Server. See Peer-to-Peer – Transactional Replication
- Sharding in Azure SQL Database via elastic database tools which requires coding. See Building scalable cloud databases. You can also implement sharding in code for SQL Server
- Merge replication in SQL Server. See Merge Replication
- Create a messaging and queuing application in SQL Server Service Broker where all writes are placed on the queue and sent to different servers
- Create a message queue using an asynchronous Azure Event Hub
- Use a 3rd party product: Attunity Replicate for SQL Server, ScaleArc for SQL Server
- Instead of using SQL Server, use a NoSQL or multi-model database service like Azure Cosmos DB (no coding involved – think of it as auto-sharding)
The one option out of all the above options that does not require coding and can support a large number of writes per second is Azure Cosmos DB. All the other options can require significant coding and/or can only handle a limited amount of writes per second. This is because Cosmos DB uses documents (JSON files) where all the information needed is included in that document so no joins are needed and documents can be spread on multiple servers (see Partition and scale in Azure Cosmos DB and A technical overview of Azure Cosmos DB). This is opposed to relational databases that use multiple tables that must be joined. If the tables are on different nodes that will cause a lot of data shuffling causing performance problems.
To go into greater detail on the benefits of Cosmos DB over SQL Server for distributed writes:
- Peer2Peer SQL Replication introduces pains around data consistency and conflict resolution
- Sharding with SQL introduces pains around maintaining availability when increasing/decreasing the degree of scale-out. Frequently, downtime is involved due to needs to re-balancing data across shards
- SQL requires rigid schemas and indices to be defined upfront. Every time schema and index updates are needed – you will incur a heavy operational cost of running Create Index and Alter Tables scripts across all database shards and replicas. Furthermore, this introduces availability issues as schemas are being altered.
- Handling sustained heavy write ingestion
- Putting queueing mechanisms in front of SQL only gives you a buffer for handling spikes in writes, but at the end of the day, the database itself needs to support sustained heavy write ingestion in order to consume the buffered events. What happens if events come in to the buffer faster than you drain it? You will need a database specifically designed for heavy write ingestion
Azure Cosmos DB solves these by:
- Providing 5 well-defined consistency models to help developers tune the right Consistency vs Performance tradeoffs for their scenario
- Scale on demand and support for flexible data model while maintaining high availability (99.99% availability SLA). Scaling out and partition management is taken care of by the service on behalf of the user
- Use of log-structured techniques to be a truly latch-free database to sustain heavy write ingestion with durable persistence
In the end, eliminating schema, index management, and JOINs are a necessary byproduct of scale out that Azure Cosmos DB provides.
After the initial post of this blog, I received the question “Why not just use SQL 2016 in-Memory tables for heavy write systems (info)?” and received a great reply from a Cosmos DB product manager:
SQL in-memory is only viable when:
- Request and data volume are small enough to fit on a single machine. You still have the fundamental problem of hard limits due to scale-up model.
- Scenario does not need durability, reliability, or availability – which are requirements for >99% of mission critical customer scenarios.
- If data is kept in only in-memory, you experience data loss upon any intermittent failure that requires computer to restart (e.g. os crash, power outage, os decides it wants to reboot to update, etc.). In order for data to be durable, it needs to be persisted to disk. In order to offer resiliency against disk failures, you will need to replicate to a number of disks
- For durable scenarios – memory only acts as a buffer to absorb spikes. In order to achieve sustained write ingestion – you will need to flush the buffer as fast as you input into the buffer. Now you have a bottleneck on disk i/o unless you scale-out
- This is why they immediately have to address that this is for “applications where durability is not required”; durability is a requirement for >99% of data scenarios
- Data loss and data corruption should be treated as cardinal sin for any data product
- This model is still a scale-up model – in which there are many hard limits
- What happens for data volume that doesn’t fit in memory (which tends to be very small size relative to disk storage)? You need to scale-out
- What happens for request volume that memory bandwidth is inadequate? You need to scale out
- This is why the throughput numbers in the blog are orders of magnitude smaller than what customers are doing everyday on Cosmos DB, and talking about storage size is quietly ignored
- Memory is 100x more expensive than SSD. Achieving high storage in a scale-out system will yield not only better scale and durability characteristics – but incur much lower costs for any large-scale scenarios