Distributed Writes

In SQL Server, scaling out reads (i.e. using Active secondary replicas via AlwaysOn Availability Groups) is a lot easier than scaling out writes.  So what are your options when you have a tremendous amount of writes that scaling up will not handle, no matter how big your server is?  There are a number of options that allow you to write to many servers (instead of writing to one master server) that I’ll call distributed writes.  Here are some ideas:

The one option out of all the above options that does not require coding and can support a large number of writes per second is Azure Cosmos DB.  All the other options can require significant coding and/or can only handle a limited amount of writes per second.  This is because Cosmos DB uses documents (JSON files) where all the information needed is included in that document so no joins are needed and documents can be spread on multiple servers (see Partition and scale in Azure Cosmos DB and A technical overview of Azure Cosmos DB).  This is opposed to relational databases that use multiple tables that must be joined.  If the tables are on different nodes that will cause a lot of data shuffling causing performance problems.

To go into greater detail on the benefits of Cosmos DB over SQL Server for distributed writes:

  • Consistency
    • Peer2Peer SQL Replication introduces pains around data consistency and conflict resolution
  • Availability
    • Sharding with SQL introduces pains around maintaining availability when increasing/decreasing the degree of scale-out.  Frequently, downtime is involved due to needs to re-balancing data across shards
    • SQL requires rigid schemas and indices to be defined upfront.  Every time schema and index updates are needed – you will incur a heavy operational cost of running Create Index and Alter Tables scripts across all database shards and replicas.  Furthermore, this introduces availability issues as schemas are being altered.
  • Handling sustained heavy write ingestion
    • Putting queueing mechanisms in front of SQL only gives you a buffer for handling spikes in writes, but at the end of the day, the database itself needs to support sustained heavy write ingestion in order to consume the buffered events.  What happens if events come in to the buffer faster than you drain it?  You will need a database specifically designed for heavy write ingestion

Azure Cosmos DB solves these by:

  • Providing 5 well-defined consistency models to help developers tune the right Consistency vs Performance tradeoffs for their scenario
  • Scale on demand and support for flexible data model while maintaining high availability (99.99% availability SLA).  Scaling out and partition management is taken care of by the service on behalf of the user
  • Use of log-structured techniques to be a truly latch-free database to sustain heavy write ingestion with durable persistence

In the end, eliminating schema, index management, and JOINs are a necessary byproduct of scale out that Azure Cosmos DB provides.

After the initial post of this blog, I received the question “Why not just use SQL 2016 in-Memory tables for heavy write systems (info)?” and received a great reply from a Cosmos DB product manager:

SQL in-memory is only viable when:

  • Request and data volume are small enough to fit on a single machine.  You still have the fundamental problem of hard limits due to scale-up model.
  • Scenario does not need durability, reliability, or availability – which are requirements for >99% of mission critical customer scenarios.

Durability

  • If data is kept in only in-memory, you experience data loss upon any intermittent failure that requires computer to restart (e.g. os crash, power outage, os decides it wants to reboot to update, etc.).  In order for data to be durable, it needs to be persisted to disk.  In order to offer resiliency against disk failures, you will need to replicate to a number of disks
  • For durable scenarios – memory only acts as a buffer to absorb spikes.  In order to achieve sustained write ingestion – you will need to flush the buffer as fast as you input into the buffer.  Now you have a bottleneck on disk i/o unless you scale-out
  • This is why they immediately have to address that this is for “applications where durability is not required”; durability is a requirement for >99% of data scenarios
  • Data loss and data corruption should be treated as cardinal sin for any data product

Scale

  • This model is still a scale-up model – in which there are many hard limits
  • What happens for data volume that doesn’t fit in memory (which tends to be very small size relative to disk storage)?  You need to scale-out
  • What happens for request volume that memory bandwidth is inadequate?  You need to scale out
  • This is why the throughput numbers in the blog are orders of magnitude smaller than what customers are doing everyday on Cosmos DB, and talking about storage size is quietly ignored

Expensive

  • Memory is 100x more expensive than SSD.  Achieving high storage in a scale-out system will yield not only better scale and durability characteristics – but incur much lower costs for any large-scale scenarios

More info:

Database Sharding

About James Serra

James is a big data and data warehousing solution architect at Microsoft. Previously he was an independent consultant working as a Data Warehouse/Business Intelligence architect and developer. He is a prior SQL Server MVP with over 25 years of IT experience.
This entry was posted in SQL Server, SQLServerPedia Syndication. Bookmark the permalink.

9 Responses to Distributed Writes

  1. JRStern says:

    Very thought provoking, thanks. Y’know, it occurs to me that another strategy is possible, which is writing much of your transaction to a simplified write cache table instead of spreading it across ten normalized tables. That way you stay within a primary Azure SQL database but the simplified logic gives you more throughput. You clean it up minutes or hours later, as the case may be.

    It does seem that SQL Server / Azure SQL has some absolute bottlenecks on writes having to do with logging, and on heavy write systems these can become an issue, but the numbers are not well documented, especially on newer systems using SSDs.

    • James Serra says:

      Glad it was thought provoking, as that was my main goal with the post. Your idea is similar to the queue’s I mentioned, which have the problem of a limit on how much the queue can hold plus the delay in “cleaning up the queue”. Thanks for your comment!

    • JRStern says:

      I’ve been very disappointed with all my proof of concept attempts to use in-memory tables. The speed increase has been very modest, and the bother to avoid the various limitations much larger than I dreamed.

      I’m not even clear on whether using memory tables avoids any bottlenecks on logging, only on delayed persistence of the main db.

    • James Serra says:

      I received a great reply to this question from a Cosmos DB product manager (and have added to the blog post):

      SQL in-memory is only viable when:
      • Request and data volume are small enough to fit on a single machine. You still have the fundamental problem of hard limits due to scale-up model.
      • Scenario does not need durability, reliability, or availability – which are requirements for >99% of mission critical customer scenarios.

      Durability
      • If data is kept in only in-memory, you experience data loss upon any intermittent failure that requires computer to restart (e.g. os crash, power outage, os decides it wants to reboot to update, etc.). In order for data to be durable, it needs to be persisted to disk. In order offer resiliency against disk failures, you will need to replicate to a number of disks.
      • For durable scenarios – memory only acts as a buffer to absorb spikes. In order to achieve sustained write ingestion – you will need to flush the buffer as fast as you input into the buffer. Now you have a bottleneck on disk i/o unless you scale-out.
      • This is why they immediately have to address that this is for “applications where durability is not required”; durability is a requirement for >99% of data scenarios.
      • Data loss and data corruption should be treated as cardinal sin for any data product.
      Scale
      • This model is still a scale-up model – in which there hard many hard limits.
      • What happens for data volume that doesn’t fit in memory (which tends to be very small size relative to disk storage)? You need to scale-out.
      • What happens for request volume that memory bandwidth is inadequate? You need to scale out.
      • This is why the throughput numbers in the blog are orders of magnitude smaller than what customers are doing everyday on Cosmos DB, and talking about storage size is quietly ignored.
      Expensive
      • Memory is 100x more expensive than SSD. Achieving high throughput in a scale-out system will yield not only better scale and durability characteristics – but incur much lower costs for any large-scale scenarios.

      • JRStern says:

        So then is the idea of Cosmos DB that it helps to scale in-memory? I had no idea that was the point. But what about when it’s then time to query the data, does it then also need to be persisted back to a relational database? Or does it all have to be memory resident across N servers in Cosmos DB? I think Microsoft has not made any of this clear – certainly not to me! And I did look, somewhat, a few months ago.

        I also looked back at the link SThomms posted, and is SCM mentioned actually available? I don’t think it is, Intel has pushed off theirs in Purley until next year and I’ve heard nothing about the non-Intel alternatives.

        • James Serra says:

          To clarify – the previous comment illustrates why SQL’s in-memory solution is not a general-purpose solution to the problem context for this blog post (how to build a general-purpose solution to support a sustained high rate of write ingestion).

          Azure Cosmos DB is a general-purpose distributed database, that happens to address supporting a sustained high rate of ingestion incredibly well. In fact, Azure Cosmos DB had its birth out of the same product engineering team as SQL – in which one of the goals for the team was to solve the scale-out problem more holistically (as opposed to addressing scale-out as an afterthought as has been done in the traditional RDBMS space). By getting a clean slate to re-design our solution, we were freed from the burdens of having to support a legacy design. To read more, check out this blog post: https://azure.microsoft.com/en-us/blog/a-technical-overview-of-azure-cosmos-db/

  2. Pingback: Distributed Database Writes – Curated SQL

  3. Pingback: Use cases of various products for a big data cloud solution | James Serra's Blog