So you have data in Azure Blob Storage and are concerned about reliability. Have no fear! There are three replication options for redundancy:
1. Locally Redundant Storage (LRS): All data in the storage account is made durable by replicating transactions synchronously to three different storage nodes within the same region (there are 24 regions throughout the world, with each region made up of multiple datacenters).
2. Geo Redundant Storage (GRS): This is the default redundancy option when a storage account is created. As with LRS, transactions are replicated synchronously to three different storage nodes within the primary region chosen when the storage account was created. However, transactions are also queued for asynchronous replication to a secondary region that is hundreds of miles away from the primary (geo-replication). In this secondary region the data is again made durable by replicating it to three more storage nodes (i.e. a total of six copies). So even in the case of a complete regional outage, or a regional disaster in which the primary location is not recoverable, your data is still durable.
3. Read Access – Geo Redundant Storage (RA-GRS): For a GRS storage account, you can turn on read-only access to the storage account’s data in the secondary region. Since replication to the secondary region is done asynchronously, this gives you an eventually consistent version of the data to read from. When you enable read-only access to your secondary region, you get a secondary endpoint in addition to the primary endpoint for accessing your storage account. This secondary endpoint is the same as the primary endpoint except that the account name carries the suffix “-secondary”. For example, if the primary endpoint is myaccount.<service>.core.windows.net, the secondary endpoint is myaccount-secondary.<service>.core.windows.net.
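The naming rule for the RA-GRS secondary endpoint can be sketched in a few lines of Python. The helper name `secondary_endpoint` is made up for illustration and is not part of any Azure SDK; it only applies the "-secondary" suffix rule described above to a host name.

```python
def secondary_endpoint(primary_host: str) -> str:
    """Derive the read-only secondary host from a primary storage host.

    Hypothetical helper: it only applies the "-secondary" suffix rule
    to the account name (the first DNS label of the host).
    """
    account, dot, rest = primary_host.partition(".")
    if not account or not dot or not rest:
        raise ValueError(f"not a storage host name: {primary_host!r}")
    return f"{account}-secondary.{rest}"


print(secondary_endpoint("myaccount.blob.core.windows.net"))
# prints: myaccount-secondary.blob.core.windows.net
```

Remember that the secondary is read-only and eventually consistent, so a read from it may return slightly older data than the primary.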
These options mean the data in your Microsoft Azure storage account is always replicated to ensure durability and high availability, meeting the Azure Storage SLA even in the face of transient hardware failures.
For locally redundant storage, Microsoft stores CRCs of the data to ensure correctness, and periodically reads and validates those CRCs to detect bit rot (random errors that build up on the disk media over time). If a CRC check fails, the data is recovered via an automated process. And since each VM disk is a blob in Azure storage, if a CRC check fails on a disk, the disk is automatically commissioned/decommissioned.
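To make the CRC idea concrete, here is a toy sketch of a scrub pass over three replicas. It is not how Azure implements it internally (that is not public in this level of detail). The `Replica` class and `scrub` function are invented for illustration, and CRC-32 from Python's `zlib` stands in for whatever checksum the service actually uses.

```python
import zlib


class Replica:
    """One stored copy of a blob plus the CRC recorded at write time."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.stored_crc = zlib.crc32(data)

    def is_healthy(self) -> bool:
        # Recompute the CRC and compare it with the one saved at write time.
        return zlib.crc32(bytes(self.data)) == self.stored_crc


def scrub(replicas: list) -> int:
    """Validate every replica and repair corrupt ones from a healthy copy.

    Returns the number of replicas that were repaired.
    """
    healthy = next((r for r in replicas if r.is_healthy()), None)
    if healthy is None:
        raise RuntimeError("no healthy replica left to recover from")
    repaired = 0
    for replica in replicas:
        if not replica.is_healthy():
            replica.data = bytearray(healthy.data)
            repaired += 1
    return repaired


replicas = [Replica(b"hello blob") for _ in range(3)]
replicas[1].data[0] ^= 0x01            # simulate bit rot: flip a single bit
print(replicas[1].is_healthy())        # prints: False
```

Calling `scrub(replicas)` after the simulated bit flip restores the damaged copy from one of the two intact ones.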
For remote storage (GRS and RA-GRS), in the event of a major disaster that affects the primary storage location, Microsoft will first try, manually, to restore the primary location. Restoring the primary takes precedence because failing over to the secondary may lose recent changes (replication is asynchronous), and not every application prefers a failover if availability of the primary can be restored. Depending on the nature of the disaster and its impact, on very rare occasions Microsoft may not be able to restore the primary location and would need to perform a geo-failover.
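Why can a failover lose recent changes? The toy model below shows the mechanics. A write is committed to the primary right away but only queued for the secondary, so anything still in the queue at failover time never arrives. The class and method names are invented for this sketch and do not correspond to any Azure API.

```python
from collections import deque


class GeoReplicatedAccount:
    """Toy model: synchronous primary writes, queued geo-replication."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = deque()   # writes not yet applied to the secondary

    def write(self, key, value):
        self.primary[key] = value            # durable in the primary now
        self.pending.append((key, value))    # replicated to secondary later

    def replicate(self, max_items=None):
        """Apply up to max_items queued writes to the secondary."""
        count = len(self.pending) if max_items is None else min(max_items, len(self.pending))
        for _ in range(count):
            key, value = self.pending.popleft()
            self.secondary[key] = value
        return count

    def failover(self):
        """Promote the secondary; return keys whose latest write was lost."""
        lost = [key for key, _ in self.pending]
        self.pending.clear()
        self.primary, self.secondary = dict(self.secondary), {}
        return lost


account = GeoReplicatedAccount()
account.write("a", 1)
account.write("b", 2)
account.replicate(max_items=1)   # only "a" reaches the secondary
account.write("c", 3)
print(account.failover())        # prints: ['b', 'c']
```

In this model the new primary holds only "a" after the failover, which is exactly the kind of recent delta the paragraph above says may be lost.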
When this happens, affected customers will be notified via their subscription contact information or via the Azure portal. As part of the failover, the customer’s “account.service.core.windows.net” DNS entry would be updated to point from the primary location to the secondary location. Once this DNS change is propagated, the existing Blob URIs will work. This means that you do not need to change your application’s URIs – all existing URIs will work the same before and after a geo-failover.
After the failover occurs, the location that is accepting traffic is considered the new primary location for the storage account. This location will remain the primary unless another geo-failover were to occur. Once the new primary is up and accepting traffic, Microsoft will bootstrap a new secondary so that the data is geo-redundant again.