Getting data into Azure Blob Storage

If you have on-prem data and want to copy it to Azure Blob Storage in the cloud, what are all the possible ways to do it?  There are many, and here is a quick review of them:

AzCopy: A popular command-line utility designed for high-performance uploading, downloading, and copying of data to and from Microsoft Azure Blob Storage.  See Getting Started with the AzCopy Command-Line Utility.
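As a rough illustration of what an AzCopy upload looks like when scripted, here is a minimal Python sketch that builds the command line and hands it to the tool.  It assumes the newer AzCopy v10 syntax (azcopy copy <source> <destination> --recursive), which differs from older releases, and it assumes azcopy is on the PATH and that the destination is a container URL with a SAS token appended.  The function names and the example paths are hypothetical.

```python
import subprocess


def azcopy_upload_command(local_path, container_sas_url, recursive=True):
    """Build the argument list for an AzCopy v10 upload (assumed syntax)."""
    cmd = ["azcopy", "copy", local_path, container_sas_url]
    if recursive:
        # Needed when local_path is a directory rather than a single file.
        cmd.append("--recursive")
    return cmd


def run_azcopy_upload(local_path, container_sas_url, recursive=True):
    """Run AzCopy and raise CalledProcessError if it exits with an error."""
    cmd = azcopy_upload_command(local_path, container_sas_url, recursive)
    return subprocess.run(cmd, check=True).returncode
```

Passing the arguments as a list rather than a single string avoids shell quoting problems, which matters because SAS tokens contain characters such as & that a shell would otherwise interpret.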

Azure Import/Export Service: Transfers large amounts of file data to Azure Blob Storage by shipping one or more hard drives containing that data to an Azure data center.  It is meant for situations where uploading over the network is prohibitively expensive or not feasible.  See Use the Microsoft Azure Import/Export Service to Transfer Data to Blob Storage.
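To judge when a network upload stops being feasible, a back-of-the-envelope estimate of transfer time helps.  The sketch below is only that: the function name is hypothetical, sizes use decimal terabytes, and the default efficiency of 0.8 (the fraction of the nominal link speed you actually get) is an assumed figure, not a measured one.

```python
def transfer_days(size_tb, link_mbps, efficiency=0.8):
    """Estimate the days needed to upload size_tb terabytes over a link.

    size_tb    -- data size in decimal terabytes (1 TB = 1e12 bytes)
    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed usable fraction of the link, 0 < efficiency <= 1
    """
    if size_tb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, speed > 0, efficiency in (0, 1]")
    size_bits = size_tb * 1e12 * 8
    effective_bits_per_second = link_mbps * 1e6 * efficiency
    seconds = size_bits / effective_bits_per_second
    return seconds / 86400  # seconds per day
```

For example, transfer_days(10, 100, efficiency=1.0) comes out at a little over nine days for 10 TB on a fully used 100 Mbps link, which is the kind of result that makes shipping drives worth considering.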

SSIS: The Microsoft SQL Server 2014 Integration Services (SSIS) Feature Pack for Azure gives SSIS the ability to connect to Azure Blob Storage.  It enables you to create SSIS packages that transfer data between Azure Blob Storage and on-premises data sources.  See Microsoft SQL Server 2014 Integration Services Feature Pack for Azure and SSIS Feature Pack for Azure.

Azure Data Factory (ADF): With the latest ADF service update and Data Management Gateway release, you can copy from an on-premises file system and SQL Server to Azure Blob Storage.  See Azure Data Factory Update – New Data Stores, Move data to and from Azure Blob using Azure Data Factory, and Move data to and from SQL Server on-premises or on IaaS (Azure VM) using Azure Data Factory.  UPDATE: A Copy Wizard was released within ADF on March 18th.  It gives you an interactive data movement experience for moving data between Azure Blob Storage, Azure SQL Database, Azure SQL Data Warehouse, on-premises SQL Server, Azure Data Lake, Oracle, MySQL, DB2, Sybase, PostgreSQL, and Teradata using a simple, code-free wizard.  It supports both one-time and scheduled copy operations.

FTP: Deployed in an Azure worker role, this code creates an FTP server that can accept connections from all popular FTP clients (FileZilla, for example) for command and control of your blob storage account.  See FTP to Azure Blob Storage Bridge.

Other Command-line Utilities: See Azure Command-Line Interface (CLI), CloudCopy Command Line Tool

Graphical Clients: Windows Azure Storage explorers that can be used to enumerate and/or transfer data to and from blobs.  See Azure Storage Explorer, Blob Transfer Utility for Windows Azure Blob Storage, CloudBerry Explorer.  For more, see Windows Azure Storage Explorers

PowerShell/Cmdlets, .NET SDK, .NET Azure Storage Client, JavaScript CLI: Work with Azure storage programmatically.  See Uploading data with Windows PowerShell, Uploading data with the Microsoft .NET Framework, Uploading data with the Azure Storage SDK, and Using the Azure CLI with Azure Storage.
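The SDKs above ultimately call the Blob service REST API, so a programmatic upload can be sketched without any SDK at all.  The Python example below uses only the standard library to issue a Put Blob request (an HTTP PUT with the x-ms-blob-type: BlockBlob header) against a container URL that carries a SAS token.  The function names, the example account, and the SAS value are hypothetical, and this is a minimal sketch rather than a replacement for the SDKs: it sends the whole blob in one request, so large files would need the block-based upload that the SDKs and AzCopy handle for you.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from urllib.request import Request, urlopen


def build_put_blob_request(container_sas_url, blob_name, data):
    """Build a Put Blob request for a block blob under a container SAS URL."""
    parts = urlsplit(container_sas_url)
    # Append the blob name to the container path; "/" is left unescaped so
    # names such as "reports/2016.csv" keep their virtual directory layout.
    blob_path = parts.path.rstrip("/") + "/" + quote(blob_name)
    blob_url = urlunsplit((parts.scheme, parts.netloc, blob_path, parts.query, ""))
    return Request(
        blob_url,
        data=data,
        method="PUT",
        headers={"x-ms-blob-type": "BlockBlob"},
    )


def upload_blob(container_sas_url, blob_name, data):
    """Send the request and return the HTTP status (201 means created)."""
    request = build_put_blob_request(container_sas_url, blob_name, data)
    with urlopen(request) as response:
        return response.status
```

A call would look like upload_blob("https://myaccount.blob.core.windows.net/mycontainer?<SAS token>", "reports/data.csv", open("data.csv", "rb").read()), where the account, container, and token are placeholders for your own values.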

Signiant Flight: An easy-to-deploy third-party solution that accelerates the movement of large data sets in and out of Azure Blob Storage.  During a test with Signiant I witnessed a 4x performance gain over AzCopy.  The gain comes from its use of advanced UDP acceleration, which is much faster than a TCP-based transfer.

Azure Portal: Allows downloading, deleting, and editing certain properties of Blob files.  In the old portal, go to the Azure Portal, choose Storage, click on the storage name, and click Containers.  Then click on a container and you will see a list of the Blob files in that container; at the bottom are options to Download, Edit, and Delete.  In the new portal, go to the Azure Portal, choose Storage accounts, click on the storage account, and click on the Blobs service.  Then click on a container and you will see a list of the Blob files in that container.  Click on one of the files and you will see options to Download and Delete.

PolyBase: A technology that allows SQL to be used across relational data stores and non-relational Hadoop data.  It can move data between Azure Blob Storage and APS, SQL DW, or SQL Server 2016.  See APS Polybase for Hadoop and Windows Azure Blob Storage (WASB) Integration.

Note that for on-prem to cloud transfers, there is a service called Azure ExpressRoute that gives you a faster pipe.

If you have on-prem data that you want to copy to Azure Data Lake, the technologies above that can accomplish this so far are: SSIS (via a custom task), ADF, PowerShell/Cmdlets, the .NET SDK, the JavaScript CLI, or the Azure Portal.

More info:

Upload data for Hadoop jobs in HDInsight

Migrate Your Data

Migrating data to Azure SQL Data Warehouse in practice

About James Serra

James is a big data and data warehousing solution architect at Microsoft. Previously he was an independent consultant working as a Data Warehouse/Business Intelligence architect and developer. He is a prior SQL Server MVP with over 25 years of IT experience.