Azure Data Factory is a service designed to let developers integrate disparate data sources. It is a cloud-based counterpart to SQL Server Integration Services (SSIS), used to manage data both on-premises and in the cloud.
It provides access to on-premises data in SQL Server and to cloud data in Azure Storage (Blobs and Tables) and Azure SQL Database. On-premises data is reached through a data management gateway that connects to on-premises SQL Server databases.
Unlike SSIS, it does not offer a drag-and-drop designer. Instead, data processing is initially enabled through Hive, Pig, and custom C# activities. Such activities can clean data, mask sensitive fields, and transform data in a wide variety of complex ways.
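To make the idea of a transformation activity concrete, here is the kind of cleaning and field-masking logic such an activity might carry out. It is sketched in plain Python purely for illustration; in Data Factory this logic would live inside a Hive, Pig, or custom C# activity, and the column names and masking rule below are assumptions, not taken from the service:

```python
import csv
import io

def mask_email(email):
    """Mask the local part of an email address, keeping the domain."""
    local, _, domain = email.partition("@")
    masked = "*" * len(local)
    return masked + "@" + domain if domain else masked

def clean_rows(raw_csv):
    """Drop incomplete rows and mask the email column of a web-log extract."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    cleaned = []
    for row in reader:
        if not row.get("user_email"):  # skip records missing the key field
            continue
        row["user_email"] = mask_email(row["user_email"])
        cleaned.append(row)
    return cleaned

raw = "user_email,page\nalice@example.com,/home\n,/about\n"
print(clean_rows(raw))
```

The incomplete second record is dropped and the surviving email is masked to `*****@example.com`, illustrating the clean-and-mask pattern the article describes.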
You author your activities, combine them into a pipeline, set an execution schedule, and you're done. Data Factory also provides an up-to-the-moment monitoring dashboard, so you can deploy your data pipelines and immediately begin monitoring them.
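As a rough illustration of the author-and-schedule workflow, a pipeline definition might look something like the following JSON. This is a minimal sketch only: the pipeline name, dataset names, script path, and exact property spellings are assumptions for illustration, not a definitive schema.

```json
{
  "name": "WebLogCleansingPipeline",
  "properties": {
    "description": "Runs a Hive script to clean raw web logs",
    "activities": [
      {
        "name": "CleanWebLogs",
        "type": "HDInsightHive",
        "inputs": [ { "name": "RawWebLogs" } ],
        "outputs": [ { "name": "CleanedWebLogs" } ],
        "typeProperties": { "scriptPath": "scripts/cleanweblogs.hql" },
        "scheduler": { "frequency": "Day", "interval": 1 }
      }
    ],
    "start": "2015-01-01T00:00:00Z",
    "end": "2015-01-31T00:00:00Z"
  }
}
```

The key idea is that an activity names its input and output datasets and a schedule; once the definition is deployed, the service runs it on that cadence and surfaces each run in the monitoring dashboard.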
Within the Azure Preview Portal, you get a visual layout of all of your pipelines and data inputs and outputs. You can see all the relationships and dependencies of your data pipelines across all of your sources so you always know where data is coming from and where it is going. You get a historical accounting of job execution, data production status, and system health in a single monitoring dashboard.
Data Factory gives customers a central place to manage their processing of web log analytics, clickstream analysis, social sentiment, sensor data analysis, geo-location analysis, and more.
Microsoft views Data Factory as a key tool for customers who are looking to have a hybrid story with SQL Server or who currently use Azure HDInsight, Azure SQL Database, Azure Blobs, and Power BI for Office 365.
In short, developers can use Data Factory to transform semi-structured, unstructured, and structured data from on-premises and cloud sources into trusted information.