Analytics Platform System (APS) AU7 released

The Analytics Platform System (APS), which is a renaming of the Parallel Data Warehouse (PDW), has just released an appliance update (AU7), which is similar to a service pack except that it also includes many new features.

Below is what is new in this release:

Customers will get significantly improved query performance and enhanced security features with this release.  APS AU7 builds on the appliance update 6 (APS 2016) release as a foundation; upgrading to appliance update 6 is a prerequisite for upgrading to appliance update 7.

Faster performance

APS AU7 now provides the ability to automatically create statistics and to update existing outdated statistics for improved query optimization.  APS AU7 also adds support for setting multiple variables from a single SELECT statement, reducing the number of redundant round trips to the server and improving overall query and ETL performance.  Other T-SQL features include the HASH and ORDER GROUP query hints, which provide more control over query execution plans.

Better security

APS AU7 also includes the latest firmware and drivers from our hardware partners, along with the hardware and software patches that address the Spectre/Meltdown vulnerabilities.

Management enhancements

Customers already on APS 2016 will experience an enhanced upgrade process to APS AU7, allowing a shorter maintenance window and the ability to uninstall and roll back to the previous version.  AU7 also introduces a Feature Switch section in Configuration Manager, giving customers the ability to customize the behavior of new features.

More info:

Microsoft releases the latest update of Analytics Platform System

Posted in PDW/APS, SQLServerPedia Syndication | Leave a comment

Understanding Cosmos DB

Cosmos DB is an awesome product that is mainly used for large-scale OLTP solutions.  Any web, mobile, gaming, or IoT application that needs to handle massive amounts of data, reads, and writes at a globally distributed scale with near-real-time response times for a variety of data is a great use case (it can be scaled out to support many millions of transactions per second).  Because it fits in the NoSQL category and is a scale-out solution, it can be difficult to wrap your head around how it works if you come from the relational world (i.e. SQL Server).  So this blog will be about the differences in how Cosmos DB works.

First, a quick comparison of terminology to help you understand the difference:

RDBMS                      | Cosmos DB (Document Model) | Cosmos DB (Graph Model)
Database                   | Database                   | Database
Table, view                | Collection                 | Graph
Row                        | Document (JSON)            | Vertex
Column                     | Property                   | Property
Foreign key                | Reference                  | Edge
Join                       | Embedded document          | .out()
Partition key/sharding key | Partition key              | Partition key
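To make the "Join → Embedded document" row concrete, here is a small sketch (the field names are made up for illustration) of how a relational join becomes an embedded document in the document model:

```python
# Relational shape: two rows related by a foreign key; reading an
# order's customer name requires a join across tables.
order_row = {"order_id": 1, "customer_id": 42}
customer_row = {"customer_id": 42, "name": "Contoso"}

# Document shape: the related data is embedded in one JSON document,
# so reading the order needs no join at all.
order_doc = {
    "id": "1",
    "customer": {"id": "42", "name": "Contoso"},  # embedded, not referenced
    "items": [{"sku": "widget", "qty": 3}],
}

print(order_doc["customer"]["name"])  # Contoso
```

The trade-off is classic denormalization: reads are cheap and self-contained, but a change to the customer's name must be propagated to every document that embeds it.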

From Welcome to Azure Cosmos DB and other documentation, here are some key points to understand:

  • You can distribute your data to any number of Azure regions with the click of a button.  This enables you to put your data where your users are, ensuring the lowest possible latency for your customers
  • When a new region gets added, it is available for operations within 30 minutes anywhere in the world (assuming your data is 100 TBs or less).
  • To control the exact sequence of regional failovers in case of an outage, Azure Cosmos DB enables you to associate a priority with each region associated with the database account
  • Azure Cosmos DB enables you to configure each region (associated with the database) as a "read", "write", or "read/write" region.
  • For Cosmos DB to offer strong consistency in a globally distributed setup, it needs to synchronously replicate the writes or to synchronously perform cross-region reads.  The speed of light and wide area network reliability dictate that strong consistency will result in higher latencies and reduced availability of database operations.  Hence, in order to offer guaranteed low latencies at the 99th percentile and 99.99% availability for all single-region accounts and all multi-region accounts with relaxed consistency, and 99.999% availability on all multi-region database accounts, it must employ asynchronous replication.  This in turn requires that it also offer well-defined, relaxed consistency models – weaker than strong (to offer low latency and availability guarantees) and ideally stronger than "eventual" consistency (with an intuitive programming model)
  • Using Azure Cosmos DB’s multi-homing APIs, an app always knows where the nearest region is and sends requests to the nearest data center.  All of this is possible with no config changes.  You set your write-region and as many read-regions as you want, and the rest is handled for you
  • As you add and remove regions to your Azure Cosmos DB database, your application does not need to be redeployed and continues to be highly available thanks to the multi-homing API capability
  • It supports multiple data models, including but not limited to document, graph, key-value, table, and column-family data models
  • APIs for the following data models are supported, with SDKs available in multiple languages: SQL API, MongoDB API, Cassandra API, Gremlin API, Table API
  • 99.99% availability SLA for all single-region database accounts, and 99.999% read availability on all multi-region database accounts.  Deploy to any number of Azure regions for higher availability and better performance
  • For a typical 1KB item, Cosmos DB guarantees end-to-end latency of reads under 10 ms and indexed writes under 15 ms at the 99th percentile within the same Azure region.  The median latencies are significantly lower (under 5 ms).  So you will want to deploy your app and your database to multiple regions to give users all over the world the same low latency.  If you have an app in one region but the Cosmos DB database in another, then you will have additional latency between the regions (see Azure Latency Test to determine what that latency would be, or view existing latency in the Azure Portal by choosing Azure Cosmos DB, then your database, then Metrics -> Consistency -> SLA -> Replication latency)
  • Developers reserve throughput of the service according to the application’s varying load.  Behind the scenes, Cosmos DB will scale up resources (memory, processor, partitions, replicas, etc.) to achieve that requested throughput while maintaining the 99th percentile of latency for reads to under 10 ms and for writes to under 15 ms. Throughput is specified in request units (RUs) per second.  The number of RUs consumed for a particular operation varies based upon a number of factors, but the fetching of a single 1KB document by id spends roughly 1 RU.  Delete, update, and insert operations consume roughly 5 RUs assuming 1 KB documents.  Big queries and stored procedure executions can consume 100s or 1000s of RUs based upon the complexity of the operations needed.  For each collection (bucket of documents), you specify the RUs
  • Throughput directly affects how much the user is charged but can be tuned up dynamically to handle peak load and down to save costs when more lightly loaded by using the Azure Portal, one of the supported SDKs, or the REST API
  • Request Units (RUs) are used to guarantee throughput in Cosmos DB.  You pay for what you reserve, not what you use.  RUs are provisioned per region and can vary by region as a result, but they are not shared between regions.  This requires you to understand usage patterns in each region where you have a replica
  • For applications that exceed the provisioned request unit rate for a container, requests to that collection are throttled until the rate drops below the reserved level.  When a throttle occurs, the server preemptively ends the request with RequestRateTooLargeException (HTTP status code 429) and returns the x-ms-retry-after-ms header indicating the amount of time, in milliseconds, that the user must wait before retrying the request.  So you will get 10ms reads as long as requests stay under the provisioned RUs
  • Cosmos DB provides five consistency levels: strong, bounded-staleness, session, consistent prefix, and eventual.  The further to the left in this list, the greater the consistency but the higher the RU cost, which essentially lowers available throughput for the same RU setting.  Session-level consistency is the default.  Even when set to a lower consistency level, any arbitrary set of operations can be executed in an ACID-compliant transaction by performing those operations from within a stored procedure.  You can also change the consistency level for each request using the x-ms-consistency-level request header or the equivalent option in your SDK
  • Azure Cosmos DB accounts that are configured to use strong consistency cannot associate more than one Azure region with their Azure Cosmos DB account
  • There is no support for GROUP BY or other aggregation functionality found in relational database systems (a workaround is to use the Spark to Cosmos DB connector)
  • No database schema/index management – it automatically indexes all the data it ingests without requiring any schema or indexes and serves blazing fast queries.  By default, every field in each document is automatically indexed generally providing good performance without tuning to specific query patterns.  These defaults can be modified by setting an indexing policy which can vary per field.
  • Industry-leading, financially backed, comprehensive service level agreements (SLAs) for availability, latency, throughput, and consistency for your mission-critical data
  • There is a local emulator running under MS Windows for developer desktop use (was added in the fall of 2016)
  • Capacity options for a collection: Fixed (max of 10GB and 400 – 10,000 RU/s), Unlimited (1,000 – 100,000 RU/s).  You can contact support if you need more than 100,000 RU/s.  There is no limit to the total amount of data a container can store in Azure Cosmos DB or to the throughput it can be provisioned with
  • Costs: SSD Storage (per GB): $0.25 GB/month; Reserved RUs/second (per 100 RUs, 400 RUs minimum): $0.008/hour (for all regions except Japan and Brazil which are more)
  • Global distribution (also known as global replication/geo-redundancy/geo-replication) is for delivering low-latency access to data to end users no matter where they are located around the globe and for adding regional resiliency for business continuity and disaster recovery (BCDR).  When you choose to make containers span across geographic regions, you are billed for the throughput and storage for each container in every region and the data transfer between regions
  • Cosmos DB implements optimistic concurrency, so there are no locks or blocking; instead, if two transactions collide on the same data, one of them fails and is asked to retry
  • Because there is currently no concept of a constraint, foreign-key or otherwise, any inter-document relationships that you have in documents are effectively “weak links” and will not be verified by the database itself.  If you want to ensure that the data a document is referring to actually exists, then you need to do this in your application, or through the use of server-side triggers or stored procedures on Azure Cosmos DB.
  • You can set up a policy to geo-fence a database to specific regions.  This geo-fencing capability is especially useful when dealing with data sovereignty compliance that requires data to never leave a specific geographical boundary
  • Backups are taken every four hours and two are kept at all times.  Also, in the event of database deletion, the backups will be kept for thirty days before being discarded.  With these rules in place, the client knows that in the event of some unintended data modification, they have an eight-hour window to get support involved and start the restore process
  • Cosmos DB is an Azure data storage solution which means that the data at rest is encrypted by default and data is encrypted in transit.  If you need Role-Based Access Control (RBAC), Azure Active Directory (AAD) is supported in Cosmos DB
  • Within Cosmos DB, partitions are used to distribute your data for optimal read and write operations.  It is recommended to create a granular key with highly distinct values.  The partitions are managed for you.  Cosmos DB will split or merge partitions to keep the data properly distributed.  Keep in mind your key needs to support distributed writes and distributed reads
  • Until recently, writes could only be made to one region, but multi-region writes are now in private preview; see Multi-master at global scale with Azure Cosmos DB.  With Azure Cosmos DB multi-master support, you can perform writes on containers of data (for example, collections, graphs, tables) distributed anywhere in the world.  You can update data in any region that is associated with your database account.  These data updates can propagate asynchronously.  In addition to providing fast access and write latency to your data, multi-master also provides a practical solution for failover and load-balancing issues.
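To make the RU and pricing bullets above concrete, here is a back-of-the-envelope sizing and cost sketch using the rough figures quoted (about 1 RU per 1 KB point read, about 5 RUs per 1 KB write, $0.25/GB-month storage, and $0.008 per hour per 100 reserved RU/s).  Real RU charges vary with document size, indexing, and query complexity, and prices vary by region, so treat this as an estimate only:

```python
HOURS_PER_MONTH = 730  # Azure's billing convention for a month

def estimate_rus(reads_per_sec, writes_per_sec, headroom=1.2):
    """RU/s to reserve for ~1 KB documents, with a margin for spikes."""
    return int((reads_per_sec * 1 + writes_per_sec * 5) * headroom)

def monthly_cost(storage_gb, reserved_rus):
    """Single-region monthly cost at the prices quoted above."""
    storage = storage_gb * 0.25
    throughput = (reserved_rus / 100) * 0.008 * HOURS_PER_MONTH
    return round(storage + throughput, 2)

rus = estimate_rus(500, 100)        # 500 reads/s + 100 writes/s -> 1200 RU/s
print(rus, monthly_cost(100, rus))  # cost for 100 GB at that reservation
```

Remember that for a globally distributed database these numbers multiply by the number of regions, since RUs are provisioned (and billed) per region.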

Azure Cosmos DB allows you to scale throughput (as well as storage) elastically across any number of regions, depending on your needs or demand.

Azure Cosmos DB distributed and partitioned collections

The above picture shows a single Azure Cosmos DB container horizontally partitioned (across three resource partitions within a region) and then globally distributed across three Azure regions.

An Azure Cosmos DB container gets distributed in two dimensions (i) within a region and (ii) across regions. Here’s how (see Partition and scale in Azure Cosmos DB for more info):

  • Local distribution: Within a single region, an Azure Cosmos DB container is horizontally scaled out in terms of resource partitions.  Each resource partition manages a set of keys and is strongly consistent and highly available, being physically represented by four replicas (called a replica set) with state machine replication among those replicas.  Azure Cosmos DB is a fully resource-governed system, where a resource partition is responsible for delivering its share of throughput for the budget of system resources allocated to it.  The scaling of an Azure Cosmos DB container is transparent to the users: Azure Cosmos DB manages the resource partitions and splits and merges them as needed as storage and throughput requirements change
  • Global distribution: If it is a multi-region database, each of the resource partitions is then distributed across those regions.  Resource partitions owning the same set of keys across various regions form a partition set (see preceding figure).  Resource partitions within a partition set are coordinated using state machine replication across the multiple regions associated with the database.  Depending on the consistency level configured, the resource partitions within a partition set are configured dynamically using different topologies (for example, star, daisy-chain, tree, etc.)
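As a toy illustration of the local-distribution point above (and the earlier advice to pick a partition key with highly distinct values): documents are hash-partitioned by key, so a key with few distinct values can only ever land on a few physical partitions, creating hot spots.  The hash function below is illustrative, not the one Cosmos DB actually uses:

```python
import hashlib

def partition_for(key: str, partitions: int = 10) -> int:
    """Map a partition-key value to a physical partition (illustrative)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions

# A low-cardinality key ("country" with 3 values) can never use more
# than 3 of the 10 partitions, so reads and writes concentrate there.
low = {partition_for(c) for c in ["us", "uk", "de"]}

# A high-cardinality key (a per-user id) spreads load across all of them.
high = {partition_for(f"user-{i}") for i in range(3000)}

print(len(low), len(high))
```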

The following links can help with understanding the core concepts better: Request units in Azure Cosmos DB, Performance tips for Azure Cosmos DB and .NET, Tuning query performance with Azure Cosmos DB, Partitioning in Azure Cosmos DB using the SQL API, and Leverage Azure CosmosDB metrics to find issues.
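The throttling contract described in the bullets above (HTTP 429 plus the x-ms-retry-after-ms header) can be sketched as a simple retry loop.  Here `send_request` is a stand-in for whatever HTTP client or REST call you use; note that the official SDKs already perform this retry for you, so this mainly matters when calling the REST API directly:

```python
import time

def send_with_retry(send_request, max_retries=5):
    """Retry a Cosmos DB request while the server throttles it (HTTP 429).

    `send_request` is any callable returning an object with
    `status_code` and `headers` attributes (a stand-in for a real client).
    """
    for _ in range(max_retries + 1):
        response = send_request()
        if response.status_code != 429:
            return response
        # The server tells us how long to back off, in milliseconds.
        wait_ms = int(response.headers.get("x-ms-retry-after-ms", 1000))
        time.sleep(wait_ms / 1000.0)
    raise RuntimeError("request still throttled after %d retries" % max_retries)
```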

You can Try Azure Cosmos DB for Free without an Azure subscription, free of charge and commitments.  For a good training course on Cosmos DB check out Developing Planet-Scale Applications in Azure Cosmos DB.

More info:

Relational databases vs Non-relational databases

A technical overview of Azure Cosmos DB

Posted in Azure Cosmos DB, SQLServerPedia Syndication | 1 Comment

Getting value out of data quickly

There are times when you need to create a "quick and dirty" solution to build a report.  This blog will show you one way of using a few Azure products to accomplish that.  This should not be viewed as a replacement for a data warehouse but rather as a way to quickly show a customer how to get value out of their data, or for when you need a one-time report, or when you just want to see if certain data would be useful to move into your data warehouse.

Let’s look at a high-level architecture for building a report quickly using NCR data (restaurant data):

This solution has the restaurant data that is in on-prem SQL Server replicated to Azure SQL Database using transactional replication.  Azure Data Factory is then used to copy the point-of-sale transaction logs in Azure SQL Database into Azure Data Lake Store.  Then Azure Data Lake Analytics with U-SQL is used to transform/clean the data and store it back into Azure Data Lake Store.  That data is then used in Power BI to create the reports and dashboards (business users can build the models in Power BI, and the data can be refreshed multiple times during the day via the new incremental refresh).  This is all done with Platform-as-a-Service products, so there is nothing to set up and install and no VMs – just quickly and easily doing all the work via the Azure portal.

This solution is inexpensive since there is no need for the more expensive services like Azure SQL Data Warehouse or Azure Analysis Services, and Azure Data Lake Analytics is a job service that you only pay for when a query runs (where you specify the analytics units to use).

Some things to keep in mind with a solution like this:

  • Power BI has been called “reporting crack” because once a business user is exposed to it they want more.  And this solution gives them their first taste
  • This solution should have a very limited scope – it’s more like a proof-of-concept and should be a short-term solution
  • It takes the approach of ELT instead of ETL in that data is loaded into Azure Data Lake Store and then converted using the power of Azure Data Lake Analytics instead of it being transformed during the move from the source system to the data lake like you usually do when using SSIS
  • This limits the data model building to one person using it for themselves or a department, versus having multiple people build models for an enterprise solution using Azure Analysis Services
  • This results in quick value but sacrifices an enterprise solution that includes performance, data governance, data history, referential integrity, security, and master data management.  Also, you will not be able to use tools that need to work against a relational format
  • This solution will normally require a power user to develop reports since it's working against a data lake instead of an easier-to-use relational model or a tabular model

An even better way to get value out of data quickly is with another product that is in preview called Common Data Service for Analytics.  More on this in my next blog.

Posted in SQLServerPedia Syndication | 2 Comments

Microsoft Build event announcements

Another Microsoft event and another bunch of exciting announcements.  At the Microsoft Build event this week, the major announcements in the data platform space were:

Multi-master at global scale with Azure Cosmos DB.  Perform writes on containers of data (for example, collections, graphs, tables) distributed anywhere in the world. You can update data in any region that is associated with your database account. These data updates can propagate asynchronously. In addition to providing fast access and write latency to your data, multi-master also provides a practical solution for failover and load-balancing issues.  More info

Azure Cosmos DB Provision throughput at the database level in preview.  Azure Cosmos DB customers with multiple collections can now provision throughput at a database level and share throughput across the database, making large collection databases cheaper to start and operate.  More info

Virtual network service endpoint for Azure Cosmos DB.  Generally available today, virtual network service endpoint (VNET) helps to ensure access to Azure Cosmos DB from the preferred virtual network subnet.  The feature will remove the manual change of IP and provide an easier way to manage access to Azure Cosmos DB endpoint.  More info

Azure Cognitive Search now in preview.  Cognitive Search, a new preview feature in the existing Azure Search service, includes an enrichment pipeline allowing customers to find rich structured information from documents.  That information can then become part of the Azure Search index.  Cognitive Search also integrates with Natural Language Processing capabilities and includes built-in enrichers called cognitive skills.  Built-in skills help to perform a variety of enrichment tasks, such as the extraction of entities from text or image analysis and OCR capabilities.  Cognitive Search is also extensible and can connect to your own custom-built skills.  More info

Azure SQL Database and Data Warehouse TDE with customer managed keys.  Now generally available, Azure SQL Database and Data Warehouse Transparent Data Encryption (TDE) offers Bring Your Own Key (BYOK) support with Azure Key Vault integration.  Azure Key Vault provides highly available and scalable secure storage for RSA cryptographic keys backed by FIPS 140-2 Level 2 validated Hardware Security Modules (HSMs).  Key Vault streamlines the key management process and enables customers to maintain full control of encryption keys and allows them to manage and audit key access.  This is one of the most frequently requested features by enterprise customers looking to protect sensitive data and meet regulatory or security compliance obligations.  More info

Azure Database Migration Service is now generally available.  This is a service that was designed to be a seamless, end-to-end solution for moving on-premises SQL Server, Oracle, and other relational databases to the cloud. The service will support migrations of homogeneous/heterogeneous source-target pairs, and the guided migration process will be easy to understand and implement.  More info

Four new features are now available in Azure Stream Analytics.  In public preview: Session window.  In private preview: C# custom code support for Stream Analytics jobs on IoT Edge, Blob output partitioning by custom attribute, and updated built-in ML models for Anomaly Detection.  More info

Posted in SQLServerPedia Syndication | 1 Comment

Azure SQL Data Warehouse Gen2 announced

On Monday, Microsoft announced the general availability of the Compute Optimized Gen2 tier of Azure SQL Data Warehouse.  With this performance-optimized tier, Microsoft is dramatically accelerating query performance and concurrency.

The changes in Azure SQL DW Compute Optimized Gen2 tier are:

  • 5x query performance via an adaptive caching technology, which takes a blended approach of using remote storage in combination with a fast SSD cache layer (using NVMe drives) that places data next to compute based on user access patterns and frequency
  • Significant improvement in serving concurrent queries (32 to 128 queries/cluster)
  • Removes limits on columnar data volume, enabling unlimited columnar data storage
  • 5 times higher computing power compared to the current generation by leveraging the latest hardware innovations that Azure offers via additional Service Level Objectives (DW7500c, DW10000c, DW15000c and DW30000c)
  • Added Transparent Data Encryption with customer-managed keys

The Azure SQL DW Compute Optimized Gen2 tier will roll out to 20 regions initially (you can find the full list of regions available), with subsequent rollouts to all other Azure regions.  If you have a Gen1 data warehouse, take advantage of the latest generation of the service by upgrading.  If you are getting started, try the Azure SQL DW Compute Optimized Gen2 tier today.

More info:

Turbocharge cloud analytics with Azure SQL Data Warehouse

Blazing fast data warehousing with Azure SQL Data Warehouse

Microsoft Mechanics video

Posted in Azure SQL DW, SQLServerPedia Syndication | 1 Comment

Podcast: Big Data Solutions in the Cloud

In this podcast I talk with Carlos Chacon of SQL Data Partners on big data solutions in the cloud.  Here is the description of the chat:

Big Data.  Do you have big data?  What does that even mean?  In this episode I explore some of the concepts of how organizations can manage their data and what questions you might need to ask before you implement the latest and greatest tool.  I am joined by James Serra, Microsoft Cloud Architect, to get his thoughts on implementing cloud solutions, where they can contribute, and why you might not be able to go all cloud.  I am interested to see if more traditional DBAs move toward architecture roles and help their organizations manage the various types of data.  What types of issues are giving you troubles as you adopt a more diverse data ecosystem?

I hope you give it a listen!

Posted in Podcast, SQLServerPedia Syndication | Comments Off on Podcast: Big Data Solutions in the Cloud

Cost savings of the cloud

I often hear people say moving to the cloud does not save money, but frequently they don't take into account the savings from indirect costs that are hard to measure (or the benefits you get that are simply not cost-related).  For example, the cloud allows you to get started building a solution in a matter of minutes, while starting a solution on-prem can take weeks or even months.  How do you put a monetary figure on that?  Or on these other benefits that are difficult to put a dollar figure on:

  • Unlimited storage
  • Grow hardware as demand is needed (unlimited elastic scale) and even pause (and not pay anything)
  • Upgrade hardware instantly compared to weeks/months to upgrade on-prem
  • Enhanced availability and reliability (i.e. data in Azure automatically has three copies). What does each hour of downtime cost your business?
  • Benefit of having separation of compute and storage, so you don't need to upgrade one when you only need to upgrade the other
  • Pay for only what you need (reduce hardware as demand lessens)
  • Not having to guess how much hardware you need and getting too much or too little
  • Getting hardware solely based on the max peak
  • Ability to fail fast (cancel a project and not have hardware left over)
  • Really helpful for proof-of-concept (POC) or development projects with a known lifespan because you don’t have to re-purpose hardware afterwards
  • The value of being able to incorporate more data allowing more insights into your business
  • No commitment or long-term vendor lock-in
  • Benefit from changes in the technology impacting the latest storage solutions
  • More frequent updates to the OS, SQL Server, etc.
  • Automatic software updates
  • The cloud vendors have much higher security than anything on-prem.  You can imagine the loss of income if a vendor had a security breach, so the investment in keeping things secure is massive

As you can see, there is much more than just running numbers in an Excel spreadsheet to see how much money the cloud will save you.  But if you really needed that, Microsoft has a Total Cost of Ownership (TCO) Calculator that will estimate the cost savings you can realize by migrating your application workloads to Microsoft Azure.  You simply provide a brief description of your on-premises environment to get an instant report.

The benefits that are easier to put a dollar figure on:

  • No need for co-location space, which saves costs (space, power, networking, etc.)
  • No need to manage the hardware infrastructure, reducing staff
  • No up-front hardware costs or costs for hardware refresh cycles every 3-5 years
  • High availability and disaster recovery done for you
  • Automatic geography redundancy
  • Having built-in tools (i.e. monitoring) so you don’t need to purchase 3rd-party software

Also, there are some constraints of on-premise data that go away when moving to the cloud:

  • Scale constrained to on-premise procurement
  • Up-front CapEx costs instead of yearly operating expense (OpEx)
  • A staff of employees or consultants administering and supporting the hardware and software in place
  • Expertise needed for tuning and deployment

I often tell clients that if you have your own on-premise data center, you are in the air conditioning business.  Wouldn’t you rather focus all your efforts on analyzing data?  You could also try to “save money” by doing your own accounting, but wouldn’t it make more sense to off-load that to an accounting company?  Why not also off-load the  costly, up-front investment of hardware, software, and other infrastructure, and the costs of maintaining, updating, and securing an on-premises system?

And when dealing with my favorite topic, data warehousing, a conventional on-premise data warehouse can cost millions of dollars in the following: licensing fees, hardware, and services; the time and expertise required to set up, manage, deploy, and tune the warehouse; and the costs to secure and back up the data.  All items that a cloud solution eliminates or greatly minimizes.

When estimating hardware costs for a data warehouse, consider the costs of servers, additional storage devices, firewalls, networking switches, data center space to house the hardware, a high-speed network (with redundancy) to access the data, and the power and redundant power supplies needed to keep the system up and running.  If your warehouse is mission critical then you need to also add the costs to configure a disaster recovery site, effectively doubling the cost.

When estimating software costs for a data warehouse, note that organizations frequently pay hundreds of thousands of dollars in software licensing fees for data warehouse software and add-on packages.  Also, giving additional end users access to the data warehouse, such as customers and suppliers, can significantly increase those costs.  Finally, add the ongoing cost of annual support contracts, which often comprise 20 percent of the original license cost.
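A quick sketch of how that 20 percent support figure compounds over a typical ownership period (the license fee here is hypothetical):

```python
def total_license_cost(license_fee, years, support_rate=0.20):
    """License fee plus annual support at ~20% of the original license."""
    return license_fee * (1 + support_rate * years)

# A hypothetical $500,000 license held for 5 years doubles in total cost:
print(total_license_cost(500_000, 5))  # 1000000.0
```

In other words, after about five years the support contracts alone have matched the original license fee, before counting hardware, staffing, or upgrades.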

Also note that an on-premises data warehouse needs specialized IT personnel to deploy and maintain the system.  This creates a potential bottleneck when issues arise and keeps responsibility for the system with the customer, not the vendor.

I’ll point out my two key favorite advantages of having a data warehousing solution in the cloud:

  • The complexities and cost of capacity planning and administration, such as sizing, balancing, and tuning the system, are handled for you, automated, and covered by the cost of your subscription
  • Being able to dynamically provision storage and compute resources on the fly to meet the demands of your changing workloads in peak and steady usage periods.  Capacity is whatever you need whenever you need it

Hopefully this blog post points out that while there can be considerable cost savings in moving to the cloud, there are so many other benefits that cost should not be the only reason to move.

More info:

How To Measure the ROI of Moving To the Cloud

Cloud migration – where are the savings?

Comparing cloud vs on-premise? Six hidden costs people always forget about

The high cost and risk of On-Premise vs. Cloud

TCO Analysis Demonstrates How Moving To The Cloud Can Save Your Company Money

5 Common Assumptions Comparing Cloud To On-Premises

5 Financial Benefits of Moving to the Cloud

IT Execs Say Cost Savings Make Cloud-Based Analytics ‘Inevitable’

Posted in Azure, SQLServerPedia Syndication | 3 Comments

Podcast: Myths of Modern Data Management

As part of the Secrets of Data Analytics Leaders by the Eckerson Group, I did a 30-minute podcast with Wayne Eckerson where I discussed myths of modern data management.  Some of the myths discussed include ‘all you need is a data lake’, ‘the data warehouse is dead’, ‘we don’t need OLAP cubes anymore’, ‘cloud is too expensive and latency is too slow’, ‘you should always use a NoSQL product over a RDBMS.’  I hope you check it out!

Posted in Data Lake, Data warehouse, Podcast, SQLServerPedia Syndication | 1 Comment

Webinar: Is the traditional data warehouse dead?

As a follow-up to my blog Is the traditional data warehouse dead?, I did a webinar on that very topic for the Agile Big Data Processing Summit.  The deck is here and the webinar is below:


Posted in Data warehouse, Podcast, Presentation, SQLServerPedia Syndication | 1 Comment

Is the traditional data warehouse dead? webinar

As a follow-up to my blog Is the traditional data warehouse dead?, I will be doing a webinar on that very topic tomorrow (March 27th) at 11am EST for the Agile Big Data Processing Summit that I hope you can join.  Details can be found here.  The abstract is:

Is the traditional data warehouse dead?

With new technologies such as Hive LLAP or Spark SQL, do you still need a data warehouse or can you just put everything in a data lake and report off of that? No! In the presentation, James will discuss why you still need a relational data warehouse and how to use a data lake and an RDBMS data warehouse to get the best of both worlds.

James will go into detail on the characteristics of a data lake and its benefits and why you still need data governance tasks in a data lake. He’ll also discuss using Hadoop as the data lake, data virtualization, and the need for OLAP in a big data solution, and he will put it all together by showing common big data architectures.

Posted in Data warehouse, Podcast, Presentation | 1 Comment