Presentation slides for Houston SQLSaturday

Thanks to everyone who attended my two sessions at SQLSaturday in Houston (more info).  Here are the abstracts and PowerPoint links:

Best Practices to Deliver BI Solutions

If your company is planning to build a data warehouse or BI solution, you need to be aware that BI projects have high failure rates. Gartner says between 70% to 80% of corporate business intelligence projects fail. And with “big data” adding more complexity you can expect even more failures. However, the major causes of these failures are well known and can be avoided by implementing a set of best practices.

I have worked on dozens of end-to-end BI projects and have seen my share of successes and failures. I will talk about the reasons BI projects fail and share best practices and lessons learned so your BI project will fall into the “successful” category.

Here is the PowerPoint presentation: Best Practices to Deliver BI Solutions

Enhancing your career: Building your personal brand

In three years I went from a complete unknown to a popular blogger, speaker at PASS Summit, and a SQL Server MVP.  Along the way I saw my yearly income triple.  Is it because I know some secret?  Is it because I am a genius?  No!  It is just about laying out your career path, setting goals, and doing the work.  It’s about building your personal brand and stepping out of your comfort zone.  It’s about overcoming your fear of taking risks.  If you can do those things, you will be rewarded.  I will discuss how you too can go from unknown to well-known.  I will talk about building your community presence by blogging, presenting, writing articles and books, twitter, LinkedIn, certifications, interviewing, networking, and consulting and contracting.  Your first step to enhancing your career will be to attend this session!

Here is the PowerPoint presentation: Enhancing your career: Building your personal brand

Posted in Business Intelligence, Career, Presentation, Session, SQLServerPedia Syndication | 1 Comment

Presentation Slides for Building an Effective Data Warehouse Architecture

Thanks to everyone who attended my session “Building an Effective Data Warehouse Architecture” at the PASS Business Analytics conference.  The abstract is below.  I hope you enjoyed it!

Here is the PowerPoint presentation: Building an Effective Data Warehouse Architecture

Building an Effective Data Warehouse Architecture

You’re a DBA and your boss asks you to determine if a data warehouse would help the company.  So many questions pop into your head: Why use a data warehouse?  What is the best methodology to use when creating a data warehouse?  Should I use a normalized or dimensional approach?  What is the difference between the Kimball and Inmon methodologies?  Does the new Tabular model in SQL Server 2012 change things?  What is the difference between a data warehouse and a data mart?  Is there any hardware I can purchase that is optimized for a data warehouse?  What if I have a ton of data? During this session James will help you to answer these questions so your response to your boss will provoke amazement and lead to a big raise.  Or at least help to lead you down the correct path!

Posted in Data warehouse, Presentation, Session, SQLServerPedia Syndication | Leave a comment

SQLSaturday in Houston

This upcoming Saturday I will be presenting two sessions at SQLSaturday in Houston (more info).  I hope to see you there.  Here are the abstracts:

Best Practices to Deliver BI Solutions

If your company is planning to build a data warehouse or BI solution, you need to be aware that BI projects have high failure rates. Gartner says between 70% to 80% of corporate business intelligence projects fail. And with “big data” adding more complexity you can expect even more failures. However, the major causes of these failures are well known and can be avoided by implementing a set of best practices.

I have worked on dozens of end-to-end BI projects and have seen my share of successes and failures. I will talk about the reasons BI projects fail and share best practices and lessons learned so your BI project will fall into the “successful” category.

Enhancing your career: Building your personal brand

In three years I went from a complete unknown to a popular blogger, speaker at PASS Summit, and a SQL Server MVP.  Along the way I saw my yearly income triple.  Is it because I know some secret?  Is it because I am a genius?  No!  It is just about laying out your career path, setting goals, and doing the work.  It’s about building your personal brand and stepping out of your comfort zone.  It’s about overcoming your fear of taking risks.  If you can do those things, you will be rewarded.  I will discuss how you too can go from unknown to well-known.  I will talk about building your community presence by blogging, presenting, writing articles and books, twitter, LinkedIn, certifications, interviewing, networking, and consulting and contracting.  Your first step to enhancing your career will be to attend this session!

Posted in Presentation, Session, SQLServerPedia Syndication | Leave a comment

Hadoop and Data Warehouses

I see a lot of confusion when it comes to Hadoop and its role in a data warehouse solution.  Hadoop should not be a replacement for a data warehouse, but rather should augment/complement a data warehouse.  Hadoop and a data warehouse will often work together in a single information supply chain: Hadoop excels in handling raw, unstructured and complex data with vast programming flexibility; Data warehouses, on the other hand, manage structured data, integrating subject areas and providing interactive performance through BI tools.  As an example in manufacturing, let’s look at using Hadoop and a data warehouse on a Parallel Data Warehouse (PDW):

Untitled picture

There are three main use cases for Hadoop with a data warehouse, with the above picture an example of use case 3:

  1. Archiving data warehouse data to Hadoop (move)
    Hadoop as cold storage/Long Term Raw Data Archiving:
    – So don’t need to buy bigger PDW or SAN or tape
  2. Exporting relational data to Hadoop (copy)
    Hadoop as backup/DR, analysis, cloud use:
    – Export conformed dimensions to compare incoming raw data with what is already in PDW
    – Can use dimensions against older fact table
    – Sending validated relational data to Hadoop
    – Hadoop data to WASB and have that used by other tools/products (i.e. Cloud ML Studio)
    – Incremental Hadoop load / report
  3. Importing Hadoop data into data warehouse (copy)
    Hadoop as staging area:
    – Great for real-time data, social networks, sensor data, log data, automated data, RFID data (ambient data)
    – Where you can capture the data and only pass the relevant data to PDW
    – Can do processing of the data as it sits in Hadoop (clean it, aggregate it, transform it)
    – Some processing is better done on Hadoop instead of SSIS
    – Way to keep staging data
    – Long-term raw data archiving on cheap storage that is online all the time (instead of tape) – great if need to keep the data for legal reasons
    – Others can do analysis on it and later pull it into data warehouse if find something useful

Note there will still be a need for some Hadoop skills: Loading data into Hadoop (unless all done thru Polybase), maintenance, cleaning up files, managing data, etc.  But people who need access to data don’t need any Hadoop skills or special connections – they can use all skills they have today.

Here are some of the reasons why it is not a good idea to have only Hadoop as your data warehouse:

  • Hadoop is slow for reading queries.  HDP 2.0 today will not perform anywhere near PDW for interactive querying.  This is why PolyBase is so important, as it bridges the gap between the two technologies so customers can take advantage of both the unique features of Hadoop and realize the benefits of a EDW.  Truth be told users won’t want to wait 20+ seconds for a MapReduce job to start up to execute a Hive query
  • Hadoop is not relational, as all the data is in files in HDFS, so there always is a conversion process to convert the data to a relational format
  • Hadoop is not a database management system.  It does not have functionality such as update/delete of data, referential integrity, statistics, ACID compliance, data security, and the plethora of tools and facilities needed to govern corporate data assets
  • Restricted SQL support, such as certain aggregate functions missing
  • There is no metadata stored in HDFS, so another tool needs to be used to store that, adding complexity and slowing performance
  • Finding expertise in Hadoop is very difficult: The small number of people who understand Hadoop and all its various versions and products versus the large number of people who know SQL
  • Super complex, lot’s of integration with multiple technologies to make it work
  • Many tools/technologies/versions/vendors, no standards
  • Some reporting tools don’t work against Hadoop, as well as OLAP
  • The new Hadoop solutions (Tez, X, Spark, etc) are still figuring themselves out.  Customers should not take the risk of investing in one of these solutions (like MapReduce) that may be obsolete
  • It might not save you much in costs: you still have to purchase hardware, support, licenses, training, migration costs
  • If you need to combine relational data with Hadoop, you will need to move that relational data to Hadoop since there is no PolyBase-like technology

I also wanted to mention that “unstructured” data is a bit of a misnomer.  Just about all data has at least some structure.  Better to call it “semi-structured”.  I like to think of it as data in a text file is semi-structured until someone adds structure to it, by doing something like importing it into a SQL Server table.  Or think of structured data as relational and unstructured as non-relational.

More info:

Using Hadoop to Augment Data Warehouses with Big Data Capabilities

WEBINAR Replay: Hadoop and the Data Warehouse: When to Use Which

Hadoop and the Data Warehouse: When to Use Which

Video Big Data Use Case #5 – Data Warehouse Augmentation

How Hadoop works with a Data Warehouse

Design Tip #165 The Hyper-Granular Active Archive

No, Hadoop Isn’t Going To Replace Your Data Warehouse

Posted in Data warehouse, Hadoop, PDW/APS, SQLServerPedia Syndication | 2 Comments

Presentation Slides for Modern Data Warehousing

Thanks to everyone who attended my session “Modern Data Warehousing” at the PASS SQLSaturday Business Analytics edition in Dallas.  The abstract is below.  Great turnout for the last session of the day!

Here is the PowerPoint presentation: Modern Data Warehousing

Modern Data Warehousing

The traditional data warehouse has served us well for many years, but new trends are causing it to break in four different ways: data growth, fast query expectations from users, non-relational/unstructured data, and cloud-born data.  How can you prevent this from happening?  Enter the modern data warehouse, which is able to handle and excel with these new trends.  It handles all types of data (Hadoop), provides a way to easily interface with all these types of data (PolyBase), and can handle “big data” and provide fast queries.  Is there one appliance that can support this modern data warehouse?  Yes!  It is the Parallel Data Warehouse (PDW) from Microsoft, which is a Massively Parallel Processing (MPP) appliance that has been recently updated (v2 AU1).  In this session I will dig into the details of the modern data warehouse and PDW.  I will give an overview of the PDW hardware and software architecture, identify what makes PDW different, and demonstrate the increased performance.  In addition I will discuss how Hadoop, HDInsight, and PolyBase fit into this new modern data warehouse.

Posted in Data warehouse, Presentation, Session, SQLServerPedia Syndication | Leave a comment

Modern Data Warehousing Presentation

I will be presenting the session “Modern Data Warehousing” on Saturday at the PASS SQLSaturday Business Analytics edition in Dallas at 4:30pm CST (info).  The abstract for my session is below.  I hope you can make it!

Modern Data Warehousing

The traditional data warehouse has served us well for many years, but new trends are causing it to break in four different ways: data growth, fast query expectations from users, non-relational/unstructured data, and cloud-born data.  How can you prevent this from happening?  Enter the modern data warehouse, which is able to handle and excel with these new trends.  It handles all types of data (Hadoop), provides a way to easily interface with all these types of data (PolyBase), and can handle “big data” and provide fast queries.  Is there one appliance that can support this modern data warehouse?  Yes!  It is the Parallel Data Warehouse (PDW) from Microsoft, which is a Massively Parallel Processing (MPP) appliance that has been recently updated (v2 AU1).  In this session I will dig into the details of the modern data warehouse and PDW.  I will give an overview of the PDW hardware and software architecture, identify what makes PDW different, and demonstrate the increased performance.  In addition I will discuss how Hadoop, HDInsight, and PolyBase fit into this new modern data warehouse.

Posted in Data warehouse, Presentation, Session, SQLServerPedia Syndication | Leave a comment

What is the Microsoft Analytics Platform System (APS)?

Analytics Platform System (APS) is simply a renaming of the Parallel Data Warehouse (PDW).  It is not really a new product, but rather a name change due to a new feature in Appliance Update 1 (AU1) of PDW.  That new feature is the ability to have a HDInsight region (a Hadoop cluster) inside the appliance.

So APS combines SQL Server and Hadoop into a single offering that Microsoft is touting as providing “big data in a box.”

Think of APS as the “evolution” of Microsoft’s current SQL Server Parallel Data Warehouse product.  Using PolyBase, it now supports the ability to query data using SQL across the traditional data warehouse, plus data stored in a Hadoop region, whether in the appliance or a separate Hadoop Cluster.

More info:

The data platform for a new era

Posted in PDW/APS, SQLServerPedia Syndication | 1 Comment

Parallel Data Warehouse (PDW) AU1 released

Microsoft’s PDW has seen a big boost in visibility and sales over the past year, and part of the reason is due to frequent upgrades to the hardware and software.  About every six months there is an appliance update, which is sort of like a service pack, except that in addition to new features it usually also includes upgrades in hardware (since the PDW is a hardware and software appliance).  Just released is Appliance Update 1 (AU1) to version 2 of PDW.  Details below, including improvements with HDInsight, Hadoop, and Polybase,

First on-premises HDInsight region

  • Enables customers to load, query and analyze structured and unstructured data within a single appliance.
  • T-SQL compatibility over Hadoop via Polybase.
  • Windows failover clustering and full hardware redundancy.
  • Unified management and monitoring experience across HDInsight and PDW.

Polybase Enhancements

  • Better Polybase performance
    • Compute push-down enables 2x improvement.
    • Ability to create statistics over external tables via fullscan or sampling.
  • Polybase T-SQL semantics – Close alignment with T-SQL semantics for data types & conversions.
  • RCFile format support.
  • Compressed and uncompressed Hadoop data support.
  • Better manageability and user experience with new catalog views, DMVs, EXPLAIN, and SSDT support.

Hybrid Cloud Support

  • Import and export to Azure Blob storage – WASB/ASV.
  • Seamless query across WASB and HDInsight region.

PDW Performance Improvements

  • Parallel data load enables 2-8x better load performance.
  • Native Type Conversion for Loader improves load time of single load by 75%, with multiple concurrent loads achieving 2x improvements, validated up to 9.4TB/Hr.

Integrated Authentication (PDW)

  • Enables single sign-on for simplified user management and easier authentication.
  • Support for Kerberos and NTLM.
  • Windows authentication supported through SqlClient, ODBC, and OLE-DB.

Transparent Data Encryption (PDW):

  • Protects data at rest by encrypting data pages on disk, transactional logs, and database backups.

Add Capacity (PDW):

  • Enables customers to expand to any supported topology.
  • All operations except data redistribution are online operations, reducing downtime.

Add Region (HDInsight):

  • Enables customers to add HDI region to existing PDW appliance.

Better upgrade experience:

  • Upgrades can now be run remotely.
  • Reduces time to upgrade appliances by simplifying and minimizing manual steps.
Posted in PDW/APS, SQLServerPedia Syndication | 2 Comments

PASS Summit 2012 session videos online

All PASS Summit 2012 recordings are now available for free to all PASS members (201 of them!).  To view, log on to your myPASS account, and watch the sessions from myRecordings.  While there, check out mine: Building an Effective Data Warehouse Architecture

Posted in PASS, SQLServerPedia Syndication, Videos | Leave a comment

Book: Reporting with Microsoft SQL Server 2012

I am happy to say I have published my first book!  It is called Reporting with Microsoft SQL Server 2012 (order).  Much thanks goes to my co-author and friend Bill Anton.  Below is a brief overview or check out the listing on the Packt Publishing site for a sample chapter and the table of contents.

MyBook

Reporting with Microsoft SQL Server 2012 will cover all the features of SSRS and Power View and will provide a step-by-step lab activity to help you develop reports very quickly.

Starting with the difference between standard and self-service reporting, this book covers the main features and functionality offered in SQL Server Reporting Services 2012 including a breakdown of the report components, development experience, extensibility, and security. You will also learn to set up and use Power View within Excel and SharePoint and connect to a tabular model as well as a multidimensional model. The book provides real-life reporting scenarios that help to clarify when those scenarios are discussing standard reporting, in which case SSRS is the best choice, and when they are discussing self-service reporting, in which case Power View is the best choice.

This book will have you creating reports in SSRS and Power View in no time!

Posted in Power View/Project Crescent, SQLServerPedia Syndication, SSRS | Leave a comment