PolyBase explained

PolyBase is a new technology that integrates Microsoft’s MPP product, SQL Server Parallel Data Warehouse (PDW), with Hadoop.  It is designed to enable queries across relational data stored in PDW and non-relational data stored in the Hadoop Distributed File System (HDFS), bypassing Hadoop’s MapReduce distributed computing engine that is typically used to read data from HDFS.  You can create an external table in PDW that references Hadoop data (somewhat like a linked server) and then query it with SQL, in essence adding structure to unstructured data.  So you can: 1) retrieve data from HDFS with a PDW query, and even join that data to native PDW relational tables, so that Hadoop and PDW can be queried in tandem with result sets that integrate data from each source (seamlessly joining structured and semi-structured data); 2) import data from HDFS to PDW; and 3) export data from PDW to HDFS (for example, as a backup strategy).
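
To make the external table idea more concrete, here is a minimal sketch in Python.  It is only an analogy and does not use PolyBase or PDW: an in-memory SQLite database stands in for PDW, a pipe-delimited string stands in for a file in HDFS, and all table names, column names, and sample values are made up for illustration.  It shows a schema being applied to raw delimited text, the rows being placed in a temporary table, and that table being joined to a native relational table with ordinary SQL.

```python
import csv
import io
import sqlite3

# Hypothetical contents of a pipe-delimited file (a stand-in for a file in HDFS).
CLICKS_FILE = "1|/home|2014-01-05\n2|/products|2014-01-05\n1|/checkout|2014-01-06\n"


def load_external_rows(text, delimiter="|"):
    """Apply a schema (customer_id INT, url TEXT, event_date TEXT) to raw text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [(int(customer_id), url, day) for customer_id, url, day in reader]


def query_joined(clicks_text):
    """Join the delimited data to a relational table and count clicks per customer."""
    conn = sqlite3.connect(":memory:")  # stand-in for the relational store
    conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)", [(1, "Contoso"), (2, "Fabrikam")]
    )
    # The delimited data is held only in a temporary table for the query.
    conn.execute(
        "CREATE TEMP TABLE clicks_ext (customer_id INTEGER, url TEXT, event_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO clicks_ext VALUES (?, ?, ?)", load_external_rows(clicks_text)
    )
    rows = conn.execute(
        "SELECT c.name, COUNT(*) "
        "FROM customer c JOIN clicks_ext e ON e.customer_id = c.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(query_joined(CLICKS_FILE))
```

In PolyBase itself the external table definition and the data movement are handled by PDW, so none of this plumbing is written by hand.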

The biggest benefit with PolyBase is you don’t need to understand HDFS or MapReduce (typically written in Java) to access Hadoop, and there is no ETL needed.  And you can quickly and easily use a tool such as Power Pivot to connect to PDW and pull in data from PDW tables and external Hadoop tables.

Microsoft Technical Fellow David DeWitt is one of the principals behind PolyBase.  Some things to note:

  • When selecting data in Hadoop, the data is not stored in PDW – it uses a ShuffleMove/BroadcastMove/Round Robin operation to temporarily bring the data into temporary tables in PDW
  • PolyBase only works within PDW for now.  It might be added to SQL Server later, but there are no plans for that at present.  PolyBase relies on the Data Movement Service (DMS) in PDW, and DMS does not exist in SQL Server
  • It does not support DML operations
  • It may in the future be able to access other storage systems besides Hadoop
  • It only works for delimited text files
  • It requires the Java Runtime Environment (Oracle JRE)
  • It can connect to Hortonworks Data Platform (HDP) on Windows Server, HDP on Linux, and Cloudera (CDH) on Linux
  • Soon PDW will have the ability to add a Hadoop scale-unit (compute nodes and storage) right into the PDW rack

In a future version of PolyBase, the query optimizer will be able to make a cost-based decision, when referencing data in HDFS, on whether to transform the query into a MapReduce job to be performed on the Hadoop cluster or to process it using the SQL Server instances on the PDW.  Also, the optimizer will have the ability to move the workload of a query involving only PDW data to the Hadoop cluster.  This intelligence within the optimizer will allow it to split the workload between the two platforms and thus leverage the true capabilities of the Hadoop cluster.
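
As a rough illustration of what such a cost-based decision could look like, here is a small Python sketch.  It is not the PolyBase optimizer: the cost model, the field names, and every number are assumptions invented for this example.  It simply compares the estimated cost of importing all the HDFS rows and filtering them in PDW against the cost of running the filter as a MapReduce job and importing only the surviving rows.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Hypothetical inputs to the decision; none of these come from PolyBase."""
    hdfs_rows: int                 # rows the query would scan in HDFS
    selectivity: float             # fraction of rows that survive the filter (0..1)
    import_cost_per_row: float     # cost to move one row into PDW
    mapreduce_startup_cost: float  # fixed cost to launch a MapReduce job
    mapreduce_cost_per_row: float  # cost to filter one row on the Hadoop cluster


def choose_plan(e):
    """Return the cheaper of the two plans under the assumed cost model."""
    # Plan A: import every row, then filter inside PDW.
    import_cost = e.hdfs_rows * e.import_cost_per_row
    # Plan B: filter on Hadoop first, then import only the surviving rows.
    pushdown_cost = (
        e.mapreduce_startup_cost
        + e.hdfs_rows * e.mapreduce_cost_per_row
        + e.hdfs_rows * e.selectivity * e.import_cost_per_row
    )
    return "pushdown_to_hadoop" if pushdown_cost < import_cost else "process_in_pdw"


if __name__ == "__main__":
    small = Estimate(1_000, 0.5, 1.0, 5_000.0, 0.1)
    large = Estimate(1_000_000, 0.01, 1.0, 5_000.0, 0.1)
    print(choose_plan(small), choose_plan(large))
```

With these made-up numbers the small file is cheaper to import as-is, because the fixed job startup cost dominates, while the large, highly selective scan is cheaper to push down to Hadoop.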

So in summary, the main features of PolyBase are:

  • Simplicity: You can query data in Hadoop via regular SQL
  • Performance: Parallelized data reading and writing into Hadoop
  • Openness: Supports various Hadoop distributions
  • Integration: Works with Microsoft BI tools such as Power Pivot, Power View, SSRS, SSAS

More info:

Seamless insights on structured and unstructured data with SQL Server 2012 Parallel Data Warehouse

Polybase: Hadoop Integration in SQL Server PDW V2

Microsoft’s PolyBase mashes up SQL Server and Hadoop

Insight through Integration: SQL Server 2012 Parallel Data Warehouse

PASS talk: Polybase: What, Why, How

Posted in Hadoop, PDW, SQLServerPedia Syndication | 7 Comments

Why I just became a Microsoft Employee

I have been an independent consultant (IC) for quite a while now.  In an amazing number of coincidences, or just plain fate, in a matter of 17 days I went from hearing about a job opening at Microsoft to accepting their offer.  It all started when a friend of mine at Microsoft called and said “We have a job opening at Microsoft I think you would be good for, but it would require a move to NYC.  I know it’s unlikely but any chance you would consider?”.

Well, it turns out my wife and I have been thinking of moving back east for a while, as our youngest child will be graduating high school in a few months, leaving us free to move anywhere.  Since I was born in NY, have two sisters who live in NYC, have many relatives there, and had a desire for a place that had a true change of seasons, NY made a lot of sense.  Then add the fact that during my talks with Microsoft about the job, my son was accepted to SUNY New Paltz where he will play college soccer and study computer science (New Paltz is about an hour-and-a-half bus ride to Manhattan), which made it even more desirable for us to move to NY.

The job is for a PDW TSP for the North East region.  Microsoft has lots of TLAs (Three Letter Acronyms): PDW stands for Parallel Data Warehouse, and TSP stands for Technology Solution Professional or just Technology Specialist.  Basically, the job entails presenting, demoing, and educating companies about PDW and its benefits, and making sure it is a good fit for the client.  Further along will be architecting, designing and modeling, and doing POCs (proofs of concept), which will involve working with a PDW Center of Excellence (CoE) Architect.  I will work closely with a Solution Sales Professional (SSP), also called a Solution Specialist, who finds opportunities with customers.  A TSP is about 75% technical and 25% sales.  Although it’s for the North East region, most of my time will be spent in NYC, with a few trips outside of NY to places like Boston.

Once I heard about the job I started writing a list of the pros and cons of taking the job:

Pros:

  1. Work for Microsoft.  I have wanted to work for Microsoft since I was 17 years old and right out of high school.  Almost 30 years later it finally happened :-)
  2. Work in NYC.  NYC is a great city and I have visited many times.  I have always wanted to work there and my office will be in a great location: at the Microsoft Technology Center (MTC) at 6th Avenue and 52nd Street, right next to Radio City Music Hall
  3. Work on PDW.  I worked on PDW 1.0, and have been anxious to work on version 2.0
  4. Work for a company with career paths.  With most companies I had worked for, I had no opportunity for advancement unless I wanted a total management role.  There were usually no lateral moves either.  With Microsoft, there are endless opportunities that will allow me to stay technical if I wish
  5. Work with smart people.  There are lots of really sharp people working for Microsoft that I can learn from.  With a few exceptions, most places I have been at I was the “BI guy” and no one else had much knowledge of the subject
  6. Don’t work on project-based stuff.  As I have gotten older, I have been less and less interested in the daily minutiae of doing project work (and the stress/worry about a project being “successful”).  With this new role, it will be very short engagements with many clients.  Another TSP called it sort of like “speed dating”
  7. Do presentations and engage with customers.  I will have lots of different experiences while I do a lot of presentations and demos and talk with clients.  That is what I love doing
  8. Meet lots of clients and potential customers.  I really enjoy meeting new people and new environments.  With this position I will be at 2-3 new companies a week
  9. Great benefits.  Microsoft is constantly ranked #1 in the USA for benefits.  The one big benefit is nearly free health care coverage.  The new health care law has tripled my cost
  10. Monetary incentive to go above-and-beyond job requirements.  Part of my bonus is based on quota targets.  While there is a risk I won’t hit the target and the bonus could be less or even zero, I look at it as if I work hard and put in the extra effort I will exceed the quota and be rewarded
  11. Pay for move.  Moving from Houston to New York can be quite expensive, not to mention not having to do the packing
  12. Close to two kids in college.  My son will be about an hour-and-a-half from the city and another daughter is in Charleston, SC.  My 3rd child will also likely move to the east coast
  13. Make lots of contacts.  Due to all the companies I will present at, my Rolodex will get quite big!
  14. No independent consulting hassles: worrying about the next contract (job stability), invoices, filing taxes, late payments from clients, travel reimbursement, time dedicated to speaking with recruiters and interviewing, lower rates and difficulty finding work during a down-turn in the economy, etc.  I did not mind these things that much, but they did take up a lot of time and can get old after a while
  15. Get paid to go to conferences/blog/learn/training/research.  As an IC if I’m not working I’m not getting paid.  So going to major conferences means I will have 3-4 weeks per year of not being paid, plus I have to pay for the conference and all the expenses.  That really adds up.  In my Microsoft role going to conferences is part of the job.  Also, I spend a lot of time off-the-clock learning and researching new technology.  While I will still do a lot of that, some will be done as part of my Microsoft position
  16. Have a mentor.  Every TSP gets 1-2 mentors within Microsoft who will help them with their career goals and how to achieve them
  17. Paid vacation/sick/holidays.  As I mentioned, as an IC if I don’t work I don’t get paid.  It will be nice to have a paid vacation and holidays
  18. Work from home on occasion.  I don’t have to be in the office every day, so I can work from home to prepare presentations and demos, among other things.  But I love NYC so will be there as much as possible
  19. Tuition reimbursement if I want to go for an MBA.  I can’t see myself ever going back to college, but it’s nice to have that option
  20. Step out of my comfort zone.  Part of my job is sales, something I don’t have a lot of experience with.  I am looking forward to the challenge and enjoy learning and hopefully excelling at something new to me
  21. Tons of resources.  I will have all of the Microsoft employees as resources if I have any questions, need someone to bounce ideas off of, need help solving a problem, want someone to review my architecture solution, etc.
  22. Insider product knowledge.  I get a lot of insider stuff as an MVP, but I might hear and see more of that as a Microsoft employee
  23. No hourly billing and tracking (labor logging).  It’s not so much of a pain entering the logging, but more having to track my working hours.  If I spend two hours at the dentist one day, I’m only billing 6 hours that day.  If the client tells me I need to wait a week before I get my next task, that is a week I don’t get paid
  24. Flexible work schedule. I won’t have set hours.  I will have tasks to finish and clients to visit, but it does not matter when I put the hours in, just that I get the tasks done in time.  I sometimes get my best work done really late at night :-)
  25. Supportive management.  All the managers I have met seem very willing to make sure I have everything I need and that I remain happy at Microsoft
  26. Getting a behind-the-scenes look at how a big and successful technology company works
  27. The ability to further one’s own knowledge via learning opportunities, such as TechReady, which is a semi-annual internal technical conference for Microsoft employees.  There is also elective web and classroom training in areas such as negotiation, presentation, business and technical skills

Cons:

  1. Corporate politics.  Every company has it, but I’m glad they got rid of the stack ranking system
  2. The TSP role involves sales, and I am new to sales (but this could be positive due to the challenge)
  3. I will lose my SQL Server MVP status.  I have only been an MVP for four months.  If you are a Microsoft employee you are not allowed to be an MVP.  But I look at the positive side: at least I became an MVP beforehand
  4. No OT pay.  Not that I worked much OT, but now it won’t be paid.  But with Microsoft, putting in the extra hours can pay off in other ways
  5. I will need to work to stay on top of my technical skills.  This is because I never get to do implementations (other than maybe help a bit with a POC).  However, I currently spend a good deal of time at night learning new technology anyway

As you can see the pros far outweighed the cons, so the decision became easy, especially with our desire to move to NY.

In the end I am extremely excited about the position and looking forward to getting started on Feb 18th.  I will be traveling every other week to NYC until we move there permanently around June.  I will continue my normal blogging of two posts a week, and will continue to attend the major conferences (hopefully attending even more conferences than normal).

And if your company is interested in finding out more about PDW, email me and I’d be happy to do a presentation for you!

Posted in Career, PDW, SQLServerPedia Syndication | 21 Comments

Power BI for Office 365 now available!

Microsoft has announced today the general availability of Power BI for Office 365!

Microsoft describes it as: Power BI for Office 365 is a complete self-service Business Intelligence (BI) solution delivered through Excel and Office 365 providing you with data discovery, analysis, and visualization capabilities to identify deeper business insights from your data. The Power BI for Office 365 service is a cloud-based solution that reduces the barriers to deploying a business intelligence environment for sharing reports and accessing information.

Read more about it on my blog: Power BI first impressions and Power BI for Office 365 video and Power BI for Office 365 FAQ.

More info:

Microsoft Announces Public Availability Of Power BI For Office 365 Enterprise Customers

Microsoft Releases Power BI for Office 365

Announcing the General Availability of Power BI for Office 365

Power BI for Office 365 now available to do more with business insights in Excel

What Types of Projects to Consider for Microsoft Power BI

Power BI – Overview of Features End-to-End

Posted in Power BI, SQLServerPedia Syndication | Leave a comment

24 Hours of PASS: Business Analytics Edition videos online

On February 5, 2014 business intelligence experts took to the virtual stage for the 24 Hours of PASS: Business Analytics Edition to deliver a series of one-hour webcasts, digging deep into the world of business analytics.  All session recordings and slide presentations are now available for streaming from the PASS Business Analytics Conference website.  Enjoy!

Posted in PASS, SQL Server, SQLServerPedia Syndication, Videos | Leave a comment

Forcing a contractor to go W2

I have had a few questions like this: “I was wondering if there is a reason the vendor/broker/head hunter forces a contractor to go on W2?  As per many vendors, lots of big banks are insisting that contractors should be on the vendor’s W2 and NOT have a sub-contracting relationship through an s-corp or 1099.”  To see the difference between W2 and 1099, check out Consultants: 1099 or W-2?.

So I asked my recruiter friend, and his reply:

I just landed a client a few months ago and per their MSA (Master Service Agreement), any candidate that I put on contract with them MUST be a W2’d employee of my staffing company.  It basically comes down to showing that the vendor (my staffing company) has control over the employee and that there aren’t a bunch of layers between the vendor and other sub-contractors.  By eliminating the potential for multiple sub-contracting layers it does two things: (1) reduces the margin between what the contractor “actually” costs and what that person is being billed to the client and (2) it shows that the vendor has control over the contractor (meaning that the vendor has a direct relationship with the candidate and the vendor isn’t getting the person from a sub of a sub of a sub).  I’m sure relative to the “banking” scenario listed in your question, there are also some federal regulations that have to be followed since dollars are involved in that industry.  It mostly comes down to ownership/accountability.  How well do you know a candidate if they come from a sub of a sub of a sub?  It is better for everyone involved if there is a “direct” relationship between the vendor and their employee.

Posted in Consulting, SQLServerPedia Syndication | 1 Comment

SQL Server Data Tools (SSDT) – January 2014 update

The SSDT January 2014 release adds support for SQL Server 2012 Parallel Data Warehouse Appliance Update 1 (AU1) and continues to support SQL Server 2012 Parallel Data Warehouse.  This release of SSDT is now PDW version-aware, and the experience is different when connected to SQL Server 2012 Parallel Data Warehouse versus SQL Server 2012 Parallel Data Warehouse AU1.

SSDT version history:

VS 2012: January 2014 version: 11.1.31203.1, October 2013 version: 11.1.31009.1, August 2013 version: 11.1.30822.0, June 2013 version: 11.1.30618.1, December 2012 version: 11.1.21208.0, November 2012 version: 11.1.21101.1, September 2012 version: 11.1.20905.0, version shipped with VS 2012: 11.1.20627, initial version: 11.1.20225.0

VS 2013: CTP2 version: 11.1.31024.0

VS 2010: No longer being updated*, October 2013 version: 10.3.31009.1, August 2013 version: 10.3.30822.0, June 2013 version: 10.3.30618.1, December 2012 version: 10.3.21208.0, November 2012 version: 10.3.21101.1, September 2012 version: 10.3.20905.0, initial version: 10.3.20225.0

* From Microsoft: We will no longer update SSDT for Visual Studio 2010.  Projects and DACPACs are fully compatible across shells.  Please download the toolset for VS2012 or VS2013 using the links above for continued updates.

More info:

Updated SQL Server Data Tools January 2014

Posted in SQLServerPedia Syndication, SSDT/Juneau | Leave a comment

Power BI for Office 365 pricing announced

Pricing for Power BI for Office 365 has been announced (see pricing).

It’s all very confusing to me but Melissa Coates does a great job explaining it (see Initial Pricing for Power BI Has Been Announced).

More info:

Microsoft pins a price tag on its PowerBI business-intelligence tools

Power BI Pricing Announced

 

Posted in Power BI, SQLServerPedia Syndication | Leave a comment

PASS Business Analytics Conference discount code

I have previously blogged that I am presenting at the PASS Business Analytics Conference.  If you are looking to learn more about business intelligence and data warehousing, there is no better conference out there.  And if you are looking to meet people like you in the industry and share ideas, this is the conference to go to.  And one more reason, I have a $150 discount code for you: BASD9A

Keep in mind that the BA Conference registration rate goes up after this Saturday, January 18, so register now.

End of hard sell :-)

Posted in PASS, SQLServerPedia Syndication | 2 Comments

Independent consultant: How to find work

When I talk to people about switching from being a salaried employee to an independent consultant (see my presentation How to make a LOT more money as a consultant), the most frequent question I hear is “I worry about having to drum up my own business.  I never had to do that.  How did you make the transition?”.

First let me say, it’s a lot easier than you think, especially if your background is in data warehousing and business intelligence, which is a very hot skill-set.  I talk to recruiters all the time who mention how hard it is to find people with both the technical skills and the people skills.  If you fall into this category there are a ton of opportunities out there.

There are two main ways of landing work: through your own direct contacts, or via a recruiter/placement company.  Getting work directly is usually best, as you cut out the middle-man and therefore make more money.  But it’s hard to find work directly as it’s all based on who you know.  Plus, a lot of companies use preferred vendors so you are forced to go through one of them, even if you have a direct contact at the company.

Using recruiters is a great way to find work, and that is the way I have found most of my consulting projects.  Recruiters usually find me either through LinkedIn (see How to use LinkedIn to enhance your career), my blog (see Enhance your career by blogging!), job boards, or from their own local candidate list based on a previous contact with them.  It’s important to talk with recruiters even when you are not looking for work (just avoid the resume hoarders and those looking for you to help them do their job).  If you talk to the recruiters and build a relationship, they will keep you in mind for future openings that come across their desk.  Most of the time these projects are 6-12 month contracts.  It will be difficult to find contracts if you live in a small town and can’t travel.  But if you live near a big city and can travel, there are tons of opportunities.

It has been years since I went a day without work in-between projects.  When a project ends, I have always started another project the next day.  And if you are making twice as much income as your salaried position, even having a few weeks off in-between projects will be more than worth it.

The best way to get direct work is to use those relationships you have built on prior projects.  Contact companies you have worked for in the past as a contractor or perm to see if they could use you.  Even the current company that you are working for now as a salaried person may be very willing to hire you as an independent consultant for a future project.  Like I said, it’s hard to find people with both the technical and people skills.  And as you build up your personal “brand” via blogging, presenting, writing articles and books, etc, you will have more companies contract you directly.

Bottom line, if you have good work experience, are a quick learner, have good people skills, and are always trying to build relationships with recruiters, you will always have work.  You just need to get over your fear of “going on your own”.  Once you do, you will wish you became an independent consultant sooner!

If you are ready to make the jump, check out some of my other blog posts: Blueprint for consulting riches; Salaried employee vs contractor; Consultants: 1099 or W-2?; Thinking about taking a contract position? Questions to ask; How to become an expert in your field; ABL: Always Be Learning.

Posted in Consulting, SQLServerPedia Syndication | Leave a comment

Business Intelligence/Data Warehouse Assessment

When I am tasked to do a business intelligence or data warehouse assessment, the steps I take depend on the amount of time and the number of people I have.  The result of the assessment will be a plan to build a new data warehouse or business intelligence solution along with a proposed architecture for the new solution (i.e. a “Data Warehouse Architecture Blueprint”).  And obviously, the bigger the solution, the more time that should be spent on the assessment.

Here are those steps based on a quick, medium, or long approach:

Quick approach (2-4 weeks, 1 person):

Interview business units, find out gaps and needs and pain points, diagram existing environment, create assessment document, create solution document of 7-10 pages (with goals and technologies to prove out), create proposed architecture presentation.

Medium approach (4-6 weeks, 2 people):

The solution document will contain more details when using the medium approach.

The solution document could be written as a data warehouse architecture blueprint (30-40 pages) that outlines the enterprise information architecture, the data warehouse solution, and the supporting infrastructure and environment needs for the data warehouse solution.  Its TOC may look like:

  1. Executive Summary
  2. Data Warehouse Architecture Vision
    1. Current Environment
    2. Data Warehouse Vision
  3. Data and Integration Architecture
    1. Data Extraction and Staging
    2. Master and Operational Data
    3. Data Warehouse Data
    4. Business Area Cubes
    5. ETL, Integration, and Auditing
  4. BI Architecture
    1. Overall Architecture
    2. BI Governance and Compliance
      1. Governance Risks
      2. BI Governance Plan
      3. SharePoint BI Center Design
  5. Security and Infrastructure Architecture
    1. Security Architecture
    2. Infrastructure

Also useful is a data warehouse roadmap (30-40 pages) that identifies the timeline, resources and approach to implement a corporate data warehouse.  Its TOC may look like:

  1. Executive Summary
  2. Data Warehouse Roadmap Vision
    1. Corporate Support Objectives
      1. Success Factors
      2. Current Architecture and Drawbacks
      3. Architectural Vision
    2. Business Value
    3. Project Planning
      1. Architecture Foundation (Platform and Infrastructure)
      2. Initial EDW and BI Iteration Planning
      3. Master Data Management
  3. Resource planning
    1. Team Roles and Human Resources
      1. Team Roles
      2. Human Resources
      3. Team Skills Development
    2. Services, Software, and Systems
      1. Services
      2. Software
        1. Platform Toolset
        2. Client Software
        3. Developer and Management Tools
      3. Production Hardware Environment

Another layout that combines the above two documents could look like this:

  • Project description: Overview, scope, high-level requirements
  • Current technical architecture: Overview, logical architecture, system inventory, data architecture
  • Current capabilities: Overview, data management, analysis, information delivery
  • Target architecture and components: Overview, target data architecture diagram, technical architecture key decisions, data management, data integration, MDM, OLAP, presentation, metadata management, people, processes, roles and responsibilities, tools and technologies, data governance, target capabilities, target constraints, gap analysis
  • Implementation approach: Overview, key decisions, description of phases, roadmap

Long approach (10-25 weeks, 3 people):

Assess your current environment, identify current and future requirements, prioritize requirements using a business valuation framework and thereby develop a roadmap.  The method is inherently iterative but will be completed in 4 phases:

Phase 1 - Assessment and Discovery

  1. Situational Assessment: History of BI and DW, priorities for future growth
  2. Architecture Discovery and Assessment: Gain a broad and rich understanding of the current architecture from multiple perspectives, including principles, systems, data, tools, technology, and infrastructure
  3. Organizational Assessment: Roles and responsibilities.  Evaluate skillsets of team members and identify gaps in skills and proficiency
  4. Data Quality Assessment: Profile data
  5. Process Assessment: Current methodologies used within organization around all aspects of delivering analytical solutions and their conformance to best practices
  6. Governance Assessment: Review the data governance practices

Phase 2 - Opportunity Assessment and Architecture Definition

  1. Opportunity Identification: Reveal high level expectations for the future of business intelligence
  2. Needs Definition: Document and rank opportunities

Phase 3 - Future State Architecture Definition

  1. Architecture Workshops: Builds on the Architecture Discovery sessions to obtain and define more detail regarding the desired system, data, and technical architecture revisions required in the migration plan.  Drive the pilot exercises by producing requirements for technology and functionality (the pilot exercise is leveraged to continually inform and refine the architecture recommendations). The results of each Pilot exercise will then inform any updates to the architecture and generate requirements for the next pilot exercise
  2. Optimal Component Scheme: Classes of components that will be required to implement the architecture will be specified along with their functionality

Phase 4 - Targeting and Migration Plan

  1. Opportunity Targeting: Synthesizes the results of Needs Definition, and the Architecture Workshops to define incremental work effort that can be bundled into discrete delivery projects for the roll out
  2. Incremental Investment Plan: Combines the proposed project specifications and the architecture specifications to create an overall plan for a minimum of 12 – 18 months
Posted in Business Intelligence, Data warehouse, SQLServerPedia Syndication | 2 Comments