Category Archives: Hadoop

Microsoft Products vs Hadoop/OSS Products

Microsoft’s end goal is for Azure to become the best cloud platform for customers to run their data workloads.  This means Microsoft will provide customers the best environment to run their big data/Hadoop as well as a place where Microsoft … Continue reading

Posted in Hadoop, SQLServerPedia Syndication | Comments Off

Hadoop and Microsoft

In my Introduction to Hadoop, I covered the basics of Hadoop.  In this post, I want to cover some of the more common Hadoop technologies and tools and show how they work together, in addition to showing how they work … Continue reading

Posted in Hadoop, HDInsight, SQLServerPedia Syndication | 2 Comments

Types of NoSQL databases

A NoSQL database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases.  NoSQL is often interpreted as Not-only-SQL to emphasize that such databases may also support SQL-like … Continue reading

Posted in Hadoop, SQLServerPedia Syndication | 8 Comments

What is a data lake?

A “data lake” is a storage repository, usually in Hadoop, that holds a vast amount of raw data in its native format until it is needed.  It’s a great place for investigating, exploring, experimenting, and refining data, in addition to archiving … Continue reading

Posted in Data Lake, Data warehouse, Hadoop, SQLServerPedia Syndication | 11 Comments

The Modern Data Warehouse

The traditional data warehouse has served us well for many years, but new trends are causing it to break in four different ways: data growth, fast query expectations from users, non-relational/unstructured data, and cloud-born data.  How can you prevent this … Continue reading

Posted in Big Data, Data Lake, Data warehouse, Hadoop, PDW/APS, SQLServerPedia Syndication | 8 Comments

Hadoop and Data Warehouses

I see a lot of confusion when it comes to Hadoop and its role in a data warehouse solution.  Hadoop should not be a replacement for a data warehouse, but rather should augment/complement a data warehouse.  Hadoop and a data warehouse … Continue reading

Posted in Data Lake, Data warehouse, Hadoop, PDW/APS, PolyBase, SQLServerPedia Syndication | 5 Comments

Introduction to Hadoop

Hadoop is an open-source software framework, developed under the Apache Software Foundation, capable of processing large, heterogeneous data sets in a distributed fashion (via MapReduce) across clusters of commodity hardware, on top of a distributed storage layer (HDFS).  Hadoop uses a simplified programming model.  The … Continue reading

Posted in Hadoop, PDW/APS, SQLServerPedia Syndication | 8 Comments

What is HDInsight?

There are two flavors of HDInsight: Windows Azure HDInsight Service and Microsoft HDInsight Server for Windows (recently quietly killed but lives on in a different form).  Both were developed in partnership with Hadoop software developer and distributor Hortonworks and were made … Continue reading

Posted in Hadoop, HDInsight, PDW/APS, SQLServerPedia Syndication | 8 Comments

PolyBase explained

PolyBase is a new technology that integrates Microsoft’s MPP product, SQL Server Parallel Data Warehouse (PDW), with Hadoop.  It is designed to enable queries across relational data stored in PDW and non-relational data stored in the Hadoop Distributed File … Continue reading

Posted in Hadoop, PDW/APS, PolyBase, SQLServerPedia Syndication | 26 Comments