Category Archives: Hadoop

Types of NoSQL databases

A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.  NoSQL is often interpreted as Not-only-SQL to emphasize that they may also support SQL-like … Continue reading

Posted in Hadoop, SQLServerPedia Syndication | 2 Comments

What is a data lake?

A “data lake” is a storage repository, usually in Hadoop, that holds a vast amount of raw data in its native format until it is needed.  It’s a great place for investigating, exploring, experimenting, and refining data, in addition to archiving … Continue reading

Posted in Data warehouse, Hadoop, SQLServerPedia Syndication | Leave a comment

The Modern Data Warehouse

The traditional data warehouse has served us well for many years, but new trends are causing it to break in four different ways: data growth, fast query expectations from users, non-relational/unstructured data, and cloud-born data.  How can you prevent this … Continue reading

Posted in Big Data, Data warehouse, Hadoop, PDW/APS, SQLServerPedia Syndication | 3 Comments

Hadoop and Data Warehouses

I see a lot of confusion when it comes to Hadoop and its role in a data warehouse solution.  Hadoop should not be a replacement for a data warehouse, but rather should augment/complement a data warehouse.  Hadoop and a data warehouse … Continue reading

Posted in Data warehouse, Hadoop, PDW/APS, SQLServerPedia Syndication | 4 Comments

Introduction to Hadoop

Hadoop was created by the Apache Software Foundation as an open-source software framework capable of processing large amounts of heterogeneous data sets in a distributed fashion (via MapReduce) across clusters of commodity hardware on a storage framework (HDFS).  Hadoop uses a simplified programming model.  The … Continue reading

Posted in Hadoop, PDW/APS, SQLServerPedia Syndication | 7 Comments

What is HDInsight?

There are two flavors of HDInsight: Windows Azure HDInsight Service and Microsoft HDInsight Server for Windows (recently quietly killed but lives on in a different form).  Both were developed in partnership with Hadoop software developer and distributor Hortonworks and were made … Continue reading

Posted in Hadoop, PDW/APS, SQLServerPedia Syndication | 6 Comments

PolyBase explained

PolyBase is a new technology that integrates Microsoft’s MPP product, SQL Server Parallel Data Warehouse (PDW), with Hadoop.  It is designed to enable queries across relational data stored in PDW and non-relational data stored in the Hadoop Distributed File … Continue reading

Posted in Hadoop, PDW/APS, SQLServerPedia Syndication | 14 Comments