
Relational databases vs Non-relational databases — 21 Comments

  1. This is a good overview but the technologies are ever-changing. I’d argue that:
    –not all columnstores are NoSQL/non-relational. HP Vertica is an MPP columnstore and its SQL is standard ANSI. You perform your physical modeling just like any standard data warehouse model.
    –storage engines can now be swapped in and out for most products. Vertica can use HDFS natively, and MySQL storage engines have been swappable for at least 12 years. Products are becoming “hybrids” and “crossovers” to meet more use cases.
    –the lines will blur further as NoSQL products add relational features and relational products add more NoSQL-like features. SQL Server added delayed transaction durability, I’m sure, as a response to a perceived shortcoming. And SQL-like extensions are being added to the Hadoop stack daily. Remember when people were berated for asking, “how can I turn off the tran log?” Well, maybe we don’t always need the ability to roll back (or recover)? What heresy!!!
    –many people are making in-memory work for huge data sets. It can be done and is likely the future for HANA and SAP’s BW product. I have a 100TB Couchbase cluster that is all in-memory.
    –NoSQL databases, while open source (free as in speech), are not always free (as in beer). You’ll definitely want vendor support and that ain’t cheap. I mean really, who actually understands how MapReduce works? Are you going to look at the source code at 3am?
    –And if you are an ISV then you need to be aware that anything that is GPL’d will require you to either pursue a commercial license or open source your product. See MongoDB’s licensing terms. This scares ISVs…a lot.
    –You actually can model your documents in a document store to be somewhat relational yet still avoid the overhead of JOINs. This is conceptually like nested sets and pointers. Likewise, I’ve seen lots of SQL Server databases using EAV patterns and XML columns that are not relational at all. This is where people who don’t truly understand these “new” non-relational technologies get confused. There is logical modeling and then there is the physical implementation. The same logical model can often be expressed in a rowstore, columnstore, or document store.
    –These NoSQL ideas aren’t “new” but are re-expressions of old ideas. Before Codd we had “network” and “hierarchical” datastores, and many hospitals still rely on them today (MUMPS).
    –I think the real motivation for “NoSQL” is the CAP Theorem and the fact that most SQL offerings have traditionally done a lousy job with it. Each product handles this differently and can often be tweaked for specific needs (MySQL has done this for 8 years, and SQL Server now has delayed durability). The CAP Theorem is the key. Most relational guys don’t get this. Yes, there are actually times when I may not mind some transactions being lost or replayed twice if I can process massive datasets in real time. Sometimes data isn’t your company’s most valuable asset.
    –Every relational guy should understand the basics of this stuff to be able to speak somewhat intelligently about it and not just regurgitate FUD and non-truths. Likewise NoSQL guys that would express an accounting data model in Mongo need their heads examined.

    To that end, great post.

  2. Thank you, James, for putting this piece together. It is clearly and concisely written and will give any data “newbie” a very good picture of the data landscape we are in these days. I like your categorization of DBs. It would be nice to add which DBs are commercial and which are for non-commercial use. Where would you place APS on your diagram?

  3. Nice article, James. It cleared up my doubts about RDBMS and non-relational (NoSQL) databases.

    I am more interested in Big Data technology. Could you please let me know which one is a good place to start?

    I have 11+ years of experience in MSSQL and MSBI, and now I am looking into the analytics area. Could you advise whether I should move completely from Microsoft to Big Data, or whether Microsoft has its own Big Data technologies, and if so which ones, so I can start looking into them?

    Thanks in advance

  4. Pingback:12 Core Competencies For Product Managers - Pendo blog

  5. Good content, sir.
    I learned many good things from this article. It is really helpful for anyone who is interested in databases, SQL, and Big Data.
    Thank you
    for such a wonderful article. I will definitely share it with my friends.

  6. Great article! I am working on an executive overview that I need to present to my manager. Can anybody point me to some other high-level discussions of this topic? I’m not a great writer (except when it comes to code :)), and I would like to see how others describe it in a way that’s easy for non-technical folks. Thanks

  7. I love how well you articulated the differences!

    May I just add one more RDBMS, because I’m sure many people choosing between relational and non-relational are also stuck deciding which software to use. The ones you listed are well known in North America, but I started using Tibero 6 recently and my company saved 50% on licensing fees. I think your readers should know about cheaper options, plus it has better security 😉

    You can find it here: http://www.tmaxsoft.com/cn_en/tibero_cn_en/

    Carolina

  8. I would suggest putting MemSQL’s memory-optimized rowstore under both analytics and operational, and MemSQL’s disk-based columnstore under analytics.
    Here is also a link to a post by our VP of Engineering on when to use a rowstore vs. a columnstore: https://blog.memsql.com/should-you-use-a-rowstore-or-a-columnstore/

    I think that a) SQL can scale to millions of writes and reads, and MemSQL is proof of this, and b) the lines between operational and analytical workloads are blurring due to the use of memory and distributed architectures.

  9. NoSQL is ignoring the real need for joins. Let’s say General Motors has a work environment compensation package (I have no clue) and it is by type of worker. In SQL, this is in one table; in MongoDB, it’s a part of EVERY record for every worker around the world. One change to the compensation package would send MongoDB into a month-long tailspin to change it in every single worker’s record.
    I’m still new to MongoDB, so I really HOPE someone can address this, because the argument that the only reason for joins is to save space is a feeble one.

    • Be careful when you shoot down a product/solution/architecture given one single use case. It’s often a straw man. Let’s assume you absolutely would want to use Mongo in your use case…then I wouldn’t model the *physical* design in a relational manner with joins. Doc databases have made architectural decisions to avoid joins to gain benefits elsewhere. Joins are suboptimal in the physical implementation of a document-oriented store.

      Assuming you really, REALLY wanted to use Mongo in your use case (which I wouldn’t), you would want a “pointer” in the emp record that points to a comp pkg lookup. You would then retrieve 2 documents. This is no different from your app today making multiple SQL calls to render a webpage. This admittedly limits your ability to write reports, and again it is a design decision for doc DBs.

      As for “the only reason for joins is to save space”…this is a gross over-simplification. We use star schemas and data warehouse structures to avoid joins in relational dbs.

      It’s important to understand when to use a technology…as well as when not to. It’s also important to understand the architectural tradeoffs of these technologies. The best architects understand the strengths and limitations of various tools to ensure good decisions are being made.

  10. Pingback:SQL versus NoSQL databases | Big Data and Analytics

  11. Pingback:Making sense of Microsoft technology | James Serra's Blog

  12. Pingback:Making sense of Microsoft technology – Cloud Data Architect

  13. Thanks for the clear and concise explanation! I’m researching a possible cloud implementation of the R package and come from the desktop/system admin world, so I have a lot of reading to do.
    This was a great intro. Thanks.

  14. Pingback:Distributed SQL – SQLServerCentral

  15. Pingback:Distributed SQL | James Serra's Blog

  16. Pingback: 20 points to consider when building your first Node.js application

  17. Pingback:What Types of Business Decisions Would an EIS Use AI For?
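
Comment 1 makes the point that the same logical model can often be expressed in a rowstore, a columnstore, or a document store. As a rough illustration only, here is a minimal Python sketch, assuming three made-up employee records and plain Python lists and dicts standing in for the physical layouts (no real engine stores data this way). The logical data is identical in all three; what differs is which values sit together, and therefore what an aggregate has to touch.

```python
# Illustrative sketch only: plain Python structures stand in for physical layouts.
# The employee records are hypothetical.

employees = [
    {"id": 1, "name": "Ana", "dept": "Eng", "salary": 120},
    {"id": 2, "name": "Ben", "dept": "Ops", "salary": 90},
    {"id": 3, "name": "Cy", "dept": "Eng", "salary": 110},
]

# Rowstore: all fields of one record are stored together.
rowstore = [(e["id"], e["name"], e["dept"], e["salary"]) for e in employees]

# Columnstore: all values of one column are stored together.
columnstore = {
    col: [e[col] for e in employees] for col in ("id", "name", "dept", "salary")
}

# Document store: one (possibly nested) document per key.
docstore = {
    e["id"]: {"name": e["name"], "job": {"dept": e["dept"], "salary": e["salary"]}}
    for e in employees
}


def total_salary_rowstore(rows):
    # Walks every full row even though only the salary field is needed.
    return sum(row[3] for row in rows)


def total_salary_columnstore(cols):
    # Reads only the salary column.
    return sum(cols["salary"])


def total_salary_docstore(docs):
    # Walks every document and follows the nested path to the salary.
    return sum(doc["job"]["salary"] for doc in docs.values())
```

All three functions return the same total; the difference is how much of the stored data each layout has to scan to produce it, which is the intuition behind columnstores favoring analytical scans and rowstores favoring whole-record lookups.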
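
Comment 9 and its reply contrast embedding a shared compensation package in every worker document with storing a “pointer” to a separate package document. The following is a hedged illustration of that tradeoff, assuming hypothetical data (1,000 workers and a made-up bonus_pct field) and plain Python dicts standing in for MongoDB documents. It counts how many documents a single package change has to write under each design.

```python
# Illustrative sketch only: plain dicts stand in for MongoDB documents.
# Worker counts, the "assembly" type, and bonus_pct are hypothetical.

comp_packages = {"assembly": {"_id": "assembly", "bonus_pct": 5}}

# Embedded design: every worker document carries its own copy of the package.
workers_embedded = [
    {"_id": i, "type": "assembly", "comp": {"bonus_pct": 5}} for i in range(1000)
]

# Referenced design: every worker document stores only the package _id.
workers_referenced = [
    {"_id": i, "type": "assembly", "comp_id": "assembly"} for i in range(1000)
]


def update_embedded(workers, worker_type, bonus_pct):
    """Rewrite every embedded copy; return the number of documents written."""
    written = 0
    for w in workers:
        if w["type"] == worker_type:
            w["comp"]["bonus_pct"] = bonus_pct
            written += 1
    return written


def update_referenced(packages, comp_id, bonus_pct):
    """Rewrite the single shared package document; return documents written."""
    packages[comp_id]["bonus_pct"] = bonus_pct
    return 1


def read_referenced(workers, packages, worker_id):
    """Two lookups (the worker, then its package), i.e. 2 documents retrieved."""
    worker = next(w for w in workers if w["_id"] == worker_id)
    return worker, packages[worker["comp_id"]]
```

In a real MongoDB deployment the embedded change could be issued as a single updateMany call, but it would still rewrite every matching document. The referenced design writes one document per change and pays for it with a second read per worker, which is the tradeoff the reply describes.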
