The Complete Magazine on Open Source

An Overview of Open Source Databases


The ever-increasing volumes of data that are a by-product of modern life require efficient methods for their management and retrieval. Open source databases have come into their own, and now offer solutions for most data management needs in the enterprise. Here is an overview of five of the most widely used and highly ranked open source databases.

The past few years in the data management industry have been revolutionary for open source databases. According to reports from DB-Engines, an online initiative to collect and present information on database management systems (DBMS), the three fastest growing databases of 2014 were all open source, and the trend continued through 2015 as well.

In this article, I will try to provide an overview of some of the most common open source DBMS software. The field is far too large to cover completely, so I have picked five of the most widely used and highest ranked databases for discussion.
So let's get to work and look at them, in no particular order of ranking.

MongoDB

MongoDB topped the DB-Engines charts for two consecutive years (2013-14) and was the runner-up in 2015.
MongoDB is among the several databases that arrived in the late 2000s under the NoSQL category. It broke with the traditional method of storing data in tables and rows by storing it in JSON-like documents with dynamic schemas, in a binary format that MongoDB calls BSON.

The creators of MongoDB say that the name was derived from the word 'humongous', reflecting its aim of supporting massive amounts of data. Development of the software began in 2007 as part of a commercial product, and it moved to an open source model in 2009.
Written in C++, MongoDB breaks away from traditional RDBMS ideology and offers a vast set of features. Here are some of them:

1. Stores records in the form of documents, which correspond to native data types in most programming languages
2. Embedded documents and arrays reduce the need for expensive joins
3. Dynamic schema supports fluent polymorphism
4. Full index support on all fields
5. Replication and high availability across LANs and WANs
6. Automatic scaling via auto-sharding and replica sets

The list of features is huge but I will stop here, so readers can explore the software for themselves.
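
To make the document model in points 1 to 3 more concrete, here is a minimal sketch in plain Python. It does not talk to a MongoDB server; the dictionaries only imitate the shape of documents, and the collection, field names and the order_total helper are all hypothetical, invented for illustration.

```python
import json

# A hypothetical "order" record modelled as one document.
# The customer and the line items live inside the document itself,
# so reading an order needs no join against other tables.
order = {
    "_id": 1001,
    "customer": {"name": "Asha", "city": "Pune"},  # embedded document
    "items": [                                      # embedded array
        {"sku": "A-1", "qty": 2, "price": 150.0},
        {"sku": "B-7", "qty": 1, "price": 499.0},
    ],
}

# With a dynamic schema, another document in the same collection
# may carry fields that the first one does not have.
gift_order = {
    "_id": 1002,
    "customer": {"name": "Ravi"},
    "items": [],
    "gift_note": "Happy birthday",
}


def order_total(doc):
    """Sum qty * price over the items embedded in one document."""
    return sum(item["qty"] * item["price"] for item in doc["items"])


# A plain list stands in for a collection in this sketch.
collection = [order, gift_order]
totals = {doc["_id"]: order_total(doc) for doc in collection}

print(json.dumps(order, indent=2))
print(totals)
```

In a relational design, the customer and the items would typically sit in separate tables and be joined at query time; here they travel with the order.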

MySQL

From the most trending, let's move to the one that has been the most widely used for years now: MySQL. Named after My, the daughter of its co-founder Michael Widenius, it is a dominant force in the world of RDBMSs. MySQL is a popular choice for Web applications, especially among PHP developers. Even amid strong competition from other relational databases, including the ones forked from it, MySQL made big gains in early 2016 to return to the top of the RDBMS popularity chart after a few years.

MySQL is available in two editions:

1. The open source MySQL community edition
2. The MySQL enterprise edition

While the basic engine remains the same in both editions, the enterprise edition scores on support and more frequent updates.

The wide adoption of MySQL reflects a feature set that appeals to many users, even in the face of strong competitors. A few of these features are listed below; the rest are left for readers to explore:

1. Like most other database systems, MySQL is a relational DB system.
2. Like any other successful database system, MySQL provides scalability, flexibility, high performance and high availability.
3. MySQL claims to offer the most powerful transactional database engine, with robust transaction support.
4. It offers strong data protection, backed by a range of security features.
5. It has comprehensive support for every application development need.
And the list goes on and on.

PostgreSQL

Originating from the Ingres project at the University of California, Berkeley, in the 1980s, the project underwent a series of improvements initiated by several of its founders before finally being named PostgreSQL in 1996. The first release under that name came out in January 1997. Its popularity has increased in the last few years and, according to DB-Engines, it is among the highest ranked in recent times. PostgreSQL is an object-relational DBMS which emphasises extensibility and standards compliance. It implements a majority of the SQL:2011 standard and comes with various enterprise class features.

The standard set of features expected of any industry DBMS is available in PostgreSQL as well. Here are a few of its features that are gaining popularity:

1. Runs on all major operating systems
2. Fully ACID compliant for transaction reliability
3. Multi-version concurrency control (MVCC), which provides immunity to dirty reads and supports serialisable transactions
4. Support for SQL data types
5. Streaming replication to continuously ship and apply Write Ahead Log records to the standby server…
…and many more.
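
The atomicity part of ACID (point 2 above) can be illustrated with a short sketch. Note the assumption: to keep the example self-contained, it uses Python's built-in sqlite3 module as a stand-in rather than a PostgreSQL server, and the accounts table and the transfer helper are hypothetical. The idea being shown is the same in any ACID-compliant database: either every statement in a transaction takes effect, or none does.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption: not PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back together with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok_small = transfer(conn, "alice", "bob", 30)   # succeeds
ok_large = transfer(conn, "alice", "bob", 500)  # fails and is rolled back
print(ok_small, ok_large, balances(conn))
```

The second transfer credits bob first and only then fails on the debit, yet bob's balance is unchanged afterwards, which is exactly what atomicity promises.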

Cassandra

Next in the list is Cassandra, ranked the second runner-up by DB-Engines for 2015. It was initially developed by Facebook engineers to power the social media giant’s inbox search and was then made open source in 2008. Cassandra is a distributed DBMS specially designed to handle large amounts of data spread across a vast cluster of machines. It offers support for data spanning across multiple data centres with asynchronous masterless replication.
Technically, Cassandra also broke away from the traditional RDBMS model of tables and columns stored on a single machine, so that it can scale across thousands of servers. Cassandra also places a high value on performance and scalability. A study done by the University of Toronto in 2012 found that among NoSQL databases, Cassandra was the clear winner in terms of scalability, and achieved the highest throughput across the maximum number of nodes.
Even though its creators at Facebook have abandoned Cassandra, it is still powering leading Web infrastructure companies like Twitter, Netflix and Apple. A study done by an Austrian company, Solid IT, reveals that Cassandra is the second most popular NoSQL database after MongoDB and the third fastest growing database overall.

Here is a small list of features that Cassandra offers.

1. Decentralisation: Every node in a cluster has the same role; hence, there’s no master, and so no single point of failure.
2. Multiple data centre replication: It is designed as a distributed system to support nodes across multiple data centres.
3. Cassandra Query Language (CQL): It has introduced a new query language called CQL, an SQL-like alternative.
4. MapReduce Support: Cassandra provides integration for Hadoop with MapReduce support.

Again, I have picked only a few key features from Cassandra's vast list, leaving the rest for readers to explore.
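
The masterless design described in point 1 can be sketched as a simple hash ring. This is a simplified illustration and not Cassandra's actual code: the Ring class, the node names and the use of MD5 as the hash function are all assumptions made for the example, and a real cluster has its own partitioner and replication strategies. What the sketch does show is that any node can work out which peers hold a key from the key alone, without asking a master.

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 chosen for simplicity)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """A toy hash ring: every node has the same role, there is no master."""

    def __init__(self, nodes, replication_factor=2):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds number of nodes")
        self.replication_factor = replication_factor
        # Each node owns one position on the ring, derived from its name.
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str):
        """Return the nodes that should store key, walking the ring clockwise."""
        start = bisect_right(self.tokens, token(key)) % len(self.ring)
        return [
            self.ring[(start + i) % len(self.ring)][1]
            for i in range(self.replication_factor)
        ]


# Hypothetical node names; in practice these could sit in different data centres.
ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=2)
owners = ring.replicas("user:42")
print(owners)
```

Because the placement is a pure function of the key and the node list, losing one node leaves the other replicas reachable, which is the sense in which there is no single point of failure.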

MariaDB

Another RDBMS that is gaining popularity is MariaDB. It was forked from MySQL to provide a drop-in replacement, with an enhanced set of features compared to the equivalent MySQL release. The fork was started by the founder of MySQL, Michael Widenius, after Oracle's acquisition of Sun Microsystems (which had bought MySQL in 2008) was announced in 2009; the deal was completed in early 2010.

MariaDB was always meant to maintain high compatibility with MySQL, and it aims to be a drop-in replacement with library binary equivalence. Some of the features of MariaDB are listed below.

1. Open and free: It is developed by the community under GPL and hence is available at no cost.

2. Support: MySQL itself has well established documentation support, which can also be used for MariaDB. Apart from that, there is a large enough MariaDB-specific knowledge base/online community.

3. Speed: MariaDB is regarded as one of the fastest databases available, even faster than MySQL.

4. Functionality: It provides support for all MySQL features, along with additional new and enhanced features developed by the community.

5. Ease of use: As per user feedback, MariaDB is easy to use, especially with features like flexible syntax.

My list of DBMSs is merely a starting point for discussion of the more than 200 such databases available. It's not important which ones were chosen and which were not; rather, my attempt has been to give the reader an overview of how much the open source world has evolved in database management. Industry adoption of open source databases has also been increasing day by day, as organisations move away from traditional proprietary databases. Hence the mantra for success is: Go open, go big.

References
[1] http://db-engines.com/en/ranking
[2] https://www.mongodb.org/
[3] https://en.wikipedia.org/wiki/MongoDB
[4] http://searchdatamanagement.techtarget.com/definition/MongoDB
[5] https://en.wikipedia.org/wiki/MySQL
[6] https://www.mysql.com/why-mysql/topreasons.html
[7] https://wiki.postgresql.org/wiki/Main_Page
[8] http://www.postgresql.org/
[9] https://en.wikipedia.org/wiki/PostgreSQL
[10] https://en.wikipedia.org/wiki/Apache_Cassandra
[11] http://www.wired.com/2014/08/datastax/
[12] https://en.wikipedia.org/wiki/MariaDB
[13] http://www.zdnet.com/article/open-source-mariadb-a-mysql-fork-challenges-oracle/