MonetDB is a new, column-store database paradigm for faster access to and high performance of applications.
In this era of Big Data, it is difficult to manage, store and manipulate data. To resolve this issue, there are different paradigms that organisations use to store data, such as structured, unstructured, semi-structured, row based and column based databases. In this article, we are going to explore a popular, open source, free column based database called MonetDB. But before we do that, let’s look at the difference between row based (traditional) and column based databases.
Almost all traditional databases store data in the row format, which means in every table, the data is stored one after another in tuple format. In case of column based databases, data is stored column wise (Figure 1).
Column based vs row based databases: The advantages and disadvantages
Let’s consider the advantages of column based databases over row based databases. The former are faster when we need to access all the data of an entire column, because the data is stored together in a single column. Another advantage is that less hard disk access is required and more values can be stored in a block. Column based databases work faster while performing statistical operations like aggregation, summation, etc. This is because data is stored together, so access time and computation time decrease compared to internal storage. When a specific table in the entire column needs to be changed, then a column based database is faster. This is because the entire data in the column is stored in the same block, making it easier to fetch and update it. A column based database is also better in terms of compression, because similar types of values are stored in the same block; in row based databases, the entire tuple, which has different types of values, is stored in one block.
Row based databases are useful when we have more read operations and insertion of records.
So, the choice of the database will depend upon the features that are required. Column based databases are preferred for statistical operations, where we have a huge amount of data to manage and manipulate.
MonetDB
MonetDB is a free, open source database management system developed by the CGI database research group. The main use case of this database is for data warehousing, where there is a huge amount of data to manage. It uses a vertical fragmentation and columnar execution based engine, and is mainly used for Big Data science. MonetDB + R language integration is popular among data scientists.
Installation of MonetDB in Python
There are different ways to install MonetDB. We can do a standalone installation or install the thin clients provided in different languages like Python and R. Here, we will explore how to use Python for the installation of MonetDB, just for demo purposes.
1. Install the Python-Monetdb package; we are using the Pip utility.
2. Run the command given below in a CLI prompt:
pip install monetdblite
3. We can also install standalone MonetDB in Windows. Download the .msi file from https://www.monetdb.org/downloads/Windows/Jul2017-SP3/.
4. Figure 3 gives the snapshot for Windows installation.
5. To connect to a database, use the code given below in Python:
# create a new database or connect to an existing database in /tmp/db import monetdblite conn = monetdblite.connect(‘/tmp/db’)
To create a cursor to access tables and databases, use the following code:
# create a new cursor c = conn.cursor() # query the database c.execute(‘SELECT * FROM tables’) # fetch the results and print them print(c.fetchall())
Likewise, we can explore more commands from the MonetDB Command Reference Guide.
MonetDB’s performance is indeed something to behold…without any tuning whatsoever, I’ve seen it absolutely obliterate traditional RDBMS’s running on far superior hardware. The biggest hurdle for MonetDB’s acceptance is the limited user community to this point and correspondingly poor community support. Their official site’s leaves much to be desired as well. It’s well worth learning though…it certainly reset my expectations of what was possible performance-wise.