The Complete Magazine on Open Source

Best open source databases for IoT applications

3.9K 0

The Internet of Things (IoT), because of its inherent nature, requires certain features in the databases associated with it. This article gives a tiny selection of open source database management systems that are suited to usage for the IoT.

The term ‘Internet of Things’ is used to refer to: (i) the global network of smart objects interconnected by means of Internet technologies, (ii) the set of supporting technologies necessary to realise this, i.e., RFIDs, sensors, inter-machine communicating devices, and (iii) the ensemble of applications and services leveraging such technologies to open new business and marketing opportunities.

According to a report by Gartner, 8.4 billion interconnected devices will be in use in the world in 2017. The Internet of Things presents highly novel challenges, especially to database management systems, like integrating tons of voluminous data in real-time, processing events as they stream, and dealing with the security of data. An example would be IoT based environment temperature sensors fitted in smart cities, which produce huge amounts of data on the temperature and humidity of the live atmosphere in just a few minutes.

In order to handle IoT data effectively, it is highly important to find the right sort of database. But choosing an efficient database for IoT applications could be really challenging as the IoT environment is not always the same. There are many factors which have to be kept in mind while choosing a database for IoT applications. The most important of these are scalability, ability to handle huge amounts of data at adequate speeds, flexible schema, portability with varied analytical tools, security and costs.

An IoT database should have the capability of being fault-tolerant and highly available. If any node in the database cluster goes down, it should still be capable of accepting read and write requests. Distributed databases make multiple copies or replicas of data, and write the data over multiple servers. If any server storing the data fails, then other servers take over the task of storing and respond to the query till the failed server is up. IoT databases should be highly available, as the IoT database handling systems can face highly voluminous writes and stores. If any database server is down or the data write is too high for a distributed database in real-time, data can be stored in the messaging system until the database processes the backlog of data or any additional servers which are added to the main database cluster.

The following are some of the top open source databases available for IoT based applications.

InfluxDB
InfluxDB is an open source distributed time series database developed by InfluxData. It is written in the Go programming language, and is based on LevelDB, a key-value database. In addition to a front-end, an HTTP interface and libraries are provided to users for database interaction. The main advantage of InfluxDB is its capacity to aggregate values in time buckets on-the-fly without any manual intervention.

InfluxDB can be accessed by software like Grafana, which is a powerful front-end tool providing visualisation features for time series data. InfluxDB has no external dependencies and SQL like queries are used for querying a data structure comprising measurements, series and points. Each point consists of varied key-value pairs called fieldset and timestamp. Values can be 64-bit integers, 64-bit floating points, strings and Booleans. Points are indexed by their time and tagset. InfluxDB stores data via HTTP, TCP and UDP.

Features

  • Purely written in the Go programming language and facilitates compilation into a single binary with no external dependencies.
  • High performance customised data store written especially for time series data. The TSM engine of InfluxDB allows efficient and high speed data storage and compression.
  • Plugins support for other data ingestion protocols like Graphite, collectd, OpenTSDB.
  • In-built Web front-end tool for database and user administration.
  • Competent in merging multiple series together.

Official website: https://www.influxdata.com/
Latest version: 1.1.1

Figure 1: WebUI for InfluxDB

CrateDB
CrateDB is an open source distributed SQL database management system developed by Crate.io Inc., which fully integrates a searchable document-oriented data store. Christian Lutz, CEO of Crate.io, said, “When we founded Crate.io we set out to reinvent SQL for the machine data era. Today, 75 per cent of our customers use CrateDB for managing machine and IoT because of its easy usage, performance and versatility.”

CrateDB makes machine data applications accessible to SQL developers; prior to this these were only possible using NoSQL solutions. CrateDB combines SQL with search versatility and ease of scalability of containers. It provides a good alternative to analytic data store tools like Splunk. The CrateDB platform includes the distributed SQL query engine for providing faster joins, aggregations and ad-hoc queries; SQL with integrated search for data and query versatility; and container architecture and automatic data sharding for simple scaling.

The main language used by CrateDB is SQL but it also makes use of the document-oriented approach of NoSQL style databases. It uses the SQL parser from Facebook Presto for its query and prediction analysis. It includes an in-built administration interface. The Crate Shell CLI allows users to put up interactive SQL queries.

Features

  • Highly scalable: Updates to the database are easy and can be made by simply adding new machines to update the cluster; there is no need for any re-distribution of data in the cluster as it is done automatically by CrateDB.
  • Highly available: CrateDB allows the database to be highly available if anything goes wrong, as it provides automated replication of data across the cluster; even hardware and software updates don’t interrupt normal data operations. CrateDB has the capability of self-healing infected nodes.
  • Real-time data ingestion: CrateDB delivers millisecond speed query performance even if writes are taking place, and removes locking overheads.
  • Supports various data: CrateDB supports both relational as well as JSON-documents. And it also provides blob storage to store and retrieve videos, pictures or other unstructured files.
  • It supports geospatial queries and dynamic schemas, making CrateDB fully flexible, which is very good for Agile based development and IoT database storage at the back-end.

Official website: https://crate.io
Latest version: 1.0.4

Riak time series database
Riak time series (TS) database from Basho is an open source, distributed NoSQL key-value stored optimised database for the Internet of Things (IoT). With this database, the user can associate a large number of data points with a specific point in time. It is based on masterless architecture, in which every node in the cluster is capable of serving read and write requests; the distributed database automatically co-locates, replicates and distributes the data across the cluster to achieve high performance and availability.

Riak TS database is highly optimised for data access requirements. It supports Apache Spark integration, which makes integration support possible for Spark streaming, dataframes and Spark SQL.

Riak TS can be installed directly on the data centre or public cloud. AWS Amazon Machine Images (AMI) are also available for this database to facilitate users to experience Riak TS in the AWS workspace.

Features

  • Supports addition of new nodes to the existing cluster architecture without sharding; data is automatically and uniformly distributed across the database cluster.
  • Supports DDL or Data Definition Language for table and field definitions, and supports storage of both structured and semi-structured data.
  • Supports multi-cluster replication, which facilitates systems administrators to replicate the data across the in-house data centre and any geo-location data centre anywhere in the world.
  • Supports SQL-like data queries by users for easy and flexible access to global databases.
  • Supports application integration with APIs and client libraries in various languages like Java, Ruby, Python, Erlang, Go, Node.js and .NET.
  • Riak Meso framework provides efficient cluster resource management and ‘push button’ scale-up/down for RIAK nodes.
  • Supports full integration with Apache Spark for operational analysis of time series data.

Official website: http://basho.com/products/riak-ts/
Latest version: 1.3

MongoDB
MongoDB is a highly powerful, flexible, free and open source, document-oriented, scalable and general-purpose database. It has the ability to scale out features such as secondary indexes, range queries, sorting, aggregations and geospatial indexes. It is classified as a NoSQL database as it uses JSON-like documents with schemas.

MongoDB adds dynamic padding to documents and pre-allocates data files to trade extra space usage for consistent performance. It makes efficient use of RAM for caching and correcting queries for indexes. MongoDB supports a rich query language to support read and write operations (CRUD) as well as data aggregation, text search and geospatial queries.

Features

  • Supports generic secondary indexes for a variety of fast queries, and provides unique, compound, geospatial and full-text indexing features to users.
  • Supports ‘aggregation pipelines’ to build complex aggregations from simple pieces for optimisation of the database.
  • Supports TTL (Time-To-Live) collections for data that should expire after a certain period of time.
  • Supports easy-to-use protocol for storing large files and metadata files.
  • Supports JSON to store and transmit information. JSON, being standard protocol, is a great advantage for both the Web and the database.
  • Supports Map-Reduce on the server side for information processing using JavaScript functions.
  • Supports MongoDB Management Service (MMS) tool for allowing users to track databases and backing up the data.
  • Supports automatic load balancing configuration because of data placed in shards.

Official website: https://www.mongodb.com/
Latest version: 3.4

Figure 2: Riak TS database – Web interface

RethinkDB
RethinkDB is an open source, distributed database primarily used to store JSON documents; it has the capacity of scaling up to multiple machines. RethinkDB is regarded as the first and foremost choice for developers, especially IoT based developers, for feeding real-time data. It has completely revolutionised the traditional database architecture by invoking a new access model to update query results to applications in real-time. RethinkDB offers a flexible query language for monitoring APIs, and is highly easy to set up and learn.

RethinkDB offers a number of advantages over MongoDB. These are:

  • An advanced query language that supports table joins, sub-queries, and massively parallelised distributed computation.
  • An elegant and powerful operations and monitoring API that integrates with the query language, and makes scaling RethinkDB dramatically easier.
  • A simple and beautiful administration UI that lets you shard and replicate in a few clicks, and offers online documentation and query language suggestions.

Features

  • Fault tolerance: It supports the automatic shift to a new server if the primary server fails.
  • Easy addition of nodes: Plug-and-play of nodes in real-time, without any downtime for even a single second.
  • Asynchronous application programming interfaces: Supports asynchronous queries via Eventmachine in Ruby and Tornado.
  • Supports SSL access to have secured access to RethinkDB via public Internet.
  • More functions: Supports various mathematical operators like floor, ceil and round.

Official website: https://rethinkdb.com/
Latest version: 2.3.5

SQLite
SQLite is an open source and embedded relational database, which is designed to provide an easy way for applications to manage data without the overhead. It is highly portable, easy to use, compact, efficient and reliable.
SQLite is ACID-compliant; it implements most SQL standards, and uses dynamically and weakly typed SQL syntax. SQLite engine is not a standalone process like other databases; it can link to static as well as dynamic applications.

Features

  • Doesn’t require a separate server process or system to operate, and can operate in a serverless environment.
  • No requirement for any system administration, and needs a low-configuration machine for build up.
  • Self-contained and has no external dependencies.
  • Written in ANSI-C, and provides easy and simple API.
  • Cross-platform: Compatible with UNIX, LINUX, Windows, MAC-OS x, etc.
  • Transactions are fully ACID compatible, allowing safe access from multiple processes.
  • Supports all SQL queries found in SQL92.
  • Fully tested and verified code in SQLite, which is error-free and always up-to-date.

Official website: https://www.sqlite.org
Latest version: 3.17.0

Figure 3: RethinkDB interface

Apache Cassandra
Apache Cassandra is regarded as a highly scalable and distributed open source database for managing voluminous amounts of structured data across many commodity servers. As compared to other open source databases, Cassandra offers various additional high performance capabilities in terms of availability, linear scale performance, simplicity and easy distribution of data across multiple database servers.
Cassandra was developed by Facebook with the prime motive of facilitating Inbox search and was made open source in 2008. It implements the ‘Dynamo-style replication model’ with no single point of failure, and adds a more powerful ‘column family’ data model.

Features

  • Massively scalable architecture: Cassandra has a masterless design, where all nodes are at the same level, which provides operational simplicity and easy scale out.
  • Masterless architecture: Data can be written and read on any node.
  • Linear scale performance: As more nodes are added, the performance of Cassandra increases.
  • Fault detection and recovery: Failed nodes can easily be restored and recovered.
  • Flexible and dynamic data model: Supports datatypes with fast writes and reads.
  • Data protection: Data is protected with commit log design and built-in security like backup and restore mechanisms.
  • Tunable data consistency: Support for strong data consistency across distributed architecture.
  • Multi-data centre replication: Cassandra provides features to replicate data across multiple data centres.
  • Data compression: Cassandra can compress up to 80 per cent data without any overhead.
  • Cassandra query language: Cassandra provides a query language that is similar to SQL language.

This makes it very easy for developers moving from a relational database to Cassandra, to use it.

Official website: http://cassandra.apache.org
Latest version: 3.10