The Internet of Things (IoT) generates vast amounts of data, including streaming data, time series data, RFID data and sensor data. Managing this data efficiently calls for a database, and the very nature of IoT data requires a different type of database. Here are some databases that work well for IoT applications.
The Internet of Things (IoT) can be regarded as a network in which various things are connected to each other through a common platform. Imagine a world in which every device at home and at the workplace is connected: the air conditioning in a room automatically lowers its temperature when the outside temperature rises, the number of people at any public gathering is easily known, and one’s health parameters can be monitored daily. This is the potential impact of the Internet of Things.
The current state of the Internet of Things is very fragmented. Different companies and organisations are building their own platforms, either for their customers or for their own needs. But a common platform on which all devices, irrespective of their manufacturer, can be connected to each other via a user-friendly interface is still missing.
The number of IoT devices is expected to grow enormously over the coming years.
Is a database necessary for IoT?
The Internet of Things creates many challenges, especially for database management systems: integrating huge volumes of data in real time, processing events as they stream in, and keeping the data secure. For instance, IoT-based traffic sensors deployed in smart cities produce huge amounts of traffic data in real time.
Databases have a very important role to play in handling IoT data. Along with a proper platform, therefore, the right database is equally important. Because IoT deployments span very diverse environments around the world, choosing a suitable database is challenging.
The factors that should be considered before choosing a database for IoT applications are:
1) Size, scale and indexing
2) Effectiveness while handling a huge amount of data
3) User-friendly schema
4) Portability
5) Query languages
6) Process modelling and transactions
7) Heterogeneity and integration
8) Time series aggregation
9) Archiving
10) Security and cost
The types of data in the Internet of Things are:
1) RFID: Radio frequency identification
2) Addresses/unique identifiers
3) Descriptive data for processes, systems and objects
4) Pervasive environmental data and positional data
5) Sensor data: Multi-dimensional time series data
6) Historical data
7) Physics models: Models that are templates for reality
8) State of actuators and command data for control
Databases suited for the Internet of Things
InfluxDB: InfluxDB, first released in 2013, is one of the more recent databases. It is written in the Go programming language; early versions stored data in LevelDB, a key-value database, before InfluxDB moved to its own storage engine. InfluxDB is a time series database, designed to store and query time series data efficiently. Time series databases existed well before it (Kx Systems’ kdb is an earlier example), but InfluxDB gained popularity with the rise of the Internet of Things, of NoSQL and NewSQL systems, and of ever-growing data volumes.
The advantages of using InfluxDB for IoT data include:
1) Allows indexing of series
2) It has an SQL-like query language
3) It also provides the built-in linear interpolation for missing data
4) It supports automatic data downsampling
5) Supports continuous queries to compute aggregates
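To make the downsampling and linear-interpolation features more concrete, here is a minimal plain-Python sketch of what a query along the lines of InfluxQL’s `GROUP BY time(...)` combined with `fill(linear)` computes. It does not talk to InfluxDB and is not InfluxDB’s implementation. The function names `downsample` and `fill_linear`, the integer-second timestamps and the one-minute window are all illustrative assumptions.

```python
from statistics import mean


def downsample(points, window):
    """Average (timestamp, value) points into fixed windows of `window` seconds.

    Assumes integer timestamps. Returns (window_start, mean) for every window
    from the first to the last one seen; windows with no points get None.
    """
    if not points:
        return []
    buckets = {}
    for ts, value in points:
        start = ts - ts % window  # align the timestamp to the start of its window
        buckets.setdefault(start, []).append(value)
    first, last = min(buckets), max(buckets)
    return [(start, mean(buckets[start]) if start in buckets else None)
            for start in range(first, last + window, window)]


def fill_linear(series):
    """Replace None gaps with values interpolated between known neighbours.

    Windows are equally spaced, so interpolating by position is equivalent to
    interpolating by time. Leading and trailing gaps are left as None.
    """
    known = [(i, v) for i, (_, v) in enumerate(series) if v is not None]
    filled = list(series)
    for (i0, v0), (i1, v1) in zip(known, known[1:]):
        for i in range(i0 + 1, i1):
            frac = (i - i0) / (i1 - i0)
            filled[i] = (series[i][0], v0 + frac * (v1 - v0))
    return filled


readings = [(0, 20.0), (30, 22.0), (120, 26.0)]  # hypothetical (seconds, temperature) samples
per_minute = downsample(readings, 60)  # [(0, 21.0), (60, None), (120, 26.0)]
filled = fill_linear(per_minute)       # [(0, 21.0), (60, 23.5), (120, 26.0)]
```

In InfluxDB itself, a continuous query can run this kind of aggregation periodically and write the results to another measurement, which is how the downsampling and continuous-query features above fit together.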
CrateDB: CrateDB is a distributed SQL database management system. It is open source, written in Java and built on components from Facebook Presto, Apache Lucene, Elasticsearch and Netty, and it is designed for high scalability. CrateDB was built with IoT data in mind, with use cases ranging from the industrial Internet and connected cars to wearables.
The advantages of using CrateDB for IoT data include:
1) Millions of data points per second: Fast, linearly scalable data ingestion
2) Real-time queries: Columnar indices and field caches provide in-memory SQL performance
3) Dynamic schema: Add and query new sensor data structures on the fly
4) IoT analytics: Fast, robust time series, AI, geospatial, text search, joins, aggregations
5) Always on: Built-in data replication and cluster rebalancing ensure non-stop performance
6) ANSI SQL: No lock-in, and easy for any developer to use and integrate
7) Built-in MQTT broker: Direct device-to-database integration
8) IoT ecosystem: Works with Kafka, Grafana, Node-RED and other popular IoT stack software
9) Runs anywhere: Efficient processing at the edge or in the cloud
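The dynamic schema and ANSI SQL points can be illustrated through CrateDB’s HTTP SQL endpoint, which accepts a JSON body with a `stmt` field plus optional `args` or `bulk_args`. The sketch below only builds the requests and does not send them. It assumes a node at `localhost:4200` (CrateDB’s default HTTP port) and CrateDB 4.x-style column types; the table name `readings`, its columns and the helper `sql_request` are hypothetical, so check the statements against the documentation for your version.

```python
import json
import urllib.request

# Assumption: a CrateDB node reachable at this address (4200 is the default HTTP port).
CRATE_URL = "http://localhost:4200/_sql"


def sql_request(stmt, args=None, bulk_args=None):
    """Build (but do not send) a POST request for CrateDB's HTTP SQL endpoint."""
    body = {"stmt": stmt}
    if args is not None:
        body["args"] = args
    if bulk_args is not None:
        body["bulk_args"] = bulk_args
    return urllib.request.Request(
        CRATE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical table whose `payload` column is a dynamic object: new sensor
# fields become new sub-columns when they first appear in an insert.
create = sql_request(
    "CREATE TABLE IF NOT EXISTS readings ("
    " device_id TEXT, ts TIMESTAMP WITH TIME ZONE, payload OBJECT(DYNAMIC))"
)

# Two devices reporting different fields, inserted in a single bulk request.
insert = sql_request(
    "INSERT INTO readings (device_id, ts, payload) VALUES (?, ?, ?)",
    bulk_args=[
        ["sensor-1", "2024-01-01T00:00:00Z", {"temp_c": 21.5}],
        ["sensor-2", "2024-01-01T00:00:00Z", {"temp_c": 19.0, "humidity": 40}],
    ],
)

# Nested object fields are addressed with subscript syntax in plain SQL.
query = sql_request(
    "SELECT device_id, payload['humidity'] FROM readings WHERE payload['humidity'] > ?",
    args=[30],
)

# Sending requires a running node, e.g. urllib.request.urlopen(create).
```

Sending the three requests in order would create the table, add both rows (with `humidity` appearing as a new sub-column) and select `sensor-2`.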
MongoDB: MongoDB is a free, source-available, cross-platform, document-oriented database program. It is categorised as a NoSQL database program and stores data as JSON-like documents with optional schemas. Organisations use it for IoT because it lets them store data from any context, analyse it in real time and change the schema as they go along.
The advantages of using MongoDB for IoT data include:
1) Powerful, general-purpose database
2) Document-oriented
3) Stores data as JSON-like documents with optional schemas
4) Lets the schema evolve as requirements change
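The flexible-schema point can be sketched with plain Python dictionaries, which map naturally onto MongoDB’s JSON-like documents. This is an illustration only: the device IDs, field names and the `low_battery` helper are hypothetical, and no MongoDB server is involved. With the PyMongo driver, a list like `docs` could be stored with `collection.insert_many(docs)`.

```python
import json

# Hypothetical readings from two firmware versions of the same sensor type.
# Version 2 added a nested "battery" field; no migration is needed because
# each document carries its own structure.
docs = [
    {"device_id": "th-01", "fw": 1, "ts": "2024-01-01T00:00:00Z", "temp_c": 21.5},
    {"device_id": "th-02", "fw": 2, "ts": "2024-01-01T00:00:00Z",
     "temp_c": 19.0, "battery": {"pct": 87, "charging": False}},
]


def low_battery(documents, threshold):
    """Return IDs of devices whose battery percentage is below `threshold`.

    Older documents without a battery field are skipped, much as a MongoDB
    filter such as {"battery.pct": {"$lt": threshold}} would not match them.
    """
    result = []
    for doc in documents:
        pct = doc.get("battery", {}).get("pct")
        if pct is not None and pct < threshold:
            result.append(doc["device_id"])
    return result


print(low_battery(docs, 90))  # ['th-02']
print(json.dumps(docs[1]))    # each document serialises to JSON as-is
```

Reading code has to tolerate missing fields, which is the price paid for letting the schema change as new devices and firmware versions appear.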
RethinkDB: RethinkDB is a well-known open source database. It is a scalable JSON database built from the ground up for the real-time Web. RethinkDB introduces an exciting new access model by inverting the traditional database architecture: instead of applications polling for changes, a developer can tell RethinkDB to continuously push updated query results to applications in real time. The developers call this feature changefeeds. Changefeeds let RethinkDB serve as the database, a real-time repository and a message broker for the system state. This real-time push architecture dramatically reduces the time and effort needed to build scalable real-time apps.
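The push model behind changefeeds can be sketched with a small in-memory class. This is a toy illustration of the idea, not RethinkDB’s driver API: `ToyTable`, `changes` and `upsert` are hypothetical names. The one detail borrowed from RethinkDB is the shape of each change record, a document with `old_val` and `new_val`; in ReQL the real feed is opened with something like `r.table("sensors").changes()`.

```python
from typing import Callable, Dict, List, Optional

Change = Dict[str, Optional[dict]]


class ToyTable:
    """In-memory stand-in for a table that pushes changes to subscribers.

    Each change is a dict with "old_val" and "new_val", the shape RethinkDB
    changefeeds deliver; everything else here is hypothetical.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}
        self._subscribers: List[Callable[[Change], None]] = []

    def changes(self, callback: Callable[[Change], None]) -> None:
        # Register interest once; no polling is needed afterwards.
        self._subscribers.append(callback)

    def upsert(self, row: dict) -> None:
        old = self._rows.get(row["id"])
        self._rows[row["id"]] = row
        self._notify({"old_val": old, "new_val": row})

    def delete(self, row_id: str) -> None:
        old = self._rows.pop(row_id, None)
        if old is not None:
            self._notify({"old_val": old, "new_val": None})

    def _notify(self, change: Change) -> None:
        for callback in self._subscribers:
            callback(change)


received: List[Change] = []
sensors = ToyTable()
sensors.changes(received.append)            # subscribe once
sensors.upsert({"id": "s1", "temp_c": 20.0})  # insert: old_val is None
sensors.upsert({"id": "s1", "temp_c": 23.5})  # update: both values present
sensors.delete("s1")                          # delete: new_val is None
```

Because every write is pushed to subscribers as it happens, a dashboard or alerting service built this way reacts immediately instead of re-running queries on a timer.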
The advantages of using RethinkDB for IoT sensor data include:
1) RethinkDB’s query language, ReQL, is embedded in the host programming language, which makes it easy to set up and learn.
2) If a primary server fails, commands are automatically routed to a new primary.
3) Nodes can be added to a live cluster in a plug-and-play fashion, without any downtime.
4) It offers asynchronous queries via EventMachine in Ruby and Tornado in Python, which provide an asynchronous application programming interface.
5) It supports SSL connections for secure access to RethinkDB over the public Internet.
6) It offers mathematical operators such as floor, ceil and round.
SQLite: SQLite is an in-process library that implements a serverless, self-contained, transactional SQL database engine. Its portability and small footprint have given it a major impact on game and mobile application development.
SQLite works well in devices that must operate without human support, because the database needs no administration. It is a good fit for cell phones, set-top boxes, televisions, game consoles, cameras, watches, kitchen appliances, thermostats, automobiles, machine tools, aeroplanes, remote sensors, drones, medical devices and robots, and for IoT devices in general.
Client/server database engines are designed to live inside a data centre at the core of the network. SQLite works there too, but SQLite also thrives at the edge of the network, fending for itself while providing fast and reliable data services to applications that would otherwise have dodgy connectivity.
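A common edge pattern that follows from this is store-and-forward: the device writes every reading to a local SQLite file and uploads batches whenever the network is available. The sketch below uses only Python’s built-in `sqlite3` module. The table layout, the helper names and the `upload` callback (standing in for whatever network call a real device would make) are hypothetical, and an in-memory database is used only so the example leaves no file behind.

```python
import sqlite3
import time


def open_buffer(path=":memory:"):
    """Open (or create) a local SQLite buffer; no server or setup is needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        " id INTEGER PRIMARY KEY,"
        " ts REAL NOT NULL,"
        " value REAL NOT NULL,"
        " sent INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def record(conn, value, ts=None):
    # "with conn" wraps the insert in a transaction: commit on success, rollback on error.
    with conn:
        conn.execute("INSERT INTO readings (ts, value) VALUES (?, ?)",
                     (time.time() if ts is None else ts, value))


def flush(conn, upload):
    """Upload unsent rows; mark them as sent only if the upload succeeds."""
    rows = conn.execute(
        "SELECT id, ts, value FROM readings WHERE sent = 0 ORDER BY id"
    ).fetchall()
    if not rows:
        return 0
    if not upload([(ts, value) for _, ts, value in rows]):
        return 0  # link is down: keep the rows for the next attempt
    with conn:
        conn.executemany("UPDATE readings SET sent = 1 WHERE id = ?",
                         [(row_id,) for row_id, _, _ in rows])
    return len(rows)


uploaded = []


def fake_upload(batch):
    # Hypothetical stand-in for a network call; always succeeds here.
    uploaded.extend(batch)
    return True


conn = open_buffer()
record(conn, 21.5, ts=1.0)
record(conn, 22.0, ts=2.0)
offline = flush(conn, lambda batch: False)  # network down: nothing is marked sent
online = flush(conn, fake_upload)           # connectivity back: both rows go out
```

Because the buffer lives in an ordinary file on the device (when a path is given), readings survive reboots and outages, and the `sent` flag is only updated after a successful upload.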
The advantages of using SQLite for IoT data include:
1) Offers a small memory footprint
2) It is reliable
3) Requires no setup before use
4) Has no dependencies
Apache Cassandra: Apache Cassandra is a free and open source distributed NoSQL database management system, which was initially released in 2008. It was designed to handle huge amounts of data across many commodity servers, providing high availability with no single point of failure.
In IoT, data is generated, tracked and shared across a variety of networks on an immense scale because of the massive number of connected devices. Cassandra is well suited to ingesting large volumes of time series data that comes directly from devices, users, sensors and similar sources spread across diverse geographic locations.
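A typical way to model such time series data in Cassandra is to partition it by sensor and by a time bucket, and to order rows inside each partition by timestamp, so that one partition cannot grow without bound. The sketch below shows one such hypothetical schema in CQL (held in a Python string) and the Python logic that derives the partition key for a reading. The table and column names and the one-day bucket are assumptions for illustration. With the DataStax `cassandra-driver`, the insert would roughly be `session.prepare(INSERT_CQL)` followed by `session.execute(prepared, params)`.

```python
from datetime import datetime, timezone

# Hypothetical schema: one partition per sensor per day, rows ordered by time.
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS sensor_readings (
    sensor_id  text,
    bucket_day date,
    ts         timestamp,
    reading    double,
    PRIMARY KEY ((sensor_id, bucket_day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
"""

INSERT_CQL = (
    "INSERT INTO sensor_readings (sensor_id, bucket_day, ts, reading) "
    "VALUES (?, ?, ?, ?)"
)


def partition_key(sensor_id, ts):
    """Return the (sensor_id, UTC day) composite partition key for a reading.

    Assumes `ts` is timezone-aware so the day boundary is unambiguous.
    """
    return sensor_id, ts.astimezone(timezone.utc).date()


def insert_params(sensor_id, ts, reading):
    """Bind values for INSERT_CQL, in column order."""
    sid, day = partition_key(sensor_id, ts)
    return (sid, day, ts, reading)


late = datetime(2024, 3, 1, 23, 30, tzinfo=timezone.utc)
early = datetime(2024, 3, 2, 0, 10, tzinfo=timezone.utc)
# Readings 40 minutes apart land in different daily partitions.
```

Queries for one sensor over one day then read a single partition, already sorted newest first, which is the access pattern dashboards and alerting usually need.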
The advantages of using Apache Cassandra for IoT data include:
1) Fault tolerant
2) Demonstrates high performance
3) Decentralised: Every node in the cluster is identical
4) Scalable
5) Durable
6) You’re in control: Choose between synchronous and asynchronous replication for each update
7) Elastic: Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications
8) Professionally supported: Support contracts and services are available from third parties
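The decentralised, fault-tolerant design rests on placing data by hashing rather than through a master node. The toy token ring below sketches that idea: every node owns a token, a key is hashed onto the ring, and its replicas are the owning node plus the next nodes clockwise, similar to Cassandra’s SimpleStrategy. It is a simplification: real Cassandra uses the Murmur3 partitioner and usually many virtual nodes per machine, whereas this sketch uses MD5 and a single token per node. The class and node names are hypothetical.

```python
import bisect
import hashlib


class TokenRing:
    """Toy token ring with SimpleStrategy-like replica placement.

    Each node owns one token; a key belongs to the first node whose token is
    at or after the key's token (wrapping around), and the next (rf - 1)
    nodes clockwise hold the other replicas.
    """

    def __init__(self, nodes, rf=3):
        if rf > len(nodes):
            raise ValueError("replication factor exceeds node count")
        self.rf = rf
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _token(key):
        # 64-bit token from an MD5 digest (Cassandra itself defaults to Murmur3).
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def replicas(self, key):
        start = bisect.bisect_left(self._tokens, self._token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(self.rf)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], rf=3)
owners = ring.replicas("sensor-42")  # three distinct nodes hold this key
```

Any node can compute `replicas()` for any key, so no node is special, and with a replication factor of three the data for a key remains available if one of its nodes fails.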