The Complete Magazine on Open Source

The Synergy between Big Data and the Internet of Things


The Internet of Things generates fast streams of useful data. The challenge before enterprises is to store the vast amounts of data and to make the best use of it. This is where Big Data plays an important role.

An increasing number of gadgets now use smart technologies to generate data through embedded sensors. A car with smart apps installed in it, a smart home device that monitors the temperature indoors, a fitness tracker that sends workout step counts to your phone’s app: all these are examples of the Internet of Things (IoT), and all of them are connected to the Internet. It is estimated that by 2020, there will be 24 billion IoT devices across the world, which would naturally result in the generation of massive volumes of data. The digital universe is set to reach 40 zettabytes by 2020. So, IoT delivers the information while Big Data acts on it to derive insights that will render these devices the precursors of a new technological age.

What is IoT?
IoT (Internet of Things) is a network of interconnected devices such as computers, cars, smartphones, kitchen appliances and heart monitors. As technology advances, even gadgets with the most basic functions, like watches, pacemakers and remote controls, will have embedded sensors capable of collecting and exchanging data over the Internet. These devices can be controlled remotely. The sensors and chips generally gather data but don’t process it; they send it elsewhere for analysis. Typically, the data gathered and analysed covers the performance of the smart gadgets and customer usage patterns.

Components of IoT
IoT ecosystem: The IoT ecosystem includes all the elements such as a dashboard, remote, gateways, the network, security and storage, which allow devices to be connected to their users—businesses, government and consumers.

Entity: These are the users such as businesses, government and consumers, who use the devices and generate the data. They comprise the group that can potentially benefit from the analysis of the data.

Physical layer: This layer consists of the physical hardware of the IoT ecosystem. This includes the devices, embedded sensors, networking gear, physical gateways/switches, etc.

Network layer: This layer is mainly responsible for transferring the data generated and collected at the physical layer, to other devices.

Application layer: This layer is mainly intangible as it holds the protocols used for sharing data across heterogeneous devices. It also consists of the interface that helps different devices identify and communicate with each other efficiently.

Remote: The remote allows entities to control and connect to their IoT devices through a dashboard such as an app. Examples of remotes are PCs, smartwatches, connected TVs, tablets, smartphones, etc.

Dashboard: The dashboard is included in the remote, where it allows the entities to control and manage the IoT ecosystem.

Figure 1: The IoT ecosystem

What is Big Data?
Big Data refers to the large volume of both structured and unstructured data. Big Data can be mined for insights and information. Data these days runs into exabytes. There are ‘5 Vs’ of Big Data.
Volume: This refers to the amount of data that is generated all over the world. Ninety per cent of the world’s data has been generated in the last two years.
Velocity: This is the speed at which the data is generated as well as the speed at which it travels. For example, the New York Stock Exchange creates over 1TB of data daily.
Variety: This refers to the different forms of data generated including structured, unstructured and semi-structured data. Eighty per cent of the world’s data is unstructured.
Veracity: This refers to the accuracy and reliability of the data. Uncertainty in data due to inconsistency and incompleteness leads to losses to companies that can add up to millions of dollars.
Value: Value signifies the yield and advantage that the data provides to businesses in the form of insights provided by analysing and mining the massive data.

The intersection of Big Data and IoT
With multitudes of sensors and smart devices all over the globe, IoT triggers an inundation of data, or Big Data. Only Big Data technologies and frameworks can handle such colossal volumes of streaming data of varied types. The more the IoT grows, the more Big Data techniques will be required. Within this space, organisations need to shift their focus to the rich data that is easily accessible in real-time. Such data relates to the customer base and can yield meaningful conclusions through mining. Data from sensors should be processed to find patterns and insights in real-time to advance business goals. Existing Big Data technologies can effectively harness the incoming sensor data, store it and later analyse it efficiently using artificial intelligence. Effectively, for IoT processing, Big Data is the fuel and artificial intelligence is the brain.

Benefits from the intersection of IoT and Big Data
In the present day, over half of all IoT activity is in the fields of transport, manufacturing, user applications, smart cities, etc. IoT will create new business opportunities in the following ways.
New business models: Companies could create value streams for clients, speed time to market and react quickly to client demands.
Real-time information on mission-critical systems: Companies can collect data about products and processes quickly, and improve market agility.
Diversification of revenue streams: Enterprises can monetise more services in addition to the conventional business services.
Global visibility: Enterprises can have better insights into their business, like tracing the path of a component from one end of a supply chain to the other, which reduces the cost of doing business in distant localities.
Efficient, intelligent operations: Information from independent endpoints can be accessed by companies to make impromptu decisions on sales, logistics, etc.

Data storage solutions: PaaS
As the continuous streams of machine data from IoT require huge physical storage, organisations are migrating to PaaS (Platform-as-a-Service). This eliminates the need for companies to have their own storage infrastructure, which would need continuous expansion to accommodate the increasing data. PaaS provides easy scalability, compliance, flexibility and a sophisticated architecture that is specially customised to handle IoT data. Moreover, one can opt for private, public or hybrid cloud platforms. Private platforms cater to only a single organisation, so the data doesn’t share a physical border with external data. Public platforms cater to many organisations and have logical separation of storage space on a single physical storage entity. Hybrid platforms combine private and public resources, so sensitive data can stay on dedicated infrastructure while the rest uses shared capacity. A related model, the community cloud, is shared like a public platform, but the sharing parties usually belong to the same field of business, which allows them to avail of the advantages of a customised architecture that benefits their domain.

Big Data technologies of IoT
The first phase consists of receiving events from IoT connected devices. Wi-Fi, Bluetooth, etc, can be used to connect the devices to receivers. The messages notifying users about events must be sent via an efficient protocol to a broker. MQTT (Message Queuing Telemetry Transport) is a popular lightweight publish/subscribe protocol for such transfers. Eclipse Mosquitto is a widely used open source MQTT broker.
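To make the broker's role concrete, here is a minimal sketch in plain Python of how an MQTT broker decides which subscribers receive a published message. It does not use Mosquitto or any MQTT library; it only illustrates the protocol's topic wildcards, where '+' stands for exactly one topic level and '#' for all remaining levels. The subscriber names and topics are hypothetical, and the sketch ignores the special handling of topics that begin with '$'.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a subscription filter.

    '+' matches exactly one topic level; '#' matches all remaining levels
    (including none) and is only valid as the last level of the filter.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: accept only if it is the final filter level.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level does not match
    # No '#' seen, so both must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


# Hypothetical subscriptions held by the broker: subscriber -> topic filter.
subscriptions = {
    "dashboard": "home/+/temperature",
    "archiver": "home/#",
}


def subscribers_for(topic: str) -> list:
    """List the subscribers whose filter matches the published topic."""
    return sorted(
        name for name, topic_filter in subscriptions.items()
        if topic_matches(topic_filter, topic)
    )


print(subscribers_for("home/kitchen/temperature"))
print(subscribers_for("home/kitchen/humidity"))
```

In this sketch, the dashboard only sees temperature readings, while the archiver receives everything published under 'home'.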
In the second phase, once the data has been received, Hadoop and Hive are commonly used to store it. Apache CouchDB is a NoSQL database that is well suited to IoT due to its low latency and high throughput, and its schema-less design accommodates varying machine data. Apache Storm is preferred for real-time processing and Apache Kafka for intermediate message brokering.
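Real-time processing of sensor streams often boils down to keeping a running summary over the most recent readings. The following is a minimal sketch of that idea in plain Python. It is not the Apache Storm or Kafka API; the class name and the window size are assumptions made for illustration.

```python
from collections import deque


class SlidingWindowAverage:
    """Keep the mean of the most recent `size` readings from one sensor."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self._window = deque(maxlen=size)
        self._total = 0.0

    def add(self, value: float) -> float:
        """Add a reading and return the updated window average."""
        if len(self._window) == self._window.maxlen:
            # The oldest reading is about to be evicted by append().
            self._total -= self._window[0]
        self._window.append(value)
        self._total += value
        return self._total / len(self._window)


window = SlidingWindowAverage(size=3)
for reading in (10, 20, 30, 40):
    print(window.add(reading))
```

Each call to add() does a constant amount of work, which is what makes this kind of summary practical for fast streams. A stream processor would typically keep one such window per sensor.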

General architecture of IoT Big Data
Context data layer: This collects the external non-IoT data that is used later during IoT data processing as extra context or metadata, e.g., start/stop data feeds.
IoT service layer: This handles the interactions between the devices to collect data from IoT devices and also send control commands to them. Bi-directional communication is handled by this layer.
Data/protocol mediator: This is responsible for keeping data in harmonised data entities before it gets published by the data/control broker. This layer is standalone and ensures uniformity.
Data/control broker: This allows third party applications to issue queries or API calls to access harmonised data entities. It also controls requests from the application layer.
Peer API access management: This interacts with peer enterprises to publish relevant context data.
Developer API access management: This controls the permissions for harmonised data entities (both context and IoT) and helps control services provided to third party applications. Access control, authentication and authorisation are managed here. Privacy and security are its main responsibilities.
IoT/Big Data store: This provides short- to medium-term data storage capabilities under the control of the data/control broker. Insights are to be found amongst the ad hoc data relations. Apache Hadoop, Apache Cassandra, MongoDB, etc, are commonly used. Neo4j and Titan are graph databases that are increasingly being used for social media related data.
IoT/Big Data processing: Analytics and business intelligence procedures are carried out here. Analytics includes conventional methods of exploring statistical relationships and the use of analytical engines to produce output through a predefined process. Intelligence signifies the use of artificial intelligence and machine learning to create adaptable algorithms that bring predicted outcomes in line with desired ones. Apache Spark, Apache TinkerPop3, Apache Mahout and TensorFlow are widely used.

Figure 2: Architecture of IoT Big Data
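
The job of the data/protocol mediator, keeping data in harmonised entities, can be illustrated with a short sketch. It assumes two hypothetical vendor payload formats, one reporting temperature in Fahrenheit and one in Celsius, and maps both onto a single record type. The field names, the Reading class and the payload shapes are all invented for illustration and do not come from any particular platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical harmonised entity: one measurement in canonical units."""
    device_id: str
    quantity: str
    value: float
    unit: str


def harmonise(raw: dict) -> Reading:
    """Map one of two hypothetical vendor payload shapes onto a Reading."""
    if "temp_f" in raw:
        # Vendor A (assumed) reports Fahrenheit under 'temp_f'.
        celsius = (raw["temp_f"] - 32.0) * 5.0 / 9.0
        return Reading(raw["id"], "temperature", round(celsius, 2), "C")
    if "temperature_c" in raw:
        # Vendor B (assumed) already reports Celsius.
        return Reading(raw["device"], "temperature",
                       float(raw["temperature_c"]), "C")
    raise ValueError(f"unrecognised payload: {raw!r}")


print(harmonise({"id": "a1", "temp_f": 212.0}))
print(harmonise({"device": "b7", "temperature_c": 21}))
```

Once every payload has been converted to the same shape and units, the broker and the analytics layers above it no longer need to know which vendor a reading came from.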

Use cases of Big Data and IoT
Fleet management: Many transportation companies fit their vehicles with sensors that monitor drivers’ behaviour and each vehicle’s location. Good driving skills and on-road safe behaviour get rewarded by insurance companies. IoT gives telematics an advantage by providing detailed machine log data of all the mechanical and electrical components. UPS, the global logistics firm, widely uses this technology to monitor the speed, mileage, break stops, fuel consumption, engine usage, etc, of the vehicles in its fleet. This helps the company reduce harmful emissions and fuel consumption.
Healthcare: Wearable fitness trackers and healthcare apps help people monitor their health. Data from these devices can be used to track parameters like blood pressure, sugar levels, etc, as well as to assess possible vulnerability to diseases. Preventice integrates apps, mobiles, laptops, tablets, the cloud, etc, for remote patient monitoring. The firm allows customers’ doctors to monitor their health online, reducing the need for regular check-ups. Proteus is a startup which has sensors in the pills it makes, which can be used to check if patients are following their prescriptions.
Agriculture: John Deere is a multinational company selling farm equipment. Its equipment monitors various parameters such as soil moisture levels. The data goes to a centralised managing platform where, based on the moisture levels, the farmers can be alerted when to irrigate. This prevents unnecessary irrigation and avoids the concentration of water resources in particular areas.
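
The irrigation alerts in the agriculture use case amount to a simple threshold rule applied to incoming sensor data. Here is a minimal sketch of such a rule. The field names, the moisture readings and the 30 per cent threshold are assumptions made for illustration, not values used by any real platform.

```python
def fields_to_irrigate(moisture_by_field: dict, threshold: float = 30.0) -> list:
    """Return the fields whose soil moisture (in per cent) is below the threshold.

    The default threshold of 30.0 is a hypothetical value for illustration.
    """
    return sorted(
        field for field, moisture in moisture_by_field.items()
        if moisture < threshold
    )


# Hypothetical latest readings collected by the central platform.
latest = {"north": 22.5, "south": 41.0, "east": 29.9}
print(fields_to_irrigate(latest))
```

In practice, the threshold would depend on the crop and soil type, and an alert would likely be raised only after several consecutive low readings to avoid reacting to a single faulty sensor value.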

Hurdles in the widespread usage of Big Data and IoT
Standards: For the IoT to work efficiently, devices and applications should follow a predefined framework to exchange data safely over wireless or wired networks. oneM2M is a standards organisation which publishes the preferred standards set by the major technology giants. Sources in the organisation insist that there should be interoperability among varied industries such that a common platform connects smart meters, cars, watches, pacemakers, etc.
Security and privacy: Sensitive applications, like a biological sensor recording human vital signs, should be protected against breaches of privacy. National infrastructure related data is critical for the country’s security and should have appropriate safeguards against hackers. Smart home lock systems, industrial security sensors, etc, all need protection against malicious users who can trespass illegally. IoT is advantageous since it can be operated over the Internet, but it is risky for that very reason: the network can be breached by intruders and the devices can be misused.
Network and data centre infrastructure: Data centres and infrastructure will be under duress due to the incoming data deluge. The flows can be in bursts or continuous, primarily between the applications and sensors.
Analytics tools: IoT management is complicated, and building analytics that yield insights is no easy task. Various platforms use different languages and professionals have to be trained to deal with each of them.
Skills: IoT and Big Data are multi-disciplinary, and professionals require a working knowledge of both fields. As the topics are relatively new, old-school technological workers need to be trained and acquainted with the new technologies. Business analysts are required to frame the questions that will best extract the data and present the outcomes to the clients. Data scientists are required to use analytical tools to derive insights and do the technical work.

Big Data and IoT complement each other to bring out the advantages of each. The technological world has realised their significance and the Big Data + IoT industry is set to become a multi-billion dollar business, with researchers and IT firms starting to realise the potential behind the hype. This alliance is the future of technology and will fundamentally change the world around us.