The Complete Magazine on Open Source

The Pros and Cons of Polyglot Persistence


The design of a database determines its optimal use. No single database engine is efficient or sufficient for every kind of data access. This is where polyglot persistence comes in: it divides data across multiple databases to leverage the strengths of each.

Today, we have a large number of databases, ranging from document databases like MongoDB and graph databases like Neo4j, to search databases like ElasticSearch, caches like Redis and more. Each of these databases does a few things well and other things not so well. For example, ElasticSearch is great for full-text search on large volumes of data, something that MongoDB does not do as well.
Polyglot persistence is the practice of dividing your data across multiple databases and leveraging their strengths together. For example, if you have some data on which a search has to be performed, you can store that data in ElasticSearch, which is built on a data structure called an inverted index. An inverted index maps each term to the documents that contain it, which allows very fast full-text searches and scales well.
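To make the inverted index idea concrete, here is a minimal sketch in plain Python. It is not how ElasticSearch is implemented internally; the function names and the sample products are assumptions made purely for illustration.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every term in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data, not from the article.
products = {
    1: "Red cotton shirt",
    2: "Blue cotton trousers",
    3: "Red leather shoes",
}
index = build_inverted_index(products)
print(sorted(search(index, "red")))         # [1, 3]
print(sorted(search(index, "cotton red")))  # [1]
```

A lookup touches only the entries for the query terms rather than scanning every document, which is why this structure suits full-text search.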

What types of databases can I use?
Document databases (e.g., MongoDB): Document databases are used to store whole JSON documents and query them by relevant fields. They are the go-to choice for many developers. Document databases are usually weak at joins between collections/tables and at full-text search.
Graph databases (e.g., Neo4j): Graph databases are used for storing relations between entities, with nodes being entities and edges being relationships. For example, if you’re building a social network and Person A follows Person B, then Person A and Person B can be nodes and ‘follows’ can be the edge between them. Graph databases are excellent at multi-level joins and are good for features that need a shortest-path algorithm between A and B.
Cache (e.g., Redis): A cache is used when you need very fast access to your data. For example, if you’re building an e-commerce application with product categories that load on every page, hitting the database for every read is expensive; you can instead keep the categories in a cache, which is extremely fast for reads and writes. The main drawback of a cache is that the data lives in memory: capacity is bounded by RAM, and the data can be lost on restart unless persistence is configured (Redis offers optional snapshots and an append-only file).
Search databases (e.g., ElasticSearch): If you want to do a full-text search on your data (e.g., products in an e-commerce app), you need a search database like ElasticSearch, which can perform the search over huge volumes of data.
Row store (e.g., Cassandra): Cassandra is often used for storing time-series data (like analytics) or logs. If your use case involves many writes, fewer reads and non-relational data, then Cassandra is the database to take a look at.
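The product-categories caching scenario described above is commonly implemented with a cache-aside read. The following is a minimal sketch, assuming a small in-memory `TTLCache` class as a stand-in for a real cache such as Redis; `load_categories_from_db` and the category names are hypothetical.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-key expiry (a stand-in for a real cache)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


db_reads = 0  # counts how often the slow path is taken


def load_categories_from_db():
    """Hypothetical slow database query."""
    global db_reads
    db_reads += 1
    return ["Books", "Electronics", "Clothing"]


def get_categories(cache):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    categories = cache.get("product_categories")
    if categories is None:
        categories = load_categories_from_db()
        cache.set("product_categories", categories)
    return categories


cache = TTLCache(ttl_seconds=60)
for _ in range(3):  # three simulated page loads
    get_categories(cache)
print(db_reads)  # 1: only the first page load reached the database
```

The expiry time is a design choice: a longer TTL saves more database reads but serves stale categories for longer after a change.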

The advantages of polyglot persistence
Faster response times: Each kind of query is served by the database best suited to it, which can make the response times of your app considerably faster.
Helps your app to scale well: Your app can scale well as the data grows. Most NoSQL databases scale well when you model them properly for the data that you want to store.
A rich experience: You can offer a richer experience when you harness the power of multiple databases at the same time. For example, if you want to search on ‘Products’ in an e-commerce app, you can use ElasticSearch, which returns results ranked by relevance, something MongoDB is not designed to do as well.
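As a rough illustration of using two stores together, the sketch below routes lookups by ID to one store and full-text queries to another. Both stores are plain dictionaries standing in for a document database and a search database; the `ProductCatalog` class, its methods and the naive scoring are assumptions for illustration, not the API of any real product.

```python
import re


def terms_of(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ProductCatalog:
    """Routes each operation to the store best suited to it.

    Assumes each product is saved once; an update would need the
    old terms removed from the index first.
    """

    def __init__(self):
        self.documents = {}     # product_id -> full product document
        self.search_index = {}  # term -> {product_id: occurrences of term}

    def save(self, product_id, product):
        """Write the full document, then index its name for search."""
        self.documents[product_id] = product
        for term in terms_of(product["name"]):
            postings = self.search_index.setdefault(term, {})
            postings[product_id] = postings.get(product_id, 0) + 1

    def get(self, product_id):
        """Lookup by ID goes to the document store."""
        return self.documents.get(product_id)

    def search(self, query):
        """Full-text query goes to the search index; results are ranked by
        a naive relevance score (total count of matching terms)."""
        scores = {}
        for term in terms_of(query):
            for product_id, count in self.search_index.get(term, {}).items():
                scores[product_id] = scores.get(product_id, 0) + count
        ranked = sorted(scores, key=lambda pid: (-scores[pid], pid))
        return [self.documents[pid] for pid in ranked]


catalog = ProductCatalog()
catalog.save(1, {"name": "Red cotton shirt", "price": 20})
catalog.save(2, {"name": "Blue cotton trousers", "price": 35})
catalog.save(3, {"name": "Red shirt", "price": 15})
print([p["name"] for p in catalog.search("red cotton")])
# ['Red cotton shirt', 'Blue cotton trousers', 'Red shirt']
```

Note that `save` performs two separate writes. With real databases these would not be atomic, so the stores can briefly disagree if one write fails.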

Disadvantages of polyglot persistence
Requires you to hire people to integrate different databases: If you’re an enterprise, you will have the resources to hire experts for each database type (which is good). But if you’re a small company or a start-up, you may not have the resources to hire people to implement a good polyglot persistence model.
Implementers need to learn different databases: If you’re an indie developer or start-up building apps on multiple databases, then you have no choice but to learn multiple types of databases and implement a good polyglot persistence model for your app.
It requires resources to manage databases: If you’re running multiple databases for your app, then you need to take care of backups, replicas, clusters, etc., for each of those types of databases, which can be time-consuming.
Testing can be tough: If you divide your data across many databases, then testing your data layer can be complicated and debugging can be time-consuming.

Automating polyglot persistence
The good thing is that there are solutions which automate polyglot persistence. Automating polyglot persistence frees you from learning different types of databases or hiring experts. Such solutions also tend to manage mundane tasks like backups, replication and more, so that you do not have to worry about them. One such service gives you one simple API to store and query your data, and uses AI to automatically store your data in the database where it naturally belongs. It does auto-scaling, replication and backups for you too. Another, Orchestrate, has a graph DB API, which you can use to store data. Orchestrate is a good database service if you’ve used Neo4j or any graph DB before.