Choosing a database is a long-term decision, and changing that decision later can be a difficult and expensive proposition, so IT managers need to get it right the first time. MySQL and PostgreSQL are the two most popular alternatives in the world of open source databases, and the ones admins most often weigh up when preparing a new rollout.
According to Rohit Rai, sales director, India, South East Asia and Australia at EnterpriseDB, “The database market today is very mature and it is a mature technology too. We are talking about a 25-year-old database market. Many databases have come and gone, and now there are only a few commercial choices left in the world. If you look at the database as a technology, apart from being pretty mature, it is standards compliant and well-documented. Open source databases also started out a long time back. If you look at the history of PostgreSQL, it’s over 15 years old and has a healthy community around it.”
Things to consider while selecting an open source database
With a number of database options available in the market, choosing between them can be difficult for IT managers. So how should admins go about it? The best starting point is to evaluate your requirements: a database should be selected based on the underlying data and how that data will be used. Some pointers are given below:
- What type of data is it? [Words (English/multilingual), numbers, books, images, movies, maps, etc]
- What is the volume of data that the database needs to contain? [Megabytes, terabytes or petabytes]
- How fast is the data generated? [Bytes/sec, MBps or GBps]
- How many people access it concurrently? [In the 10s, 100s, 1000s or millions]
- What is the read vs write volume? [Will only the person who writes the data also read it, will one person write and millions read, or will there be millions of writes but very few reads, e.g., log records?]
- Does the data need to be distributed? [Single computer or across countries]
- How much computation is involved in extracting information from the data? [Nothing – data is extracted as-is; some simple computation; complex joins of data or huge analytics computations]
- What expertise level do you expect your engineers to have? [Will they be able to manipulate data themselves or will you need expert database admins?]
- In what language does the data need to be accessed? [Anything from C and C++ to Ruby, Python or Golang; see the short sketch after this list]
- How valuable is the data? [Useless once the connection is lost, e.g., mobile phone locations or Web transactions; extremely important, like bank accounts; or critical, wherein lives may be lost if the data is lost, such as while monitoring rocket stability, satellite orbits, ICU device data, etc]
- What kind of data safety measures do you require? [Ranging from taking a backup once in a while to complex multi-country, fire-safe disaster recovery mechanisms]
- What level of security do you need for the data? [Is it common knowledge or top secret data?]
- What kind of integration/ecosystem support does your product need? [A product uses a lot of technologies besides the database. The database you use needs to integrate and work well with these other technologies. An ecosystem ensures that these integrations work as new versions of the products continue to roll out.]
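As a quick illustration of the language-access and read/write questions above, here is a minimal Python sketch that connects to a PostgreSQL instance, performs a trivial write and then a read. It assumes the psycopg2 driver is installed; the host, credentials, database and table names are hypothetical placeholders.

    import psycopg2  # assumed driver; any client library for your chosen database would do

    # Hypothetical connection details -- replace with your own environment
    conn = psycopg2.connect(host="localhost", dbname="appdb",
                            user="appuser", password="secret")
    cur = conn.cursor()

    # A trivial write followed by a read, to confirm access from Python
    cur.execute("CREATE TABLE IF NOT EXISTS pings (id serial PRIMARY KEY, note text)")
    cur.execute("INSERT INTO pings (note) VALUES (%s)", ("hello from Python",))
    conn.commit()

    cur.execute("SELECT count(*) FROM pings")
    print("rows so far:", cur.fetchone()[0])

    cur.close()
    conn.close()

Repeating a check like this in each language your teams actually use is a cheap way to confirm driver availability before committing to a database.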
An IT manager needs to work through questions like these to find the right database. For a database that will be used on the Web, usability, scalability, security and read distribution matter greatly.
Apart from these, a forward-looking approach is one of the most important things an IT manager should keep in mind while choosing a database. Factor in the current requirements, including scalability, security, robustness and ease of use, but also consider what may emerge as the organisation grows, and look for a database that is flexible enough to meet those upcoming challenges.
Acceptance of open source databases
Acceptance of open source is increasing in all fields and this is also true for open source databases. Government legislation is also pushing the usage of open source technologies. Rai of EnterpriseDB asserts, “Because of the fact that database technologies have existed for a long time, they seem to offer a lot of functionalities. Open source databases are as good as any other commercial alternative. What may prevent their adoption is lack of awareness. IT managers may not have the confidence to say that open source databases can actually fulfil their needs completely.”
But things are changing gradually; economic conditions are pushing IT managers to find alternatives to the traditional commercial licensing models and go for options where they have to pay only for support. This is why open source databases are being viewed as even more attractive.
Comfort also comes from having references from other organisations that have successfully deployed open source databases. For MySQL, there are many such successful implementations in India and all over the world. What additionally inspires confidence is the amount of help available for an implementation, from open source forums as well as from professional support providers.
Managing open source databases is easy
Once you are able to choose the right database for your business, the next challenge is to manage it. Rai says, “If IT managers want to bet on open source databases, there is no risk to it. All they need to do is to manage them. Even if you have not used an open source database before, you can transfer your skills quickly. Open source databases like PostgreSQL are not a challenge, so one can use them effectively from Day 1 itself. It’s just a change of mindset that has to happen. Many companies and government organisations have already started doing that.”
Latest trends in the world of databases
The latest trends are towards Big Data, the cloud, NoSQL, high availability, adapting to the multi-core microprocessor architectures and solid state devices (SSDs).
Let’s look at each of these in detail.
Big Data: A lot of data is being automatically generated by devices, e.g., data about traffic, climate and user activity on the Web. This data is huge and increasing by the second. As individual nuggets of information it has little value, but once aggregated, it is a goldmine: climate patterns, traffic flow patterns and social connectivity patterns can all be mined from it. Big Data needs special handling for collection and processing.
The cloud: More and more businesses want to avoid investing in their own infrastructure. They want to use a publicly available service that can be expanded and reduced based on the fluctuating demands.
NoSQL: Application providers want more flexibility in the underlying structures that contain their data. Users are demanding databases for which they do not have to create a fixed structure up front, and as an application matures, developers want the flexibility to change the structure of the underlying database. NoSQL is a trend that allows this flexibility, as the sketch below shows.
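As a rough sketch of this flexibility, the example below stores documents with different fields in the same collection of a document store, with no structure declared up front. It assumes a local MongoDB instance and the pymongo driver; the database, collection and field names are hypothetical.

    from pymongo import MongoClient  # assumed: a local MongoDB instance and the pymongo driver

    client = MongoClient("mongodb://localhost:27017/")
    users = client["appdb"]["users"]

    # No fixed structure is declared up front; each document carries its own fields
    users.insert_one({"name": "Asha", "email": "asha@example.com"})
    users.insert_one({"name": "Ravi", "phone": "+91-9000000000", "tags": ["beta"]})

    # The structure can evolve as the application matures, without an ALTER TABLE step
    users.insert_one({"name": "Meera", "email": "meera@example.com",
                      "preferences": {"newsletter": True}})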
High availability: Data is growing and the world is shrinking, so data is getting more and more distributed. Data generators and data consumers are spread across a wide geography, which requires databases to manage the distribution of the data. The data should also be resilient to failures of the underlying hardware and software.
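A common pattern for read distribution is to send all writes to a primary server and spread reads across replicas. Below is a minimal sketch of that idea, assuming a PostgreSQL primary with streaming replicas and the psycopg2 driver; the host names and credentials are hypothetical.

    import random
    import psycopg2  # assumed driver; hosts and credentials below are placeholders

    PRIMARY = {"host": "db-primary.example.internal", "dbname": "appdb",
               "user": "appuser", "password": "secret"}
    REPLICA_HOSTS = ["db-replica-1.example.internal", "db-replica-2.example.internal"]

    def write(sql, params=None):
        # All writes go to the primary, the single source of truth
        with psycopg2.connect(**PRIMARY) as conn:
            with conn.cursor() as cur:
                cur.execute(sql, params)

    def read(sql, params=None):
        # Reads can be served by any replica, spreading the load
        cfg = dict(PRIMARY, host=random.choice(REPLICA_HOSTS))
        with psycopg2.connect(**cfg) as conn:
            with conn.cursor() as cur:
                cur.execute(sql, params)
                return cur.fetchall()

In practice this routing is usually delegated to a connection pooler or proxy, but the principle is the same.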
Multi-core processors: Processor technology is moving to multiple cores. Where earlier processors concentrated on raw clock speed, contemporary microprocessors have multiple cores that can process data in parallel. Newer software architectures need to exploit this ability to compute in parallel.
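To make the parallelism point concrete, here is a small Python sketch that spreads a CPU-bound computation across all available cores; the workload is only a stand-in for whatever processing a database or application might parallelise.

    import os
    from concurrent.futures import ProcessPoolExecutor

    def crunch(chunk):
        # Stand-in for a CPU-bound task, e.g., aggregating one partition of data
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        chunks = [range(i, i + 100_000) for i in range(0, 1_000_000, 100_000)]
        # One worker process per core keeps all cores busy
        with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
            total = sum(pool.map(crunch, chunks))
        print(total)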
Solid state devices: The era of hard disks seems to be passing. Rotating media like CDs and DVDs have given way to pen drives and SD cards. These solid state memory devices have vastly different read and write speeds compared to rotating hard disks, and the optimisations they require differ greatly from those for a hard disk.