“TiDB supports scaling to millions of queries per second”

As the need for data storage grows, the rate at which data can be processed tends to fall, which has led companies to search for database management systems that can keep up with the rapid growth of data. In conversation with OSFY’s Yashasvini Razdan, Bhanu Jamwal, Head of Presales and Solution Engineering, APAC, TiDB, explains how distributed open source databases like TiDB are an alternative for organisations dealing with massive and complex data requirements.

Q. What is the need for another database like TiDB when MySQL and PostgreSQL exist?

A. Traditional databases like MySQL and PostgreSQL were initially designed for smaller datasets and simpler workloads. While they serve well for small-scale operations, they struggle with scalability and performance as data sizes grow or query demands increase. This creates bottlenecks for organisations experiencing rapid growth. For instance, as businesses adopt SaaS (software-as-a-service) models, there is a need to handle not just more data but also higher query volumes and more database tables, all while maintaining performance. Traditional systems often cannot scale horizontally to meet these demands.

TiDB is a distributed database specifically designed to eliminate bottlenecks by offering scalability. Businesses can start small and grow as needed without compromising performance. TiDB supports scaling to millions of queries per second, which is essential for high-traffic applications. It can handle up to one million tables on a single cluster, making it an ideal solution for SaaS companies onboarding multiple tenants.

Q. How does distributed architecture impact scalability and performance?

A. Distributed databases like TiDB ensure that the system's performance remains consistent, regardless of data size or workload. For instance, if your data grows from 1 terabyte to 100 terabytes, TiDB will deliver the same performance levels. This is achieved by scaling out the cluster, that is, adding more compute or storage nodes as needed. As a result, there is no performance degradation, and applications remain unaffected by growth.

Q. How does TiDB compete with proprietary distributed databases or cloud-native offerings like Google Spanner and AWS Aurora?

A. TiDB is open source, which offers two key advantages — access to ongoing innovation and freedom from vendor lock-in. Organisations can rely on TiDB’s open source nature for continuity and flexibility, regardless of changes in the industry.

It stands out from other distributed databases because of its hybrid transactional and analytical processing (HTAP) capabilities. Unlike many databases optimised solely for online transaction processing (OLTP), TiDB can handle both transactional and analytical workloads on the same cluster. This eliminates the need for ETL (extract, transform, load) processes, enabling real-time analytics. For instance, organisations can detect anomalies or fraud in real time, which is not possible with traditional databases or even some distributed ones. This HTAP capability is critical for companies needing real-time insights without added complexity.

Another differentiator is its compatibility with MySQL. Combined with the ability to run transactional and analytical workloads on the same database, this makes it easy for organisations dependent on MySQL to migrate to TiDB without significant disruption, in what is essentially a lift-and-shift process.

AWS Aurora, for instance, is a distributed database but has limitations, such as a single writer node. Google Spanner is robust but lacks MySQL compatibility, which is essential for many organisations. Microsoft Azure, on the other hand, does not yet have a full-fledged distributed OLTP database.
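To make the lift-and-shift idea above concrete, here is a minimal Python sketch of what the application-side change might look like if the only thing that moves is the database endpoint. The helper `retarget_dsn` and the host names are hypothetical, and the sketch assumes the application reaches its database through a MySQL-style connection URL. Port 4000 is TiDB's default SQL port; real migrations also involve moving the data and checking for unsupported MySQL features, which this does not cover.

```python
from urllib.parse import urlsplit, urlunsplit


def retarget_dsn(dsn: str, new_host: str, new_port: int = 4000) -> str:
    """Return the same MySQL-style DSN pointed at a different host and port.

    Scheme, credentials, database name and query options are left untouched.
    """
    parts = urlsplit(dsn)

    # Rebuild the "user:password@" prefix exactly as it appeared in the input.
    userinfo = ""
    if parts.username:
        userinfo = parts.username
        if parts.password:
            userinfo += ":" + parts.password
        userinfo += "@"

    netloc = f"{userinfo}{new_host}:{new_port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Hypothetical hosts, for illustration only.
    old = "mysql://app:secret@mysql-prod.internal:3306/orders?charset=utf8mb4"
    print(retarget_dsn(old, "tidb.example.internal"))
```

Because the driver and the SQL dialect stay the same in this scenario, the rest of the application code would not need to change.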

Q. What challenges can a customer face while adopting a distributed database?

A. Adopting a distributed database introduces complexities, particularly in operational management. Customers need advanced operational capabilities to monitor and manage their systems effectively. For instance, tracking CPU consumption, network bandwidth, query loads, and inter-module communication becomes critical. TiDB helps by providing robust tools and a detailed operational framework for monitoring these metrics.

Geo-distributed deployments present another challenge, especially when clusters are spread across different data centres. Ensuring network bandwidth and latency requirements are met is vital. TiDB addresses this by offering high availability and resilience. For example, when deployed on AWS, TiDB can replicate data across three availability zones. Even if an entire zone goes down, the database remains accessible, ensuring zero downtime.

Q. How is the open source nature of TiDB beneficial for its users?

A. Open source projects benefit from global community contributions, which drive innovation and rapid problem-solving. TiDB, as an open source solution, provides flexibility to users—they can try it for free, without any vendor involvement.

Due to the absence of vendor lock-in, TiDB can be deployed on AWS, Google Cloud, Microsoft Azure, or even on-premises, depending on the user’s needs, ensuring that organisations retain complete ownership of their data and database infrastructure.

Q. How do you see the adoption of open source databases globally, and how does it compare to that in India?

A. Open source adoption has been steadily increasing worldwide, across regions and industries. Many organisations running mission-critical applications still expect enterprise-grade support alongside their open source software, which is understandable. Digital-native companies, especially in sectors like e-commerce, logistics, and fintech, often prefer open source solutions due to their scalability and flexibility.

In India, the inclination towards open source is even stronger. Many Indian businesses have extensive IT teams with in-house expertise across various technologies. This makes them more self-reliant and capable of managing open source solutions independently. Open source fits well in such setups, allowing companies to rely on their internal teams instead of seeking external support. Opting for open source provides a sense of long-term security.

Q. Which industry segments are you targeting in India?

A. Our primary focus is on digital-native businesses, including e-commerce, logistics, fintech, gaming, and SaaS companies. These industries experience similar data growth patterns, requiring scalability, real-time analytics, and continuous expansion. Their needs align well with TiDB’s strengths, making them our key target segments.

Q. Many consider open source solutions to be less secure and more vulnerable to attacks compared to proprietary options. How do you address these concerns?

A. Not all open source solutions are alike. They vary greatly depending on the development team and the work invested in the project. For TiDB, security has always been a top priority, especially since our clients work with core banking applications. We have implemented stringent security measures, including strong access controls. We comply with key security standards such as ISO, PCI DSS (for financial data security), and HIPAA (for healthcare data security). Regular audits and certifications give us and our customers confidence in TiDB’s security.

Q. In banking and credit card applications, where simultaneous transactions generate vast amounts of data, how does TiDB ensure continuity of data flow across distributed systems?

A. TiDB’s distributed architecture is designed to ensure high availability, even in cases of infrastructure failure. Let me explain with a simplified example. If I have a record for my name, Bhanu, TiDB will store three copies of that record across different availability zones (AZs) within AWS. Even if one entire AZ goes down, the other two copies remain accessible, ensuring uninterrupted service.

This architecture allows TiDB to withstand infrastructure failures, whether on-premises or in the cloud. Even if a server or network experiences downtime, TiDB ensures data availability and continuity for end users. High availability is a cornerstone of TiDB’s design, providing resilience and reliability in all scenarios.
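The three-copy example above can be sketched in a few lines of Python. This is an illustration only, not TiDB's implementation: it assumes a majority rule of the kind used by consensus protocols such as Raft, where a record stays available as long as more than half of its replicas are healthy. The names `Replica`, `fail_zone`, and `has_quorum`, and the zone labels, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One copy of a record, placed in a named availability zone."""
    zone: str
    healthy: bool = True


def fail_zone(replicas: list[Replica], zone: str) -> list[Replica]:
    """Return a new replica list with every copy in `zone` marked unhealthy."""
    return [Replica(r.zone, r.healthy and r.zone != zone) for r in replicas]


def has_quorum(replicas: list[Replica]) -> bool:
    """Assumed rule: the record is available if a strict majority is healthy."""
    healthy = sum(1 for r in replicas if r.healthy)
    return healthy > len(replicas) // 2


if __name__ == "__main__":
    copies = [Replica("az-1"), Replica("az-2"), Replica("az-3")]
    after_one_outage = fail_zone(copies, "az-1")
    after_two_outages = fail_zone(after_one_outage, "az-2")
    print(has_quorum(copies), has_quorum(after_one_outage), has_quorum(after_two_outages))
```

Under this assumption, losing one of three zones leaves two healthy copies, which is still a majority, while losing two zones does not.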

Q. Is TiDB compatible with multi-cloud, hybrid cloud, and edge environments? If yes, how does it achieve this?

A. TiDB’s open source nature ensures compatibility across a variety of environments. You can download and deploy it on-premises, whether on physical machines, virtual machines (VMs), or in containers using Kubernetes. For users of AWS or Google Cloud, TiDB can be deployed on the provider's underlying VMs, such as EC2 instances on AWS.

We offer managed services for cloud deployments. With TiDB Dedicated, users only need to point their applications to the service; we handle scalability, data maintenance, version upgrades, patches, backups, and recovery. Another option is TiDB Serverless, a pay-as-you-go model, which automatically scales up and down as per demand.

Q. How has TiDB adjusted its offerings for customers using AI and LLMs?

A. We are seeing a common pattern where most AI companies are leveraging vector databases. Although vector storage has existed for decades, its adoption has recently surged, especially with large language models (LLMs) and AI advancements.

Scalability is critical here, as vector storage grows alongside data and model accuracy requirements. TiDB ensures this scale without compromising performance, by integrating vector capabilities into the TiDB cluster. So TiDB can now support transactional workloads (OLTP), real-time analytical workloads, and vectorised data storage for AI models—all on a single cluster.

Currently, TiDB Vector Storage is available on TiDB Serverless and will be extended to on-premises and TiDB Cloud by the end of the year.
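For readers new to vector storage, the core query it serves is a nearest-neighbour search: given a query embedding, find the stored embeddings closest to it. The pure-Python sketch below shows the idea with cosine distance on a tiny in-memory list. It is not TiDB code, and the function names, document IDs, and two-dimensional vectors are made up for illustration; a database performs the same kind of ranking at scale, typically with an index.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query: list[float], rows: list[tuple[str, list[float]]], k: int = 1):
    """Return the k (id, vector) rows closest to `query`, nearest first."""
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]


if __name__ == "__main__":
    # Hypothetical stored embeddings, keyed by document ID.
    rows = [
        ("doc-a", [1.0, 0.0]),
        ("doc-b", [0.0, 1.0]),
        ("doc-c", [0.9, 0.1]),
    ]
    print([doc_id for doc_id, _ in nearest([1.0, 0.0], rows, k=2)])
```

Keeping such embeddings in the same cluster as the transactional data means a similarity search can be combined with ordinary filters on the same rows, which is the consolidation described above.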

Q. What role does the community play in TiDB’s offerings?

A. The open source community is instrumental in TiDB’s growth. It boosts awareness, adoption, and innovation through shared experiences and feedback. We regularly conduct hackathons and receive creative contributions from the community. For example, someone once proposed implementing graph database capabilities on TiDB—a completely different use case that showcased the community’s ingenuity.

In India, we see immense potential due to the availability of exceptional talent. We are actively investing in community-building efforts, hosting events, and planning initiatives like the TiDB User Day this year to further engage with the open source community.

Q. What governance model do you follow for community contributions? How do you manage external inputs?

A. Open source contributors can propose ideas, which are reviewed by our product team. This team evaluates the feature’s value, prioritises it against our roadmap, and ensures alignment with TiDB’s principles. Approved contributions go through rigorous testing across all supported versions to maintain quality. A centralised governance team acts as a gatekeeper to oversee this process and ensure compliance.

Q. Do you have a dedicated community management team, and is there one specific to India?

A. Yes, we have a dedicated global team for community management, including a local community manager in India. Given the talent pool here, we are expanding our efforts in India, hosting events and engaging with the open source community to foster collaboration and innovation.

Q. Are you involved in other open source projects?

A. One project is OSSInsights.io, an open source platform we initially developed to showcase TiDB’s scalability by storing over 6 billion records in a single table. However, it quickly evolved into a tool for analysing trends in GitHub data, such as programming language popularity and database usage.

Q. Apart from hackathons and OSSInsights.io, what other ways do you engage with the community?

A. We have several engagement channels, including a Slack community for open source contributors. Recently, we introduced initiatives like offering free credits to contributors with innovative ideas for using TiDB. We also run contests to recognise the best TiDB use cases and stories, rewarding participants for their contributions. These activities are just the beginning; we aim to enhance our community engagement significantly in the coming years.

Q. How do you see distributed databases evolving and what can the community expect from TiDB in the next five years?

A. Distributed databases are poised for widespread adoption as businesses grow and demand scalable, efficient solutions. By 2030, the expected 10-20x growth in digital-native businesses, AI, and SaaS will make distributed databases a necessity rather than a choice.

The evolution of SQL-based distributed databases is a natural progression from traditional SQL and NoSQL databases. These modern databases combine SQL’s familiarity with the scalability of NoSQL, ensuring they are future-proof for the growing demands of data-intensive industries.

For the open source community, this represents a tremendous opportunity for learning and innovation. TiDB is committed to being part of this journey, providing scalable, innovative solutions while remaining open to ideas and contributions from the community. Together, we can shape the future of distributed databases.