In a conversation with OSFY, Karrtik Iyer, Head of Communities – Data Science & AI, Thoughtworks, champions broader open source adoption and showcases the company’s active engagement with Indian languages in the generative AI space.
Q. What does open source signify for the Thoughtworks team?
A. At Thoughtworks, the concept of open source goes beyond just a development model or a type of software. It is deeply ingrained in our culture and philosophy. We view it as a challenge to break away from traditional solutions and foster a transparent, community-driven process for developing technology.
Thoughtworks has actively embraced open source development; the goal is for each open source project to be supported by the wider community. We have achieved this by creating various open source tools and by upskilling individuals to contribute to building the software, thus expanding the community. We also run communication groups and engage in knowledge sharing so that people can take on tasks independently. While we provide guidance, we encourage the community to take the lead in development.
In essence, Thoughtworks sees open source as more than just free software; it’s a philosophy that drives innovation, fosters collaboration, and helps create high-quality, secure, and integrated software solutions. We have demonstrated our commitment to this philosophy through various projects and initiatives across different domains.
Q. Many critics believe that open source is more of a marketing gimmick for many firms—either to attract customers or talent. What’s your take?
A. Open source software (OSS) is more than a marketing tactic. In my opinion, it is a versatile and economically viable strategy. Utilising OSS to attract talent and customers reflects the industry’s diversity in approaches. The emergence of various licensing options beyond traditional ones like Apache or MIT signifies OSS’s evolving landscape. Major players like Google and Meta demonstrate OSS’s feasibility, highlighting its strategic value in the tech industry.
Q. What were the goals the Thoughtworks team had planned to achieve via their major participation at Open Source India in October last year?
A. We partnered with Open Source India to deepen our engagement with the open source community, share insights, and learn from other industry leaders and practitioners. The event is a proven platform to contribute to and benefit from the collective knowledge and advancements in open source technologies. We wanted to leverage this platform to share our insights on generative AI and open source technologies with fellow open source enthusiasts from the industry.
Q. How do you calculate the RoI (return on investment) on all the efforts put into OSS-related projects and community initiatives like Open Source India?
A. Our OSS initiatives are not just a contribution to the tech community but a strategic investment in Thoughtworks’ future. They enhance our brand, attract and retain talent, foster innovation, and open new business avenues — all of which are crucial for our long-term success.
Q. What were the key messages in your talk?
A. The key intent of our talk was to showcase the capabilities of OSS LLMs (large language models) and how enterprises can adopt them for their business use cases. The idea was to convey that one-size-fits-all does not work with the adoption of generative AI and LLMs. Enterprises need to think strategically about areas such as privacy and latency, and aim to solve their business cases using multiple small OSS models.
Q. How do you see the evolution of Gen AI and its powerhouses such as ChatGPT, Bard, etc?
A. We are in the second or third wave of AI, moving beyond the hype of ‘LLM for everything’ towards ‘LLM for a few suitable things’. New architectures are emerging. The major players are no surprise: Microsoft has been smart with its investments in OpenAI, and its AI models are becoming more sophisticated by the day. As these evolve, the focus will likely be on improving accuracy, reducing biases, and ensuring the ethical use of AI. Balancing innovation, privacy, and ethical considerations will be crucial in shaping the future of these AI powerhouses.
Q. How can senior decision-makers discover how Gen AI can be beneficial to their business?
A. Senior decision-makers should view Gen AI as part of their overall AI and data strategy, aligning it with their business objectives. Defining success metrics for envisioned use cases and building evaluation benchmarks early on is essential. Recognising that a single, all-purpose large language model (LLM) may not address all business needs adequately, early deliberation on anticipated UX changes and user training requirements is crucial when reimagining existing products.
Q. Once they have identified the opportunities, how should these decision-makers plan their investments in the next few quarters, given that they have multiple variables to deal with?
A. Start small, and explore quantisation techniques to make LLMs work on CPUs if necessary. Deploying GPT-4 for every use case is not imperative; look at how multiple smaller LLMs can handle cost and complexity. Prioritise early upskilling in Gen AI development skills. Given that NLU (natural language understanding) is an active and rapidly growing research area, I would suggest allocating a subset of their data science talent to acquiring new skills in this area.
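The quantisation idea mentioned above can be sketched with a toy example. This is an illustration of the core round-trip (floats to int8 and back), not a production method; real CPU-inference toolchains such as llama.cpp or bitsandbytes use block-wise schemes with far more care.

```python
# Toy sketch: symmetric int8 quantisation of a weight vector, the basic
# idea behind shrinking LLM weights so models fit in CPU memory.
# All values here are illustrative.

def quantise_int8(weights):
    """Map floats to int8 range [-127, 127] with a single scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantise(q, scale):
    """Recover approximate floats from the int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantise_int8(weights)
approx = dequantise(q, scale)
# Each recovered weight is within one quantisation step of the original.
assert all(abs(a - w) <= scale for a, w in zip(approx, weights))
```

The trade-off is precisely the one the answer alludes to: a small loss of precision in exchange for a roughly 4x memory reduction versus float32, which is often what makes CPU deployment of smaller LLMs viable.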
Q. What aspects should decision-makers be wary of when dealing with Gen AI?
A. They must be mindful of two crucial aspects: first, identifying the business success metric they intend to achieve through Gen AI-based offerings; and second, defining what ‘good’ looks like. For instance, when developing a question-answering system that integrates multiple data sources, it is essential to establish metrics to evaluate its quality. Clear evaluation criteria, agreed early, are pivotal to the success of such endeavours.
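A minimal evaluation benchmark of the kind described could look like the sketch below. The scoring here is simple token-overlap F1; real benchmarks would add faithfulness and relevance metrics, and the sample data and names are hypothetical.

```python
# Minimal sketch of an evaluation benchmark for a question-answering
# system: score each prediction against a reference answer, then
# aggregate. Token-overlap F1 is a deliberately simple stand-in metric.

def token_f1(prediction, reference):
    """F1 overlap between predicted and reference answer tokens."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

benchmark = [
    {"question": "What is the capital of France?",
     "reference": "Paris", "prediction": "Paris"},
    {"question": "Who wrote Hamlet?",
     "reference": "William Shakespeare", "prediction": "Shakespeare"},
]
mean_f1 = sum(token_f1(b["prediction"], b["reference"])
              for b in benchmark) / len(benchmark)
print(f"mean token F1: {mean_f1:.2f}")
```

The important practice is not the particular metric but having one at all: a fixed benchmark run before and after each change makes ‘good’ measurable rather than a matter of opinion.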
Q. Besides the popular SaaS platforms, which traditional business applications are being affected/benefited by Gen AI?
A. Gen AI is impacting various sectors, with significant influence observed in education and the social sector. New patterns are emerging for solving problems, and conversational interfaces are becoming first-class citizens. However, health and finance will face challenges due to heavy regulation vis-a-vis the probabilistic nature of this technology. Gen AI implementations will involve human augmentation in most places, with the necessary guard rails.
Another major area of adoption of Gen AI is in the software development life cycle (SDLC). Apart from that, any creative field will have a major impact, be it music, arts, or writing, which is bound to create an interesting discussion on issues such as copyrights. Notably, Thoughtworks is also using Gen AI tech as a brainstorming partner with the necessary guard rails, and most of our customers are looking to adopt it. Gen AI has also led to an acknowledgement about the need for having an AI strategy for the organisation. We have multiple offerings across literacy and education, use case evaluation, productionisation, and scaling.
Q. How do you see the role of open source in enabling growth of Gen AI?
A. The contributions from Meta, Hugging Face, and EleutherAI have been remarkable. Open source plays a crucial role in ensuring the ethical and responsible use of generative AI. Thanks to open source contributions, there is a strong belief that this technology will have a significant outreach to people and industries that may not find it economically feasible to adopt services from OpenAI, among others.
Q. What role does open source play in setting the standard for responsibility and ethics in the context of information dissemination in the case of generative AI tools such as ChatGPT?
A. We are currently in a catch-up game regarding these issues. For example, after the advent of ChatGPT, European countries formed a team to ensure that generative AI is used responsibly within Europe. Along with different types of licensing, more regulation is required by authoritative bodies. In India, we are yet to see significant efforts to regulate generative AI in various Indian languages; the government will need to play a role in framing regulations and laws around this technology. It’s not just about leaving it to the people who built the technology; there needs to be a governing body. Many countries are forming committees for this purpose. In the US, there’s a push for regulations that don’t stifle competition. Open source communities are crucial here, as many enterprises are moving towards open source models, preferring to host their data and models independently rather than sending them to third parties.
The gap in regulation is acknowledged, but it will take time to bridge. Consumers of this technology now have multiple options and can make conscious decisions about which direction to go. Regulatory bodies, both independent and government-formed, are emerging to oversee and regulate these developments in the AI and open source sectors.
Q. Considering the potential unreliability of generative AI, as illustrated by an incident where ChatGPT created a fictitious quote attributed to a non-existent ‘John Doe’, how do we ensure the authenticity and reliability of the information generated by such AI systems?
A. Addressing the reliability of generative AI involves multiple levels and requires a degree of education about the correct use of the technology. The core issue is ensuring that responses are factual. This technology, however, wasn’t designed to inherently distinguish fact from fiction; it’s about predicting the next sequence of words based on its training. This is known as ‘closed book’ question-answering, where the AI responds based solely on its training data, without any external verification.
To improve reliability, one approach is ‘open book’ question-answering. This involves grounding the AI’s responses in factual data. For instance, in an organisational context, you could feed the AI relevant PDFs or documents. When the AI generates a response, it’s restricted to use only the information from these provided documents, ignoring its general training data. This method, known as retrieval-augmented generation (RAG), combines generative capabilities with data retrieval from specific, trusted sources.
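The ‘open book’ flow described above can be sketched in a few lines. Retrieval here is naive keyword overlap; production RAG systems use vector embeddings and a real LLM client, and the documents and prompt wording below are hypothetical.

```python
# Minimal sketch of retrieval-augmented generation (RAG): retrieve the
# most relevant document for a query, then build a prompt that restricts
# the model to that retrieved context only.

DOCUMENTS = [
    "The leave policy allows 24 days of paid leave per year.",
    "Office hours are 9am to 6pm on weekdays.",
    "Expense claims must be filed within 30 days.",
]

def retrieve(query, docs, k=1):
    """Rank documents by how many words they share with the query."""
    words = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(words & set(d.lower().split())))[:k]

def build_prompt(query, docs):
    """Ground the model: instruct it to answer only from retrieved context."""
    context = "\n".join(retrieve(query, docs))
    return (f"Answer using ONLY the context below. If the answer is not "
            f"in the context, say you do not know.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

prompt = build_prompt("How many days of paid leave do I get?", DOCUMENTS)
# The prompt now carries the leave-policy document as grounding context,
# ready to be sent to whichever LLM the organisation has chosen.
```

The two functions mirror the two halves of RAG the answer describes: a retrieval step over trusted documents, and a generation step whose prompt confines the model to that retrieved material.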
However, it’s important to recognise that generative AI is probabilistic, not deterministic. While these methods can reduce the incidence of incorrect or fictional information, they can’t guarantee 100% accuracy. The goal is to increase the likelihood of factual responses, but some level of uncertainty will always be inherent in generative AI systems.
Q. Why is there a lot of buzz around ‘observability’ in recent times, and how does it relate to Gen AI?
A. Observability is crucial for managing cost and latency, and for ensuring responsible AI practices. The cost of large language model (LLM) usage can grow rapidly, given the per-token charging model, so a systematic approach to tracking costs over time is needed to enable optimisation. Latency is also a significant factor in certain applications, as diverse prompting techniques can result in LLM calls taking one to two minutes if not optimised. Additionally, since this technology’s primary USP lies in its creative ability, it is essential to acknowledge that generated responses may not always be truly inclusive or unbiased. All these factors emphasise the importance of observability in Gen AI implementations.
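The cost and latency tracking described above can be sketched as a thin wrapper around whatever model client is in use. The pricing figure, the word-based token count, and the `fake_llm` backend are all hypothetical placeholders; real systems would use the provider's reported token usage and pricing.

```python
# Minimal sketch of LLM observability: record tokens, cost, and latency
# for every call so spend and response times can be tracked over time.

import time

COST_PER_1K_TOKENS = 0.002  # hypothetical per-1K-token price

class ObservedLLM:
    def __init__(self, backend):
        self.backend = backend
        self.calls = []  # one record per call: tokens, cost, latency

    def complete(self, prompt):
        start = time.perf_counter()
        response = self.backend(prompt)
        tokens = len(prompt.split()) + len(response.split())  # rough count
        self.calls.append({
            "tokens": tokens,
            "cost": tokens / 1000 * COST_PER_1K_TOKENS,
            "latency_s": time.perf_counter() - start,
        })
        return response

    def total_cost(self):
        return sum(c["cost"] for c in self.calls)

def fake_llm(prompt):  # stand-in for a real model call
    return "stub response"

llm = ObservedLLM(fake_llm)
llm.complete("summarise the quarterly report in two sentences")
print(f"calls: {len(llm.calls)}, total cost: ${llm.total_cost():.6f}")
```

Feeding these per-call records into a dashboard is what turns a per-token bill that could balloon unnoticed into something that can be watched and optimised.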
Q. With the growing interest in generative AI, particularly among startups, how does Thoughtworks plan to engage in this area? Do you intend to collaborate with startups to further expand in this field, considering your focus on community development?
A. There are two aspects to consider here. First, for India to lead in generative AI, we need to focus on supporting Indian languages. Currently, platforms like ChatGPT support several languages but not all Indian ones. Academic institutions like the IITs are investing in creating GPT equivalents for Indian languages, and part of our agenda at Thoughtworks is to collaborate with these academic researchers. We complement each other’s strengths, combining their academic prowess with our capability to bring technology to the common people.
For instance, we’ve worked on projects like EkStep, and developed translation models that are now part of Bhashini, which was showcased at the G20 summit in 2023 and focuses on translating between Indian languages. Collaborating with academic institutions helps us leverage their resources and expertise, essential for building something as complex and resource-intensive as GPT.
Another aspect is forming key partnerships with organisations that share our cultural and ethical values. This alignment is crucial for empowering society through technology. While it’s too early to discuss specific partnerships, our goal is to play a more significant role in bringing generative AI to the broader public community. We’re actively seeking partnerships that align with our mission to use technology for societal benefit, and collaborating with startups is certainly a part of this strategy.
Q. How do you see the level of contribution by Indian techies to open source?
A. Indian techies are doing great. Many start with small ideas and quickly get the necessary funding. One such example is the library Ragas (retrieval augmented generation assessment) by Exploding Gradients. I only wish there were more research contributions from India that we could try out at an early stage for enterprise adoption.
Q. Do you believe open source will be a crucial factor in making India a technology leader from the grassroots level? How does Thoughtworks contribute to this vision?
A. I definitely see open source playing a pivotal role in transforming India into a technology leader, especially from the grassroots level. This transformation is already underway. For instance, we are exploring how AI can aid specially-abled people, creating more accessible features in tools like Google Maps, etc.
A noteworthy contribution is the ‘Jugal Bandi’ initiative, presented at the G20 summit. It’s a collaboration with the Indian government to increase awareness of government schemes for farmers. We have developed a bot that communicates in local languages, breaking down barriers in education and technology.
Our ongoing mission is to leverage technology, especially AI, for societal benefit. We aim to empower communities, making technology accessible to the common person and assisting in various aspects of their lives. This approach underlines our commitment to using open source as a tool for societal and technological advancement in India.