On the Future of AI. Interview with Raj Verma
“Five years from now, today’s AI systems will look archaic to us. In the same way that computers of the 60s look archaic to us today. What will happen with AI is that it will scale and therefore become simpler, and more intuitive. And if you think about it, scaling AI is the best way to make it more democratic, more accessible.”
Q1. What are the innovations that most surprised you in 2023?
Raj Verma: Generative AI is definitely the talk of the town right now. 2023 marked its breakthrough, and I think the hype around it is well founded. Few people knew what generative AI was before 2023. Now everyone’s talking about it and using it. So I was quite impressed by the uptake of this new technology.
But if we go deeper, we have to acknowledge that the rise of AI would not have been possible without significant advancements in how large amounts of data are stored and handled. Data is the core of AI and what is used to train LLMs. Without data, AI is useless. To have powerful generative AI that gives you answers, predictions and content right at the moment you need it, you need real-time data, or data that is fresh, in motion and delivered in a matter of milliseconds. The interpretation and categorization of data are therefore crucial in powering LLMs and AI systems.
In that sense, you will notice a lot of hype around Specialized Vector Databases (SVDBs), which are independent systems, designed to store, index and retrieve vectors (multidimensional data points), that you plug into your data architecture. These are popular because LLMs are increasingly relying on vector data. Think of vectors as an image or a text converted into a stored data point. When you prompt an AI system, it will look for similarities in those stored data points, or vectors, to give you an answer. So vectors are really important for AI systems, and businesses often believe that a database focused on just storing and processing vector data is essential for AI systems.
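As a rough illustration of the similarity lookup described above, here is a minimal Python sketch. The three-dimensional vectors, the item names and the `nearest` helper are assumptions made up for this example; real embeddings come from a model and have hundreds or thousands of dimensions, and production systems use indexes rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, store, k=2):
    """Return the k stored keys whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda key: cosine_similarity(query, store[key]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embeddings: each stored item is a point in vector space.
store = {
    "cat photo": [0.9, 0.1, 0.0],
    "dog photo": [0.8, 0.3, 0.1],
    "tax form": [0.0, 0.2, 0.9],
}

# Assumed embedding of a prompt such as "a picture of a kitten".
query = [1.0, 0.2, 0.0]

print(nearest(query, store))
```

The prompt's vector points in nearly the same direction as the two photo vectors, so those rank ahead of the unrelated document.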
However, you don’t really need SVDBs to power your AI applications. In fact, loads of companies have come to regret their use because, as an independent system, they result in redundant data, excessive data movement, increasing labor and licensing costs and limited query power.
The solution is to store all your data — structured data, semi-structured data based on JSON, time-series, full-text, spatial, key-value and vector data — in one database. And within this system have a powerful vector database functionality that you can leverage to conduct vector similarity search.
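To make the idea of keeping vectors next to the rest of your data more concrete, the following Python sketch models a single in-memory table whose rows carry both structured columns and a vector column. The `Product` type, the `search` function and the sample rows are hypothetical and are not SingleStore's API; the dot product is used as the similarity score on the assumption that the vectors have comparable lengths.

```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    category: str    # structured column
    price: float     # structured column
    embedding: list  # vector column stored alongside the others

def dot(a, b):
    """Dot product, used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))

def search(rows, query_vec, category, max_price, k=1):
    """Filter on structured columns, then rank the survivors by similarity."""
    candidates = [
        r for r in rows if r.category == category and r.price <= max_price
    ]
    candidates.sort(key=lambda r: dot(query_vec, r.embedding), reverse=True)
    return [r.name for r in candidates[:k]]

rows = [
    Product("trail shoe", "footwear", 120.0, [0.9, 0.1]),
    Product("city sneaker", "footwear", 80.0, [0.6, 0.4]),
    Product("rain jacket", "outerwear", 95.0, [0.8, 0.2]),
]

# One query combines a category filter, a price limit and vector ranking.
print(search(rows, [1.0, 0.0], category="footwear", max_price=100.0))
```

Because the filter and the similarity ranking run over the same rows, nothing has to be copied to or synchronized with a separate vector system.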
All this to say that I’ve been impressed by the speed at which we are developing ways to power generative AI. We’re experimenting based on its needs and quickly figuring out what works and what doesn’t.
Q2. What is real-time data and why is it essential for AI?
Raj Verma: Real time is about what we experience in the now. It is access to the information you need, at the moment you need it, delivered together with the exact context you need to make the best decision. To experience this now, you need real-time data: data that is fresh and in motion. And with AI, the need for real-time data (fast, updated and accurate data) is becoming more apparent. Because without data, AI is useless. And when AI models are trained on outdated or stale data, you get things like AI bias or hallucinations. So, in order to have AI that is powerful, and that can really help us make better choices, we need real-time data.
With the use of generative AI expanding beyond the tech industry, the need for real-time data is more urgent than ever. This is why it is important to have databases that can handle storage, access and contextualization of information. At SingleStore, our vision is that databases should support both transactional (OLTP) and analytical (OLAP) workloads, so that you can transact without moving data and put it in the right context — all of which can be delivered in millisecond response times.
Q3. One of the biggest concerns around AI is bias, the idea that existing prejudices in the data used to train AI might creep into its decisions, content and predictions. What can we do to mitigate this risk?
Raj Verma: I believe humans should always be involved in the training process. With AI, we must be both student and teacher, allowing it to learn from us, and in that way continuously give it input so that it can give us the insight we need. There are many laudable efforts to develop Hybrid Human AI models, which basically incorporate human insight with machine learning. Examples of hybrid AI include systems in which humans monitor AI processes through auditing or verification. Hybrid models can help businesses in several ways. For example, while AI can analyze consumer data and preferences, humans can jump in to guide how it uses that insight to create relevant and engaging content.
As developers, we must also be very cognizant of where the data used to train LLMs comes from. And in this sense, being transparent about where it comes from helps, because the systems can be held accountable and challenged if biased data does creep into the training process. The important thing here is also to know that an AI system is only as good as the data that it is trained on.
Q4. The popularity and accessibility of generative artificial intelligence (gen AI) has made it feel like the future we see in science fiction movies is finally at our doorstep. And those science fiction movies have sowed much worry about AI being dangerous. Is this science fiction vision of AI coming true?
Raj Verma: Don’t expect machines to take over the world, at least not any time soon. AI can process and analyze large amounts of data and generate content based on that, at a much faster pace than we humans can. But these systems are still very dependent on human input. The idea that human-like robots will come to rule the world makes for great science fiction movies, but it’s far from becoming a reality.
That doesn’t mean that AI isn’t dangerous, and we have a responsibility to discern which threats are real.
AI poses an unprecedented risk in fueling the spread of disinformation because it has the capacity to create authentic looking content. Distinguishing between content generated by AI and that created by humans will become increasingly challenging. AI can also pose cybersecurity threats. You can trick ChatGPT into writing malicious code, or use other generative AI systems to enhance ransomware. And AI can worsen current malicious trends that have surfaced with social media. I personally worry that AI systems will exploit the attention economy and spur higher levels of social media addiction. This can have terrible consequences on teenagers’ mental health. As a father of two, I am deeply concerned about this.
These are the threats that we should worry about. And we humans are capable of mitigating these risks. We should always be involved in AI’s development, audit it and pay special attention to the data that we use to train it.
Q5. You are quoted as saying that “without data, AI wouldn’t exist, but with bad or incorrect data, it can be dangerous.” How dangerous can AI be?
Raj Verma: Generative AI is like a superhuman who reads an entire library of thousands of books to answer your question, all in a matter of seconds. If it doesn’t have access to that library, and if that library doesn’t have the latest books, magazines and newspapers, then it cannot give you the most relevant information you need to make the best decision possible. This is a very simple explanation of why, without data, AI is useless. Now imagine that library is full of outdated books that were written by white supremacists during the civil war. The information you are going to get from this AI system is going to guide your decisions, and you are going to make some very bad decisions. You are going to make biased decisions, and you’re going to perpetuate biases that already exist in society. That’s how AI can be dangerous, and that is why we need AI systems to have access to the most updated, accurate data out there.
Q6. Should AI be regulated? And if yes, what kind of regulation?
Raj Verma: The issue is, it’s hard to regulate something that is still developing. We just don’t know what AI will look like, in its entirety, in the future. So we want to avoid regulation hampering the development of this technology. That doesn’t mean that there aren’t standards that can be applied globally. Data regulation is key, since data is the backbone of AI. Data regulation can be based on the principle of transparency, which is key to generating trust in AI and to our ability to hold this technology and its developers accountable should something go wrong. To achieve transparency you need to know where the data in the AI system is coming from. So, proper documentation of the data used to train LLMs is something we can regulate. You also must be able to explain the reasoning behind an AI system’s solutions or decisions. These must be understandable by humans. And there’s also transparency in how you present the AI system to users. Do users know that they are talking to an AI system and not a human? We can regulate data transparency without imposing excessive measures that could hamper AI’s development.
Q7. There is no global approach to AI regulation. Several countries are at various stages of developing their approach to regulating AI. What are the practical consequences of this?
Raj Verma: Regulating AI on a global scale is incredibly challenging. Each country’s social values will be reflected in the way it approaches regulating this new technology. The EU has a very strong approach to consumer protection and privacy, which is probably why it authored the first significant widespread attempt to regulate AI in the world. I don’t believe we will see such wide-sweeping legislation in the US, a country that values innovation and market dynamics. In the US, we will see a decentralized approach to regulation, with maybe some specific decrees that seek to regulate its use in specific industries, like healthcare or finance.
Many worry that the EU’s new AI Act will become another poster child of the Brussels effect, where firms end up adopting the EU’s regulation, in the absence of any other, because it saves costs. Yet the Brussels effect might not exactly happen with the AI Act, particularly because firms might want to use different algorithms in the first place. For example, marketing companies will want to use different algorithms for different geographic areas because consumers behave differently depending on where they live. It won’t be hard then for firms to have their different algorithms comply with different rules in different regions.
All this to say that we should expect different AI regimes around the world. Companies should prepare for that. AI trade friction with Europe is likely to emerge, and private companies will advance their own “responsible AI” initiatives as they face a fragmented global AI regulatory landscape.
Q8. How can we improve the way we gather data to feed LLMs?
Raj Verma: We need to make sure LLMs are up to date. Open source LLMs that are trained on large, publicly available data are prone to hallucinate because at least part of their data is outdated and probably biased. There are ways to fix this problem, including Retrieval Augmented Generation (RAG), which is a technique that uses a program to retrieve contextual information from outside the model, immediately feeding it to the AI system. Think of it as an open book test where the AI model, with the help of a program (the book), can look up information specific to the question it is being asked about. This is a very cost-effective way of updating LLMs because you don’t need to retrain them all the time and can use it in case-specific prompts.
RAG is central to how we at SingleStore are bringing LLMs up to date. To curate data in real time, it needs to be stored as vectors, which SingleStore allows users to do. That way you can join all kinds of data and deliver the specific data you need in a matter of milliseconds.
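The open book test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the documents, the `retrieve` and `build_prompt` helpers and the word-overlap score are invented for the example, the overlap score stands in for the vector similarity search a real system would use, and the final call to an LLM is left out.

```python
def score(question, document):
    """Count shared lowercase words; a stand-in for vector similarity."""
    q_words = set(question.lower().split())
    d_words = set(document.lower().split())
    return len(q_words & d_words)

def retrieve(question, documents, k=1):
    """Return the k documents that best match the question."""
    ranked = sorted(documents, key=lambda d: score(question, d), reverse=True)
    return ranked[:k]

def build_prompt(question, documents):
    """Put the retrieved context in front of the question (the open book)."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical knowledge base that is kept fresh outside the model.
documents = [
    "The store opens at 9am on weekdays.",
    "Returns are accepted within 30 days of purchase.",
]

prompt = build_prompt("When are returns accepted?", documents)
print(prompt)  # this text would be sent to the LLM
```

The model itself is never retrained here. Only the documents change, which is why keeping them current matters so much.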
Q9. What is the evolutionary path you think AI will go through? When we look back 5-10 years from now, how will we look at genAI systems like ChatGPT?
Raj Verma: Five years from now, today’s AI systems will look archaic to us. In the same way that computers of the 60s look archaic to us today. What will happen with AI is that it will scale and therefore become simpler, and more intuitive. And if you think about it, scaling AI is the best way to make it more democratic, more accessible. That is the challenge we have in front of us, scaling AI, so that it works seamlessly in giving us the exact insight we need to improve our choices. I believe this scaling process should revolve around information, context and choice, what I call the trinity of intelligence. These are the three tenets that differentiate AI from previous groundbreaking technologies. They are also what help us experience the now in a way that we are empowered to make the best choices. Because this is our vision at SingleStore, we focus on developing a multi-generational platform which you can use to transact and reason with data in millisecond response times. We believe this is the way to make AI more powerful because with more precise databases that can deliver information in real time, we can power the AI systems that will really help us make the best choices as humans.
Raj Verma is the CEO of SingleStore.
He brings more than 25 years of global experience in enterprise software and operating at scale. Raj was instrumental in the growth of TIBCO Software to over $1 billion in revenue, serving as CMO, EVP Global Sales, and COO. He was also formerly COO at Apttus Software and Hortonworks. Raj earned his bachelor’s degree in Computer Science from BMS College of Engineering in Bangalore, India.