On Vector databases and GenAI. Q&A with Michael Gilfix
Q1. Do you believe GenAI is ready to be used in daily business? Why?
In some ways, GenAI is already being used in daily business operations, but only for surface-level tasks that contribute to worker productivity. Think of the number of people who talk about using ChatGPT to write content or support research. While the adoption rate has soared over the last 18 months or so, the use in business operations is still nascent.
What we’re seeing people do with these tools is just a fraction of GenAI’s potential, and implementing it in business functions requires tackling technical and business-level inhibitors such as compliance, security, advanced skill sets, and more. These inhibitors will need to be addressed in a comprehensive but also efficient way before GenAI is ready for core business processes.
Q2. In our conversation back in September 2023 (*) you mentioned that “most firms are focused on the basic AI use cases of taking their data, indexing it, making it accessible to generative AI engines. But this doesn’t solve a lot of the problems for how people want to use the technology.” What is your point of view today?
I believe that statement is still true. A lot of people still have to move from exploration into production. Some of the barriers are technical. And some are business and organizational, like how do we find the right productivity benefit for that person? Or how do we ensure that our data is secured? More frequently, we hear, how do we know if the recommendations that come from the AI application are correct? These barriers are very real. Coupled with a lack of proven adoption processes that businesses are sharing, they have stalled some organizations’ understanding of the real business impact.
But I am still optimistic that the move to mass production is happening and will continue to grow. I look to the sheer amount of interest I’ve seen in organizations to build retrieval augmented generation (RAG) architecture as proof. We’ve seen quick paths from exploration to implementation in these areas. Once organizations begin to see more ways of applying GenAI technology into areas of their business that provide significant benefits and unlock ROI, I am confident we’ll see more and more willingness to move faster towards adoption.
Q3. Vector databases have been touted as a key part of GenAI infrastructure. Are vector databases overhyped? Do we really need a specialized vector database?
Vector databases are becoming increasingly essential. As enterprises get serious about using unstructured data, their data volumes will go up dramatically. So, having technology that’s designed for coping with those volumes in a cost-effective manner is going to be the key to unlocking effective total-cost-of-ownership.
The discussion around whether specialized vector databases are needed is really code for “performance matters.” People look to native vector databases as a means of having confidence that they are getting best-of-breed performance. However, to get the most out of their data estate, organizations will want to use the entirety of it, which means combining both structured and unstructured data to ensure that their AI has access to the most complete data set. We at KX have been delivering vector-native performance while giving organizations the ability to leverage both types of data in a single database.
Q4. In November of 2023 you launched KDB.AI, a vector database that can take public and private enterprise data and enable large language models and AI generators to search and reason about that data. What is new with KDB.AI?
We have quite a few new things with KDB.AI!
Firstly, with KDB.AI’s hybrid search, you can find the right data for your AI that spans both structured and unstructured data. We combine different ways of finding structured data, from exact match to literal search to fuzzy search, with similarity search on unstructured data to find the most relevant and accurate answer.
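As a rough illustration of the hybrid search idea described above, the sketch below filters records on structured fields (an exact match on a sector and a fuzzy match on a name) and then ranks the survivors by cosine similarity of their embeddings. It is a generic, standard-library Python sketch, not the KDB.AI API: the record layout, the `hybrid_search` function, its parameters, and the toy data are all assumptions made up for this example.

```python
import math
from difflib import SequenceMatcher

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def fuzzy_match(text, target, threshold=0.8):
    """True when two strings are similar enough (case-insensitive)."""
    ratio = SequenceMatcher(None, text.lower(), target.lower()).ratio()
    return ratio >= threshold

def hybrid_search(records, query_vec, sector=None, name_like=None, k=2):
    """Hypothetical helper: structured filter first, similarity ranking second."""
    # Step 1: keep only records that satisfy the structured conditions.
    candidates = [
        r for r in records
        if (sector is None or r["sector"] == sector)
        and (name_like is None or fuzzy_match(r["name"], name_like))
    ]
    # Step 2: rank the remaining records by embedding similarity.
    ranked = sorted(
        candidates,
        key=lambda r: cosine(r["embedding"], query_vec),
        reverse=True,
    )
    return ranked[:k]

# Made-up toy data: tiny 3-dimensional "embeddings" plus structured fields.
RECORDS = [
    {"id": 1, "sector": "energy", "name": "Northwind Power", "embedding": [0.9, 0.1, 0.0]},
    {"id": 2, "sector": "energy", "name": "Northwind Pwr", "embedding": [0.1, 0.9, 0.0]},
    {"id": 3, "sector": "tech", "name": "Bluefin Systems", "embedding": [1.0, 0.0, 0.0]},
    {"id": 4, "sector": "energy", "name": "Harbor Gas", "embedding": [0.7, 0.7, 0.0]},
]

if __name__ == "__main__":
    hits = hybrid_search(RECORDS, [1.0, 0.0, 0.0], sector="energy", k=2)
    print([r["id"] for r in hits])
```

Note that record 3 has the embedding closest to the query but is excluded by the sector filter, which is the point of combining the two kinds of search: the structured conditions decide what is eligible, and similarity decides the order.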
Secondly, we’ve introduced Temporal Similarity Search (TSS). This capability is significant when you are analyzing patterns, trends, and anomalies in time series datasets. We’ve introduced two variants that can greatly optimize search for your application. Transformed TSS uses a patent-pending compression model to reduce time-series windows by over 99%, compressing data points while maintaining the original data’s shape. And Non-Transformed TSS is an algorithm for near real-time similarity search with extreme memory efficiency across fast-moving time-series data. It analyzes patterns and trends without needing to embed, extract, or store vectors in the database.
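To make the idea of searching time series by shape more concrete, here is a minimal sketch of a sliding-window similarity search that works directly on raw values, with no embeddings stored. It is a textbook approach (z-normalize each window, then compare by Euclidean distance) and is not KX's TSS algorithm, whose internals are not described here. The function names, the sample series, and the "blip then decline" pattern are assumptions invented for the example.

```python
import math

def z_normalize(window):
    """Rescale a window to zero mean and unit variance so only its shape matters."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)  # flat window: no shape to compare
    return [(x - mean) / std for x in window]

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_similar_windows(series, pattern, k=1):
    """Return the start indices of the k windows whose shape best matches pattern."""
    m = len(pattern)
    target = z_normalize(pattern)
    scored = []
    # Slide a window of the pattern's length across the whole series.
    for start in range(len(series) - m + 1):
        window = z_normalize(series[start:start + m])
        scored.append((euclidean(window, target), start))
    scored.sort()  # smallest distance first
    return [start for _, start in scored[:k]]

# Made-up price series with one "blip followed by a decline" at index 3.
SERIES = [10, 10, 10, 11, 14, 12, 9, 10, 10, 10, 10]
# Query shape: a rise, a peak, then a fall below the starting level.
PATTERN = [1, 4, 2, -1]

if __name__ == "__main__":
    print(find_similar_windows(SERIES, PATTERN, k=1))  # prints [3]
```

Because each window is normalized before comparison, the match depends on the shape of the movement rather than on the absolute price level, which is why a pattern expressed around 0 finds the blip in a series that trades around 10.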
The third thing we’ve launched is an on-disk index, which means you can run very large vector databases and still get the performance that kdb+, our core database technology, is known for. Our pre-filtering technology means that you can search sparse metadata as well as vector embeddings with great performance. We’ve also created our own proprietary search algorithms so that users can truly exploit the benefit of the kdb+ engine that powers KDB.AI, bringing unparalleled performance and scalability to our users.
Q5. Can you tell us what are the most innovative use cases being developed using KDB.AI?
We have three primary use case patterns: application search, multi-modal RAG systems, and behavioral analytics, which looks for patterns in data. I’ll use the capital markets space to provide specific context for how these can be applied.
Deep data analysis is painstaking for any human to perform manually. When it comes to trading analytics, an analyst or Quant researcher could request a summary of what’s being said in the market about stocks that have experienced a blip in growth followed by a rapid decline. They can then request the market prices during that period and compare similar stocks that went through similar blips. That’s an augmented use case, using RAG and advanced search capabilities. Once they have this data, they can conduct more analysis to help forecast market movement, such as the impact on stock prices when there was a natural disaster. This requires robust quantitative analysis on behaviors looking for patterns and anomalies.
Q6. What are the main lessons learned so far in using KDB.AI?
The most common theme we hear is that individuals using vector databases and GenAI struggle to move beyond an exploratory phase and effectively put something new into practice while achieving a return on investment. We talked about some of these barriers earlier.
With that in mind, I’d say the first main lesson learned is accuracy and making sure that a company’s AI system produces correct answers. You need to be able to trust it to effectively use it.
The second lesson learned is that your AI system needs to work for your domain and be able to take advantage of all your data, both structured and unstructured. This is critical because without both kinds of data, AI’s ability to produce accurate and trustworthy results is significantly hindered. With this covered, you can reach a place where you can cost-effectively scale the AI system with confidence that it will deliver real ROI.
Q7. How does KX stand out in a saturated AI market?
At KX, we believe in developing AI that matters. We’re not working on AI for the sake of AI, but instead we’re creating tools that help businesses achieve a sustainable, cutting-edge advantage. Whether you’re in capital markets seeking better trading strategies than your competitors, or in aerospace and defense aiming to outthink adversaries, our plan is to help organizations excel in what matters most – informed decision making.
To achieve this, organizations need great AI, which must harness their data to develop a complete and up-to-date worldview. KX is at the forefront of this, handling vast amounts of data that traditional data analytics systems can’t fully comprehend, like blending historical data with real-time business insights. Our unique ability to govern data is what defines our approach at KX, and we plan to continue enabling businesses to leverage data for the most critical aspects of their business.
Resources
(*) On KX Core Technology. Q&A with Michael Gilfix. ODBMS.org, September 15, 2023
………………………………………………..
Michael Gilfix, Chief Product & Engineering Officer, KX
Michael is an experienced software business executive with a strong track record of driving growth and building scalable global software product businesses for the enterprise market. At KX, his focus is on driving a product-led software strategy, accelerating market growth by democratizing access to KX’s market-leading technology, principally the KDB.AI vector database.
With over 20 years of experience in software development, Michael held a number of senior positions at IBM, leading teams in product management and engineering for data, AI, integration, and automation technologies. Most recently, he served as an advisor to the leadership team at Domo Inc, a cloud business intelligence company, on product and business strategy.
Sponsored by KX.