On GenAI, Large Language Models, and Vector Databases. Q&A with Ryan Siegler
Q1. Previously, you worked at the Ford Motor Company as an Enterprise Architect. What did you learn there that is useful in your current job?
As an Enterprise Architect in ML Research, I gained valuable experience working with large, siloed datasets and leveraging advanced analytics tools like GCP BigQuery, Teradata, and SQL Server. I learned how to build and deploy scalable data pipelines, which allowed me to integrate diverse data sources for time-series analysis and predictive modeling. One key achievement was developing a predictive model with ~80% accuracy for forecasting supplier delivery shortages, which sharpened my time-series analysis skills; that expertise is highly relevant to my current role.
Additionally, my work on reference architecture models for large language models (LLMs) and knowledge graphs has been directly applicable to my current job, as I continue to explore ways to structure and manage complex data relationships and insights, especially for RAG pipelines.
Q2. You specialize in developing and leading technical and architectural projects with technologies such as Retrieval Augmented Generation (RAG), and in tuning the interplay of LLMs and Vector DBs. What are the main lessons you have learned so far?
In working with RAG and optimizing the interplay between LLMs and Vector Databases, a few key lessons have emerged:
- Data Quality and Preprocessing Matter: The accuracy of retrieval in RAG setups is heavily dependent on the quality of the data indexed in the vector database. Proper data chunking, cleaning, and metadata tagging are essential for relevant and contextually accurate responses.
- Embedding Models Matter: In RAG, the quality of the retrieved chunks of data sent to the LLM relies heavily on choosing the right embedding model. This means understanding aspects such as what data the embedding model was trained on, and balancing embedding dimensionality against scalability and speed. Other factors, like leveraging dense and sparse embeddings for hybrid search, can make a substantial difference in performance and relevance (see the first sketch below).
- Indexing Strategies Matter: Indexing strategies, on-disk storage options, and optimized retrieval algorithms are all crucial to ensuring that LLMs can provide quick and accurate responses without compromising on scale (see the second sketch below).
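To make the preprocessing and embedding points concrete, here is a minimal sketch of the ingest step; the chunk size, overlap, and the all-MiniLM-L6-v2 model are illustrative assumptions, not universal recommendations:

```python
# Minimal chunk-and-embed sketch; chunk size, overlap, and the model
# "all-MiniLM-L6-v2" are illustrative assumptions, not recommendations.
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows so context isn't cut mid-thought."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")   # 384-dim dense embeddings

document = open("report.txt").read()              # hypothetical source file
chunks = chunk_text(document)
embeddings = model.encode(chunks, normalize_embeddings=True)

# Attach metadata at ingest time so retrieval can later be filtered
# by source, position, and so on.
records = [
    {"chunk_id": i, "source": "report.txt", "text": c}
    for i, c in enumerate(chunks)
]
```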
Fittingly, KDB.AI, KX's vector database, offers plenty of features to enhance retrieval accuracy within RAG pipelines: multi-index search (well suited to hybrid and multimodal scenarios), extensive metadata and fuzzy filtering capabilities, and a variety of indexing options, including on-disk indexes.
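On the indexing point, here is a minimal sketch of building an approximate-nearest-neighbor index with FAISS's in-memory HNSW implementation; on-disk indexes like the ones mentioned above follow the same search pattern while trading some latency for a far smaller memory footprint. The parameters shown are illustrative:

```python
# Minimal ANN indexing sketch with FAISS's in-memory HNSW; parameters illustrative.
import faiss
import numpy as np

dim = 384
vectors = np.random.rand(10_000, dim).astype("float32")   # stand-in for real embeddings

index = faiss.IndexHNSWFlat(dim, 32)      # 32 = graph connectivity (M)
index.hnsw.efConstruction = 200           # build-time speed/recall trade-off
index.add(vectors)

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)   # approximate top-5 nearest neighbors
print(ids[0], distances[0])
```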
Q3. You are quoted saying that “Multimodal AI stands at the forefront of the next wave of AI advancements.” (*) Why?
Multimodal AI is indeed at the forefront of the next wave of AI advancements because it enables models to process and integrate information from multiple types of data—text, images, audio, and even video—all at once. This holistic approach mimics human perception more closely than single-modal models, providing several transformative benefits:
- Deeper Contextual Understanding: Multimodal AI captures richer context by integrating text, audio, and visuals, enabling better decision-making and accuracy. Just as you might read text, look at an image, and analyze a chart to come to some conclusion, multimodal AI synthesizes these inputs simultaneously, offering insights that a single data type alone might miss.
- Broader Use Cases: It’s ideal for complex fields like healthcare, autonomous vehicles, and entertainment, where multiple data types intersect.
- Enhanced Creativity: Enables innovative content generation—like creating videos from text—and supports more natural, interactive AI experiences. I can foresee this greatly impacting the entertainment and gaming industries.
This human-like data processing makes multimodal AI a critical step forward in realizing AI’s potential.
Q4. GenAI, Large Language Models, and Vector Databases: How do they relate to each other?
To put it broadly, GenAI is an umbrella term for artificial intelligence systems that focus on generating content or optimizing tasks. Large Language Models (LLMs) are a subset of GenAI aimed at language-related tasks, generating responses from retrieved content and data based on user inputs. Vector databases are the platforms that power the retrieval behind these processes; essentially, they are critical supporting infrastructure for LLMs.
GenAI and LLMs rely on massive sets of training data, whereas vector databases store relevant information in vector form, from which it can be retrieved by GenAI and LLMs. Vectors can have many dimensions, meaning they can hold complex numerical representations (capturing semantic meaning) of different types of data, including text, images, video, and audio.
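As a quick illustration of semantic meaning captured in numbers, here is a small sketch (the model choice is an assumption): paraphrases land close together in vector space even when they share no keywords, while unrelated sentences land far apart.

```python
# Semantic similarity sketch: related meanings land close together in vector space.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative model choice

sentences = [
    "The car would not start this morning.",
    "My vehicle failed to turn over today.",   # same meaning, different words
    "I baked a chocolate cake for the party.", # unrelated
]
emb = model.encode(sentences, normalize_embeddings=True)

print(util.cos_sim(emb[0], emb[1]))   # high score: paraphrases
print(util.cos_sim(emb[0], emb[2]))   # low score: unrelated meaning
```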
Vector databases allow enterprises to use LLMs for querying and retrieval, with common use cases including:
- Image Search: Identify similar images, useful in healthcare diagnostics.
- Recommendation Systems: Find relevant interests, products, or content. Think of how Netflix recommends new shows and movies for you based on your watch history.
- Pattern Matching: Find similar patterns for analysis and prediction of time-series data such as stock prices, sensor readings, or weather data (see the sketch below).
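Here is a minimal sketch of that pattern-matching use case on synthetic data: slice a series into sliding windows, treat each window as a vector, and use nearest-neighbor search to find historical stretches that resemble the most recent one. The window length and data are made up for illustration.

```python
# Pattern-matching sketch: sliding windows of a series become vectors,
# and nearest-neighbor search finds similar historical shapes. Data is synthetic.
import numpy as np
import faiss

rng = np.random.default_rng(0)
prices = rng.standard_normal(5_000).cumsum().astype("float32")   # synthetic series

window = 32
windows = np.lib.stride_tricks.sliding_window_view(prices, window)
# z-normalize each window so we match shape, not price level
windows = (windows - windows.mean(axis=1, keepdims=True)) / (
    windows.std(axis=1, keepdims=True) + 1e-8
)
windows = np.ascontiguousarray(windows, dtype="float32")

index = faiss.IndexFlatL2(window)     # exact search; fine at this scale
index.add(windows[:-1])               # index the history, excluding the latest window

distances, ids = index.search(windows[-1:], 5)
print("Most similar historical windows start at:", ids[0])
```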
Q5. High-dimensional embeddings are a challenge for developers working in large-scale AI search, demanding massive storage and computational resources. What is your take on this?
Luckily, I believe that bulky embeddings (those with 1,000-plus dimensions) are starting to become a thing of the past. There has been a fundamental shift in how we approach vector search, with a variety of tools and methodologies helping developers build effective search systems on low-dimensional embeddings. A lot of the credit goes to Matryoshka Representation Learning (MRL), popularized by models such as jina-embeddings-v3. MRL paves the way for truncating dimensions with minimal information loss, retaining most of the performance of 1,024 dimensions in just 128. Even OpenAI supports truncation in its embedding models. There have been great steps forward here, and I hope to see more emphasis on efficiency and scalability in vector embeddings, rather than just precision.
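As a minimal sketch of the Matryoshka idea, assuming an MRL-trained embedding model: keep the leading dimensions and re-normalize, and most of the retrieval quality survives the cut.

```python
# Matryoshka-style truncation sketch: keep the leading dimensions, re-normalize.
# Assumes an MRL-trained model (e.g., jina-embeddings-v3); with other models,
# expect noticeably more quality loss from this simple slice.
import numpy as np

def truncate(embeddings: np.ndarray, dims: int = 128) -> np.ndarray:
    """Keep the first `dims` dimensions and restore unit length for cosine search."""
    cut = embeddings[:, :dims]
    norms = np.linalg.norm(cut, axis=1, keepdims=True)
    return (cut / np.clip(norms, 1e-12, None)).astype(embeddings.dtype)

full = np.random.rand(1_000, 1024).astype("float32")   # stand-in for 1,024-dim embeddings
small = truncate(full, 128)
print(full.nbytes // small.nbytes)   # -> 8: an 8x storage (and bandwidth) reduction
```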
Q6. Let’s talk about Multimodal RAG for Images and Text. Can the integration of diverse data types like images and text improve how Large Language Models (LLMs) respond to user queries? How?
Bringing images and text together gives LLMs fuller context and a better understanding of queries, allowing for more holistic responses. For example, when both images and text are considered, LLMs can match visual descriptors with textual keywords, improving relevance and precision. Here's how it works: first, relevant data is retrieved in response to a user's query; then, this data helps the LLM generate a specific response. Vector databases support the retrieval phase by storing embeddings of diverse data types, enabling efficient multimodal retrieval. This process ensures the LLM can generate responses to a user's query by accessing a combination of the most relevant text, image, audio, and video data.
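Here is a minimal sketch of the shared text-image embedding space that makes this retrieval work, using a CLIP checkpoint exposed through sentence-transformers (one common choice among several); the file names are hypothetical:

```python
# Multimodal retrieval sketch: CLIP embeds text and images into one shared space,
# so a text query can rank images by cosine similarity. File names are hypothetical.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

image_paths = ["scan_1.png", "scan_2.png"]            # hypothetical image files
image_embs = model.encode([Image.open(p) for p in image_paths])

query_emb = model.encode("a chest x-ray of the left lung")
print(util.cos_sim(query_emb, image_embs))            # higher = better text-image match
```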
Q7. How does combining text, images, and other data types help emulate human-like perception in machines?
Traditional AI often handles just one data type, like text, but the world around us isn't that simple. In fact, most of the data we generate isn't text, especially in an era where social media and virtual conferencing are so prominent. AI must be able to handle mixed data (text, images, video, and audio) all at once. When these data types are combined, AI can understand context better, providing responses that feel more natural and nuanced. This multimodal approach lets AI get closer to how humans perceive and interpret the world.
Q8. Let’s talk about KDB.AI. What is the role of a Vector Database like KDB.AI?
KDB.AI is a vector database with time-series capabilities that allows developers to build scalable, reliable, and real-time AI applications. Its role is to serve as a foundational platform for advanced search, recommendation, and personalization in GenAI. We truly see it as a key component of full-stack GenAI applications that use Retrieval Augmented Generation (RAG).
It’s designed to let users combine unstructured vector embedding data (the audio, image, and video data we talked about earlier) with structured time-series datasets (numerical values and text) to support hybrid use cases. The most prominent use cases for KDB.AI are scenarios where the rigor of conventional time-series data analytics and the usage patterns of vector databases within the GenAI space come together. For example, this serves Capital Markets very well, enabling firms to optimize stock levels, predict future buying trends, detect fraudulent transactions, or bolster their investment strategies.
Q9. How does KDB.AI integrate with GenAI tools and what are the benefits?
In the context of GenAI, the KDB.AI vector database sits at the core of the RAG pipeline, serving as the data retrieval engine. This means that KDB.AI integrates with all of the common frameworks and technologies used in a RAG workflow. Let’s take a look at the GenAI tools that KDB.AI can be integrated with:
- Unstructured.io: Data ingestion and processing framework
- LangChain: Open-source framework for working with LLMs
- LlamaIndex: Open-source framework for LLM applications with a focus on ingestion, indexing, and retrieval
- OpenAI: A leading generative AI model provider
- Hugging Face: Open-source platform for building, fine-tuning, and deploying AI models
- Reranker models: Optimize retrieval results with built-in integrations to models from Cohere, Jina, and Voyage
- New integrations are released on a monthly cadence!
With these integrations, KDB.AI can fit within most GenAI tech stacks.
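To show where the pieces meet, here is a deliberately framework-agnostic sketch of a single RAG turn; the vector-store call is stubbed out, since the real implementation depends on which of the integrations above you choose, and the OpenAI model name is only an example:

```python
# One RAG turn, framework-agnostic. The vector search is stubbed out because the
# real call depends on which integration you use; the model name is an example.
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment

def vector_search(query: str, k: int = 3) -> list[str]:
    """Stub: a real pipeline embeds `query` and searches the vector database."""
    return ["<retrieved chunk 1>", "<retrieved chunk 2>", "<retrieved chunk 3>"][:k]

def answer(question: str) -> str:
    context = "\n\n".join(vector_search(question))
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # example model name
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("What were the key findings in the report?"))
```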
Qx. Anything else you wish to add?
There is a perception that vector databases can become quite expensive, especially as datasets scale into millions and billions of vectors. While this is true when using traditional memory-based indexes (memory is expensive!), there is potential for a major reduction in total cost of ownership (TCO) when using on-disk indexes. This reduced TCO has us quite excited about our new on-disk qFlat and qHNSW indexes; I encourage you to give them a try in our free Starter Edition offering!
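The memory math behind that cost concern is easy to check: holding a billion 1,536-dimensional float32 vectors fully in memory takes roughly 6 TB of RAM before any index overhead, which is exactly where on-disk indexes change the economics.

```python
# Back-of-the-envelope RAM needed to hold raw vectors fully in memory.
n_vectors = 1_000_000_000    # one billion embeddings
dims = 1536                  # a common embedding dimensionality
bytes_per_float32 = 4

raw_bytes = n_vectors * dims * bytes_per_float32
print(f"{raw_bytes / 1024**4:.1f} TiB before any index overhead")   # -> 5.6 TiB
# An on-disk index keeps vectors on SSD and only a small working cache in RAM.
```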
Resources
(*) Guide to Multimodal RAG for Images and Text
………………………………………………….
Ryan Siegler, Data Scientist, KX
Ryan is a data scientist and technical thought leader specializing in AI technologies, including retrieval-augmented generation (RAG), large language models (LLMs), and vector databases. He is passionate about harnessing emerging technologies to drive innovation and actively champions their adoption among developers across various industries. Ryan is currently focused on projects involving KDB.AI’s Temporal Similarity Search and on-disk vector indexes, as well as creating self-service generative AI chatbots. Outside of work, Ryan enjoys exploring the outdoors and playing heavy-hitting guitar riffs.
Sponsored by KX.