Bringing AI to the Data: Emma McGrattan on Vector Databases, Edge Deployment, and the Future of Production AI
Q1. Most vector databases were architected from the ground up for cloud-first, always-connected environments. As the author of Vector Databases for Enterprise AI and the CTO behind VectorAI DB, how do you explain the fundamental architectural gap that cloud-native vector databases leave for organizations operating in regulated, disconnected, or edge environments, and why has that gap taken so long to be addressed seriously?
Emma McGrattan: The honest answer to the second part of your question is that the gap wasn’t taken seriously because the people building vector databases weren’t the ones hitting the wall. The first wave of vector database development was driven by cloud-native AI teams (researchers, startups, hyperscaler labs) working in environments where data could move freely and connectivity was assumed. When that group builds infrastructure, they build for their own constraints. Regulated industries, disconnected operations, and edge deployments weren’t their problem.
The architectural gap is real and it runs deep. Cloud-native vector databases are built around assumptions of abundant memory, elastic compute, persistent network connectivity, centralized index management, and reliable coordination across services or nodes. Some use HNSW or related ANN structures in ways that work well in connected, centrally managed environments but become harder to operate when memory is constrained, connectivity is intermittent, or deployment must be fully air-gapped. Embedding pipelines assume round-trip access to model endpoints. Filtering and metadata operations often assume that vector indexes, metadata stores, identity services, and policy engines can remain continuously reachable within the same cloud environment. None of that holds at the edge, and most of it creates compliance exposure in regulated environments.
What organizations in healthcare, financial services, and government actually face is a data gravity problem. Their most sensitive, highest-value data can’t move, not because they don’t want to use AI, but because regulation, sovereignty requirements, or simple network physics prevent it. GDPR, HIPAA, sector-specific data residency mandates: these aren’t theoretical constraints. They’re blocking production deployments right now. The standard advice ‘move your data to the cloud and run your AI workloads there’ is not an option for a hospital trust with patient records, a central bank with transaction data, or a defense contractor with classified information.
Why has this taken so long? Partly because regulated industries move slowly and the market signal took time to reach product teams. Partly because edge AI hardware only recently reached the capability threshold where meaningful vector workloads are feasible on-premises. And partly because the open-source vector database ecosystem, while excellent for cloud deployments, simply doesn’t optimize for the constraints that matter at the edge: constrained memory footprints, intermittent connectivity, hardware-specific performance tuning, predictable local latency, and the ability to run entirely air-gapped.
Actian VectorAI DB was built specifically for this gap. The architecture starts from different assumptions: deployment is heterogeneous, connectivity is unreliable, data must stay where it lives, and compliance is not an afterthought bolted on at the end. That changes almost every design decision, from index structure and memory management to how governance metadata is tracked and how queries are routed in a distributed environment.
Q2. The benchmark results are striking: 22x faster throughput than leading open-source alternatives on identical hardware, and 72% throughput retention when scaling from 1 million to 10 million vectors while competitors dropped to around 12%. Can you walk us through what is happening architecturally inside VectorAI DB that produces that kind of performance differential, and what trade-offs, if any, organizations should be aware of when evaluating these numbers?
Emma McGrattan: I’ll start with the trade-offs, because that is the most important part of the conversation. Benchmarks are always context dependent. They reflect specific hardware, dataset characteristics, query patterns, and tuning choices. The results we published are reproducible in the environments we tested, but any organization evaluating vector databases should validate performance against their own workloads before making architectural decisions.
That said, the performance differences we are seeing are driven by a couple of architectural choices rather than a single optimization.
First is how index construction and maintenance are handled. As vector datasets grow, many systems experience increasing overhead from balancing ingestion, graph quality, and query performance. Our approach is optimized for incremental index construction and maintenance, with the goal of sustaining throughput as data volumes increase without introducing disruptive rebuilding or compaction patterns. This is a key factor in why performance degrades more gradually as you move from one million to ten million vectors.
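To make the incremental pattern concrete, here is a minimal sketch using the open-source hnswlib library. It illustrates the general idea of adding vectors without a disruptive rebuild; it is not VectorAI DB's implementation, whose internals are not public.

```python
# Minimal sketch of incremental ANN index maintenance using the open-source
# hnswlib library -- an illustration of the pattern, not VectorAI DB's design.
import numpy as np
import hnswlib

dim = 384
index = hnswlib.Index(space="cosine", dim=dim)
# Reserve capacity up front; ef_construction and M trade build cost for recall.
index.init_index(max_elements=200_000, ef_construction=200, M=16)

# Initial bulk load.
initial = np.random.rand(100_000, dim).astype(np.float32)
index.add_items(initial, ids=np.arange(100_000))

# Later batches are inserted incrementally -- no disruptive full rebuild
# or compaction pass is required to keep serving queries.
batch = np.random.rand(10_000, dim).astype(np.float32)
index.add_items(batch, ids=np.arange(100_000, 110_000))

index.set_ef(64)  # query-time recall/latency knob
labels, distances = index.knn_query(
    np.random.rand(1, dim).astype(np.float32), k=10)
```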
Second is memory layout and data locality. Vector search is fundamentally a memory access problem. The efficiency of cache usage, memory alignment, and NUMA awareness has a direct impact on latency and throughput, particularly on multi-socket systems that are common in on-premises deployments. In environments where you control the hardware, you can tune for these characteristics much more aggressively.
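A small NumPy sketch of the locality point, with purely illustrative numbers: scoring one contiguous float32 block keeps memory access sequential and SIMD-friendly, while chasing scattered per-vector objects does not.

```python
# Sketch of the data-locality point: one contiguous C-order float32 matrix
# is scanned sequentially and vectorized, while a list of scattered
# per-vector arrays forces pointer chasing and poor cache behavior.
import time
import numpy as np

n, dim = 200_000, 384
contiguous = np.random.rand(n, dim).astype(np.float32)  # one contiguous block
scattered = [row.copy() for row in contiguous]          # n separate allocations
query = np.random.rand(dim).astype(np.float32)

t0 = time.perf_counter()
scores_fast = contiguous @ query                        # sequential, vectorized
t1 = time.perf_counter()
scores_slow = np.array([v @ query for v in scattered])  # one dereference per vector
t2 = time.perf_counter()

print(f"contiguous: {t1 - t0:.4f}s  scattered: {t2 - t1:.4f}s")
```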
There are a few important caveats to keep in mind. The 22x figure reflects high-throughput concurrent workloads on specific hardware configurations. It is not a single-query latency comparison. Recall and relevance matter as much as raw throughput, so performance gains need to be evaluated alongside retrieval quality. Finally, different use cases such as RAG, recommendation systems, and real-time similarity matching stress the system in different ways. The right evaluation is always workload specific, not benchmark specific.
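One way to make that workload-specific evaluation actionable is to measure recall@k for a candidate index against exact brute-force results on your own vectors and queries. A sketch, again using hnswlib purely as an example ANN library:

```python
# Sketch of a workload-specific evaluation: recall@k of an ANN index
# (hnswlib, purely as an example) against exact brute-force search on
# your own data. Substitute real corpus vectors and real queries.
import numpy as np
import hnswlib

dim, k = 384, 10
corpus = np.random.rand(50_000, dim).astype(np.float32)   # stand-in for real vectors
queries = np.random.rand(100, dim).astype(np.float32)     # stand-in for real queries

index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=len(corpus), ef_construction=200, M=16)
index.add_items(corpus, ids=np.arange(len(corpus)))
index.set_ef(64)
approx, _ = index.knn_query(queries, k=k)

# Exact ground truth via brute-force squared L2 distances.
d2 = (queries ** 2).sum(1)[:, None] + (corpus ** 2).sum(1)[None, :] \
     - 2.0 * queries @ corpus.T
exact = np.argsort(d2, axis=1)[:, :k]

recall = np.mean([len(set(map(int, a)) & set(map(int, e))) / k
                  for a, e in zip(approx, exact)])
print(f"recall@{k}: {recall:.3f}")
```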
Q3. The phrase ‘bring AI to the data instead of moving data to AI’ captures a significant shift in how organizations should think about AI architecture. In practice, what does that mean for a data or infrastructure leader who has already invested heavily in a cloud-centric data strategy, and how difficult is it realistically to retrofit data sovereignty and governance into AI systems that weren’t designed with those constraints in mind from the start?
Emma McGrattan: The phrase captures something that I think most data leaders already know intuitively but haven’t fully operationalized yet: data gravity is real, and AI doesn’t override it. You can’t always move the data. So at some point, the compute has to go to where the data is.
For leaders who have invested heavily in cloud-centric data strategies, I want to be clear about something: that investment isn’t wasted. The cloud is the right answer for a large portion of enterprise data and AI workloads. What’s changing is the assumption that it’s the right answer for all of them. The practical shift is recognizing that your AI architecture needs to match your data topology, and for most large organizations that topology is hybrid by necessity, not by choice.
In practice, ‘bring AI to the data’ means deploying your retrieval infrastructure, specifically your vector search layer, closer to where sensitive or latency-critical data lives, rather than replicating that data into a centralized AI environment. It means running vector search on-premises for data that can’t leave your network, at the edge for real-time applications, and in the cloud for data where those constraints don’t apply. In some architectures, the model can still live in the cloud, provided the retrieved context is governed, minimized, and allowed to cross that boundary. In stricter sovereignty, latency, or disconnected scenarios, both retrieval and inference may need to run locally. What’s changing is where context is retrieved and assembled before the model sees it.
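As a deliberately simplified sketch of that routing idea, where the class and field names are hypothetical rather than any product's API:

```python
# Deliberately simplified sketch of "bring AI to the data": each collection
# is queried where its data is allowed to live, and only the retrieved,
# tagged context crosses a boundary afterwards. Class and field names are
# hypothetical, not a VectorAI DB API.
from dataclasses import dataclass
from typing import Protocol

class VectorStore(Protocol):
    def search(self, query_vec: list[float], k: int) -> list[dict]: ...

@dataclass
class Collection:
    name: str
    residency: str       # e.g. "on_prem", "edge", "cloud"
    store: VectorStore   # client bound to wherever the index actually runs

def retrieve(collections: list[Collection], query_vec: list[float], k: int) -> list[dict]:
    results: list[dict] = []
    for c in collections:
        for hit in c.store.search(query_vec, k):
            hit["residency"] = c.residency   # provenance tag for governance
            results.append(hit)
    # The model only ever sees this assembled, governed context.
    return sorted(results, key=lambda h: h["score"], reverse=True)[:k]
```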
On the harder question, retrofitting sovereignty and governance into systems not designed for it, the blunt answer is that it’s expensive, slow, and often incomplete. Governance that was bolted on after the fact almost always has gaps. If your embedding pipeline was built without access controls on which documents could be embedded and retrieved, you have invisible authorization gaps at the retrieval layer. If your vector indices were created without provenance tracking, you can’t answer a regulator’s question about which source data informed a given AI output. These aren’t configuration problems. They’re architectural ones.
The pragmatic path forward isn’t to throw away existing infrastructure. It’s to introduce governance at the retrieval layer, which is often the most overlooked gap, and to stop treating the vector database as a performance component and start treating it as a compliance boundary. Access control, audit logging, data lineage, and embedding provenance need to be first-class features of your vector infrastructure, not optional extras. Organizations that get this right early will have a significant operational advantage as regulatory scrutiny of AI systems intensifies.
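Here is a minimal sketch of what enforcing access control at the retrieval layer can look like; the schema and helper names are illustrative, not a specific product's API. The key design choice is that the ACL travels with the vector and is applied before ranking, never as a post-hoc filter on results.

```python
# Sketch of governance at the retrieval layer: an ACL travels with every
# vector and is enforced as a mandatory filter before ranking, so a caller
# can never retrieve context from documents they cannot see.
from dataclasses import dataclass, field

@dataclass
class Doc:
    doc_id: str
    embedding: list[float]
    allowed_roles: set[str] = field(default_factory=set)  # ACL stored with the vector
    source_uri: str = ""                                  # provenance for audit

def governed_search(index: list[Doc], query_vec: list[float],
                    caller_roles: set[str], k: int) -> list[Doc]:
    def score(d: Doc) -> float:
        return sum(a * b for a, b in zip(d.embedding, query_vec))  # toy similarity
    visible = [d for d in index if d.allowed_roles & caller_roles]  # filter first
    return sorted(visible, key=score, reverse=True)[:k]
```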
Q4. Regulated industries (healthcare, financial services, manufacturing, government) are often the slowest to adopt new database technologies precisely because the compliance stakes are so high. What does it actually take for a new vector database to earn the trust of a CISO or a compliance officer in those environments, and how should organizations evaluate the security and governance claims of any vector database they are considering for production AI?
Emma McGrattan: CISOs and compliance officers are not trying to block AI adoption. They’re trying to answer a specific set of questions that the technology vendors often can’t answer clearly: Where does the data go? Who can access it? Can we audit what happened? Can we demonstrate compliance to a regulator? If you can answer those four questions concretely, the conversation changes.
Earning trust in regulated environments is fundamentally a documentation and controls problem, not a feature problem. Technical capability matters, but what a compliance officer needs to see is evidence: penetration test results, security architecture documentation, data flow diagrams that show precisely where embeddings are generated, stored, and queried, and an audit log that creates a traceable chain from AI output back to source data. Many vector databases can’t produce that chain, which means they can’t be used in environments where explainability and auditability are regulatory requirements.
There’s a specific risk that I raise in almost every conversation with regulated-industry buyers that most vendors don’t talk about: embedding leakage. When you embed documents into a vector database, you’re not storing the original text; you’re storing a mathematical representation of it. But those representations should not be treated as harmless or fully opaque. Research has shown that embeddings can leak information and, under some conditions, can be partially inverted or used to infer sensitive characteristics of the source text. If your embedding index is accessible to parties who shouldn’t have access to the underlying documents, you have a data exposure risk that isn’t captured by traditional access controls on the document store. A compliance officer who understands this will ask whether the vector database enforces access controls at the index level, not just at the query interface.
For organizations evaluating security and governance claims, I’d suggest a practical checklist. Does the database support role-based and attribute-based access controls? Does it produce tamper-evident audit logs that capture query identity, policy decisions, index versions, source references, and enough result metadata to support investigation without unnecessarily logging sensitive content? Can it operate fully air-gapped without any telemetry or licensing calls to external endpoints? Does it support encryption at rest and in transit with customer-managed keys? And critically, can the vendor provide a data processing agreement and architectural documentation sufficient for a regulatory review?
The last point is often the deciding factor. A database that performs well in benchmarks but can’t produce compliance documentation won’t get deployed in a hospital network. The trust gap isn’t technical. It’s procedural, and vendors who treat compliance as a sales obstacle rather than a design requirement will keep losing deals in regulated markets.
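On the audit-logging item in that checklist, here is a minimal sketch of what 'tamper-evident' can mean in practice: each entry commits to the previous one through a hash chain, so any later alteration breaks verification. Field names follow the checklist above; the schema is an illustration, not a spec.

```python
# Minimal sketch of a tamper-evident audit log: each entry includes the hash
# of the previous entry, so any later modification breaks the chain.
import hashlib
import json
import time

class AuditLog:
    def __init__(self) -> None:
        self.entries: list[dict] = []
        self._prev_hash = "0" * 64                       # genesis marker

    def record(self, caller: str, policy_decision: str,
               index_version: str, source_refs: list[str]) -> dict:
        entry = {
            "ts": time.time(),
            "caller": caller,                            # query identity
            "policy_decision": policy_decision,          # allow/deny + rule id
            "index_version": index_version,
            "source_refs": source_refs,                  # provenance, not content
            "prev_hash": self._prev_hash,
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        self._prev_hash = entry["hash"]
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev_hash"] != prev or e["hash"] != expected:
                return False                             # chain broken: tampering
            prev = e["hash"]
        return True
```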
In full transparency, Actian VectorAI DB 1.0 does not yet have all of those security and governance capabilities on the checklist, but our plan is to close the gap in the near term.
Q5. Gartner predicts that 33% of enterprise software applications will include agentic AI by 2028. As AI systems evolve from RAG and retrieval toward autonomous agents and robotics, how do the requirements for the underlying vector database change, and what should organizations be designing for today in their data infrastructure to avoid having to re-architect everything as agentic AI matures?
Emma McGrattan: The Gartner prediction is plausible, and the infrastructure implications are underappreciated. RAG is a relatively well-understood pattern: a query comes in, you retrieve relevant context, you pass it to a model, you get a response. The retrieval layer is in the critical path once per interaction, the query is human-generated, and the stakes of any individual retrieval failure are bounded.
Agentic AI changes all three of those assumptions. Agents make multiple retrieval calls per task, often in sequence, where the output of one retrieval informs the next query. The queries are generated by the agent itself, not by a human, which means they’re less predictable and often more precise: agents tend to generate more targeted, compositional, and context-dependent semantic queries than humans do. And the stakes of retrieval failures compound: an error in step three of a twelve-step agentic workflow can produce a confidently wrong final output that’s much harder to detect than a single bad RAG response.
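A minimal sketch of why that compounding matters, where `search` and `llm` are placeholders for whatever retrieval and model stack you run:

```python
# Sketch of why agentic retrieval compounds: each step's retrieval feeds the
# next query, so one bad hop contaminates everything downstream. The
# `search` and `llm` callables are placeholders, not a specific API.
def agentic_task(goal: str, search, llm, max_steps: int = 12) -> str:
    context: list[str] = []
    query = goal
    for step in range(max_steps):
        hits = search(query, k=5)                 # retrieval call N
        context.extend(h["text"] for h in hits)
        plan = llm(goal=goal, context=context)    # agent decides the next move
        if plan["done"]:
            return plan["answer"]
        query = plan["next_query"]                # retrieval N+1 depends on N
    # Step budget exhausted: answer from whatever context was accumulated.
    return llm(goal=goal, context=context, force_answer=True)["answer"]
```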
For robotics and real-time autonomous systems, the requirements shift further. Latency requirements that are acceptable for a knowledge assistant, say 50 to 100 milliseconds for vector retrieval, become unacceptable for a robot making navigation decisions or an industrial control system responding to sensor data. For those applications, retrieval latency may need to move from the tens-of-milliseconds range toward single-digit milliseconds, depending on the control loop, safety envelope, and whether vector retrieval is being used in the real-time decision path or in a supporting planning layer.
What should organizations be designing for today? A few things that will matter more as agentic workloads scale. First, retrieval observability. When an agent produces a bad output, you need to be able to trace which retrieval steps contributed to it. That requires structured logging at the vector search layer: not just query inputs and outputs, but index version identifiers, embedding model versions, filter behavior, result scores, source references, and offline evaluation metrics that help teams understand recall and retrieval quality over time. Without that, debugging agentic failures is essentially impossible at scale.
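A sketch of what one structured record per retrieval step might carry; the field names mirror the list above, and the schema itself is an illustration rather than a standard:

```python
# Sketch of a structured retrieval-observability record: one of these per
# retrieval step gives you a trace to replay when an agent goes wrong.
from dataclasses import dataclass, field

@dataclass
class RetrievalTrace:
    trace_id: str                 # ties the step to the overall agent task
    step: int
    query_text: str
    embedding_model: str          # model name + version that embedded the query
    index_version: str            # which snapshot of the index answered
    filters_applied: dict         # metadata/ACL filters in effect
    result_ids: list[str]
    result_scores: list[float]
    source_refs: list[str] = field(default_factory=list)  # provenance, not content
```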
Second, multi-index architecture. Different retrieval tasks within a single agentic workflow may require different index configurations: a long-term memory index needs different recall characteristics than a working-memory index scoped to the current session context. Organizations that design for a single flat vector index will hit scaling and cost problems as agentic workflows grow more complex.
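As a sketch, assuming hnswlib-style parameters purely for illustration, the two indexes would simply be tuned differently:

```python
# Sketch of a multi-index layout: long-term memory tuned for recall, working
# memory tuned for cheap, fast session-scoped lookups. Parameter values are
# illustrative, not recommendations.
import hnswlib

dim = 384
long_term = hnswlib.Index(space="cosine", dim=dim)
long_term.init_index(max_elements=1_000_000, ef_construction=400, M=32)  # recall-heavy
long_term.set_ef(128)

working = hnswlib.Index(space="cosine", dim=dim)
working.init_index(max_elements=10_000, ef_construction=100, M=8)        # small, fast
working.set_ef(32)
```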
Third, and this is the one most organizations are sleeping on: embedding model governance. When you update your embedding model, and you will because the models improve rapidly, your existing vectors are no longer comparable to newly generated ones. In a RAG system, this creates a subtle quality degradation. In an agentic system that relies on memory across sessions, it can break agent continuity entirely. You need a versioning and re-indexing strategy before you need it, not after.
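A minimal sketch of an embedding-version guard, with illustrative names: vectors carry the model version that produced them, and cross-version comparisons fail loudly instead of silently degrading.

```python
# Sketch of an embedding-version guard: the index refuses to mix vectors
# from different embedding model versions rather than returning garbage.
class VersionedIndex:
    def __init__(self, embedding_model: str) -> None:
        self.embedding_model = embedding_model
        self.vectors: dict[str, list[float]] = {}

    def add(self, doc_id: str, vec: list[float], model: str) -> None:
        if model != self.embedding_model:
            raise ValueError(
                f"index built with {self.embedding_model}, got {model}; "
                "re-embed and re-index instead of mixing versions")
        self.vectors[doc_id] = vec

    def query(self, vec: list[float], model: str, k: int):
        if model != self.embedding_model:
            raise ValueError("query embedded with a different model version; "
                             "re-embed the query or search a matching index")
        return sorted(self.vectors.items(),
                      key=lambda item: sum(a * b for a, b in zip(item[1], vec)),
                      reverse=True)[:k]
```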
The organizations that avoid a re-architecture moment are the ones treating the vector database as load-bearing infrastructure from the start, not a prototype component to be replaced later, but a production system with the same operational rigor they’d apply to a relational database. That means governance built in, observability from day one, and deployment flexibility designed for where their data will need to live two years from now, not just where it lives today.
…………………………………………………

Emma McGrattan | CTO, Actian | Author, Vector Databases for Enterprise AI
As Chief Technology Officer at Actian, I get to lead our technology strategy, innovation, and product development, all in service of a mission I really care about: making it easier for companies to connect, manage, govern, and analyze their data.