On Vector Databases. Q&A with Eric Hanson
Q1. The database market is seeing a proliferation of specialty vector databases. What is a specialty vector database?
A specialty vector database is a type of database that allows you to store and search high-dimensional vectors, particularly to do nearest-neighbor search. These products usually have some limited ability to filter on other properties called “metadata” but don’t have all the features people expect from a database system.
Q2. When you buy these products and plumb them into your data architecture, what happens?
You experience the typical problems that arise from using such specialty systems, including excessive data movement, redundant data, and disagreement on data values among distributed components. Furthermore, you have to contend with the extra costs associated with purchasing a new database, including licensing fees and labor costs to learn and operate the new software.
Q3. Vectors and vector search cannot be used as a foundation for a new way of processing data. Why?
Vectors are just one data type, and vector search is just one way of processing data. Modern apps will need many data types in addition to vectors, like strings, numbers, datetime, text, spatial, JSON, time series, and more.
Q4. What is the alternative then?
The alternative to using a specialty vector database is to use an existing general-purpose database management system that supports all of a developer’s needs.
Q5. SingleStoreDB offers a vector database subsystem. What is it? How is it different from a specialty vector database?
SingleStore’s vector database subsystem uses SQL to enable high-speed nearest-neighbor search for finding objects that are semantically similar. Because it builds on SingleStore’s SQL capabilities, it can offer metadata filtering in more powerful and general forms than specialty vector databases can. SingleStore also provides parallelism, scale-out, ACID transactions, high availability, disaster recovery, backup and restore, and point-in-time restore. Consequently, the vector database subsystem gives developers a “best of both worlds” experience: they get vector search together with all the benefits of SingleStore’s high-performance, cloud-native, modern SQL database, which can power operational analytics and artificial intelligence/machine learning applications.
Q6. SingleStoreDB supports vectors and vector similarity search using dot_product (for cosine similarity) and euclidean_distance functions. What are these database features useful for?
These features are used for applications such as facial recognition, text-based semantic search, and photo search. These functions are also the basis for text-based AI chatbots.
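To make the two similarity measures concrete, here is a minimal Python sketch of the underlying arithmetic. It is not SingleStoreDB code: the function names simply mirror dot_product and euclidean_distance from the question, and normalize and cosine_similarity are hypothetical helpers added for illustration. The point it shows is that a dot product equals cosine similarity only when both vectors have been scaled to unit length.

```python
import math

def dot_product(a, b):
    # Sum of element-wise products of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    # Straight-line distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    # Hypothetical helper: scale a non-zero vector to unit length.
    length = math.sqrt(dot_product(v, v))
    return [x / length for x in v]

def cosine_similarity(a, b):
    # On unit-length vectors, the dot product is the cosine of the angle.
    return dot_product(normalize(a), normalize(b))
```

With this sketch, two vectors pointing in the same direction, such as [3, 4] and [6, 8], have a cosine similarity of about 1 even though their lengths differ, while perpendicular vectors score 0.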
Q7. How complex is it to insert vectors into SingleStoreDB for use with the DOT_PRODUCT() function?
It is very straightforward to insert vectors into SingleStoreDB for use with this function. To do so, you create a table with BLOB-typed columns to store the vectors. By using the JSON_ARRAY_PACK() function, you can insert properly formatted vectors into the table.
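As a rough illustration of what "packing" a vector into a BLOB means, the Python sketch below converts a JSON array of numbers into a compact binary string and back. It assumes a layout of consecutive little-endian 32-bit floats; that layout is an assumption made for this example, so check the SingleStoreDB documentation for the exact format that JSON_ARRAY_PACK() produces. The names pack_vector and unpack_vector are hypothetical.

```python
import json
import struct

def pack_vector(json_text):
    # Parse a JSON array such as "[1.0, 0.5, -2.0]" and pack it as
    # little-endian 32-bit floats (assumed layout, 4 bytes per element).
    values = json.loads(json_text)
    return struct.pack(f"<{len(values)}f", *values)

def unpack_vector(blob):
    # Reverse of pack_vector: read back 4-byte floats from the binary string.
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))
```

Under this assumption a three-element vector occupies 12 bytes, which is why a binary column is a compact way to store embeddings compared with keeping the JSON text.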
Q8. What kind of applications use a nearest-neighbor search? How easily can this be done in SQL in SingleStoreDB?
Applications that use nearest-neighbor search include semantic search of text, facial recognition, product and general object photo recognition, entity resolution, fuzzy matching, chatbots, and more. This type of search can easily be done in SQL in SingleStoreDB with an ORDER BY/LIMIT query that uses a vector similarity function to compute a nearness metric to order by.
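The ORDER BY/LIMIT pattern described above can be sketched outside the database in a few lines of Python. This is only an illustration of the logic, not SingleStoreDB code: nearest_neighbors and the (id, vector) row layout are hypothetical, and the sketch scans every row, which says nothing about how the database executes the query.

```python
def dot_product(a, b):
    # Sum of element-wise products of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def nearest_neighbors(query, rows, k):
    # rows is a list of (row_id, vector) pairs, standing in for a table.
    # Score every row against the query vector (the nearness metric) ...
    scored = [(dot_product(query, vec), row_id) for row_id, vec in rows]
    # ... order by score, highest first (like ORDER BY score DESC) ...
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # ... and keep only the top k rows (like LIMIT k).
    return scored[:k]
```

For example, with rows a = [1.0, 0.0], b = [0.0, 1.0], and c = [0.6, 0.8], a query of [1.0, 0.0] with k = 2 ranks a first and c second. With euclidean_distance the sort would go in ascending order instead, because a smaller distance means a closer match.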
Q9. What are vector joins useful for?
SingleStoreDB supports vector joins, which make it possible to conduct set-based nearest-neighbor operations.
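One way to read "set-based nearest-neighbor operations" is: for every vector in one set, find its closest match in another set, in a single operation rather than one query per vector. The Python sketch below illustrates that reading. It is an interpretation offered for illustration, not a description of how SingleStoreDB implements vector joins, and nearest_match_join is a hypothetical name.

```python
def dot_product(a, b):
    # Sum of element-wise products of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def nearest_match_join(left, right):
    # left and right are lists of (row_id, vector) pairs, standing in
    # for two tables. right is assumed to be non-empty.
    result = {}
    for left_id, left_vec in left:
        # Pick the right-hand row with the highest similarity to this row.
        best_id, _ = max(right, key=lambda row: dot_product(left_vec, row[1]))
        result[left_id] = best_id
    return result
```

A typical use of this pattern would be matching each record in a batch of new items against a catalog of known items, as in entity resolution.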
Q10. In general, how can SingleStoreDB help organizations that are working with AI and custom-trained vector-embedding models?
SingleStoreDB can help these organizations by providing a single system that performs many of the required operations: quickly processing semi-structured data and converting it into vectors, using indexing capabilities to conduct searches, and enhancing results with additional context (and re-sorting them). It can also transmit that context to AI APIs for answer generation, receive the generated response, share it with the user, and save it for later analysis.
……………………………………….
Eric Hanson is a Director of Product Management at SingleStore, responsible for query processing, storage and extensibility feature areas. He joined the SingleStore product management team in 2016.
Resources
– Why Your Vector Database Should Not be a Vector Database
– Try SingleStoreDB Cloud (registration required)
Sponsored by SingleStore