Vector Databases
A guide to vector databases, the specialized storage systems designed to store, index, and query high-dimensional vector embeddings, forming the retrieval backbone for generative AI and semantic search applications.
Searching by Meaning, Not Keywords
Traditional relational databases are designed for exact matches (WHERE customer_id = 123) or substring and keyword searches (WHERE description LIKE '%laptop%'). If a user searches an e-commerce database for “comfortable running shoes,” a keyword search will miss a product described as “cushioned jogging sneakers” because the two phrases share no words, even though they describe essentially the same product.
Vector databases solve this by searching by meaning rather than keywords. They are designed to store and query vector embeddings: arrays of floating-point numbers (often hundreds or thousands of dimensions long) generated by machine learning models that capture the semantic meaning of text, images, or audio. In the resulting vector space, the embeddings for “running shoes” and “jogging sneakers” are positioned very close to each other.
When a user searches for “comfortable running shoes,” the search query is converted into a vector by the same ML model that embedded the products. The vector database then performs a Nearest Neighbor (NN) or Approximate Nearest Neighbor (ANN) search to find the stored product vectors that are closest to the search vector, as measured by a metric such as cosine similarity or Euclidean distance, and returns the semantically relevant “cushioned jogging sneakers.”
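The lookup described above can be sketched as an exact (brute-force) nearest-neighbor search. This is a minimal illustration, not how a production system works: the three-dimensional vectors are hand-made stand-ins for real embeddings, and the names cosine_similarity, nearest_neighbors, and catalog are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, catalog, k=1):
    """Exact search: score every stored vector, return the k most similar names."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in catalog.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hand-made toy vectors (assumption): a real model would emit hundreds of dimensions.
catalog = {
    "cushioned jogging sneakers": (0.9, 0.8, 0.1),
    "leather office shoes": (0.7, 0.1, 0.2),
    "gaming laptop": (0.0, 0.1, 0.9),
}

# Pretend embedding of the query "comfortable running shoes".
query = (0.85, 0.9, 0.05)
top = nearest_neighbors(query, catalog, k=1)
print(top)
```

Because every stored vector is scored, the cost of this approach grows linearly with the size of the catalog.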
Architecture of a Vector Database
Vector databases (like Pinecone, Milvus, Weaviate, or Qdrant) optimize for calculating distances between vectors across millions or billions of records with millisecond latency.
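Finding the exact nearest neighbors means computing the distance between the query vector and every stored vector. For 100 million vectors, that is a brute-force scan whose cost grows linearly with the size of the collection. To keep query latency low, vector databases trade a small amount of accuracy for speed by using specialized indexing algorithms for Approximate Nearest Neighbor (ANN) search: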
HNSW (Hierarchical Navigable Small World): A graph-based index that creates multiple layers of links between vectors. A search starts at the top, sparsest layer and descends through progressively denser layers, narrowing in on the nearest neighbors. Search cost grows roughly logarithmically with the number of vectors.
IVF (Inverted File Index): Divides the vector space into clusters, typically with k-means. When a query vector arrives, the system calculates distances only against the vectors in the cluster (or the few clusters) whose centroids are nearest to the query vector, skipping the rest of the database.
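As a rough illustration of the IVF idea, the sketch below assigns vectors to the nearest of a set of fixed centroids and scans only the closest clusters at query time. It makes several simplifying assumptions: the centroids are supplied by hand rather than learned with k-means, the vectors are two-dimensional, and the class name ToyIVFIndex and the nprobe parameter name are chosen for illustration only.

```python
import math

class ToyIVFIndex:
    """Toy inverted-file index: one list of vectors per cluster centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = {i: [] for i in range(len(centroids))}

    def _nearest_centroids(self, vec, n):
        # Rank cluster ids by Euclidean distance from vec to each centroid.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: math.dist(vec, self.centroids[i]))
        return order[:n]

    def add(self, item_id, vec):
        # Store the vector in the list belonging to its closest centroid.
        cluster = self._nearest_centroids(vec, 1)[0]
        self.lists[cluster].append((item_id, vec))

    def search(self, query, k=1, nprobe=1):
        # Scan only the nprobe clusters closest to the query.
        candidates = []
        for cluster in self._nearest_centroids(query, nprobe):
            candidates.extend(self.lists[cluster])
        candidates.sort(key=lambda item: math.dist(query, item[1]))
        ids = [item_id for item_id, _ in candidates[:k]]
        return ids, len(candidates)  # results plus number of vectors scanned

index = ToyIVFIndex(centroids=[(0.0, 0.0), (10.0, 10.0)])
for item_id, vec in [("a", (1.0, 1.0)), ("b", (0.0, 2.0)), ("c", (9.0, 9.0)),
                     ("d", (11.0, 10.0)), ("e", (5.2, 5.2))]:
    index.add(item_id, vec)

fast = index.search((4.5, 4.5), k=1, nprobe=1)      # probes one cluster
thorough = index.search((4.5, 4.5), k=1, nprobe=2)  # probes both clusters
print(fast, thorough)
```

In this toy data set the vector "e" is the true nearest neighbor of the query but lives in the other cluster, so probing a single cluster returns "a" instead. Probing more clusters recovers "e" at the cost of scanning more vectors, which is the accuracy-versus-speed trade-off that makes the search approximate.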

Vector Databases in the RAG Architecture
Vector databases are the foundational data infrastructure for RAG (Retrieval-Augmented Generation) applications. Large Language Models (LLMs), such as those behind ChatGPT, have a fixed knowledge cutoff and are prone to hallucinations. RAG mitigates both problems by grounding the LLM's answers in enterprise data retrieved at query time.
In a RAG pipeline, an organization’s documents (PDFs, Confluence pages, support tickets) are split into chunks such as paragraphs, converted to vector embeddings, and stored in a vector database. When a user asks a chatbot a question, the question is vectorized, the vector database retrieves the most semantically relevant document chunks, and those chunks are injected into the LLM’s prompt. The LLM then generates an answer grounded in the retrieved enterprise context rather than relying only on its training data.
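The pipeline above can be sketched end to end in a few functions. This is a simplified outline under stated assumptions: embed is a word-count stand-in that captures no real semantics (a real pipeline would call an embedding model), the in-memory list stands in for a vector database, and the final LLM call is omitted. All function names and the sample documents are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding (assumption): word counts instead of a learned model."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return Counter(w for w in words if w)

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def chunk(document):
    """Split a document into paragraph chunks on blank lines."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]

def build_index(documents):
    """Embed every chunk; the resulting list plays the role of the vector database."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]

def retrieve(index, question, k=2):
    """Return the k chunks whose embeddings are most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, chunks):
    """Inject the retrieved chunks into the prompt that would be sent to the LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

documents = [
    "Refunds are issued within 14 days of a return request.\n\n"
    "Shipping to Europe takes 5 business days.",
    "Support tickets are answered within 24 hours.",
]
index = build_index(documents)
question = "How long do refunds take?"
prompt = build_prompt(question, retrieve(index, question, k=1))
print(prompt)
```

The printed prompt contains the refund-policy chunk followed by the user's question; sending that prompt to an LLM is the generation step of RAG.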
While dedicated vector databases pioneered this space, traditional databases are rapidly adding vector capabilities. PostgreSQL (via the pgvector extension) and several data warehouses now support vector storage and cosine similarity search, allowing organizations to perform semantic search alongside traditional relational SQL queries within their existing data architecture.
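As a hedged sketch of what combining the two looks like with pgvector, the snippet below builds a parameterized query that applies an ordinary relational filter and then orders rows by cosine distance using pgvector's <=> operator. The products table, its columns, and the helper names are hypothetical; the code only constructs the query and its parameters and does not connect to a database.

```python
def to_pgvector_literal(vector):
    """Format a Python sequence in pgvector's text form, e.g. '[0.1,0.2,1.0]'."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"

# Hypothetical schema (assumption): products(id, name, category, embedding vector(3)).
QUERY = (
    "SELECT id, name "
    "FROM products "
    "WHERE category = %s "                # ordinary relational filter
    "ORDER BY embedding <=> %s::vector "  # pgvector cosine distance, smallest first
    "LIMIT %s"
)

def build_query_params(category, query_vector, k):
    """Bundle the parameters in the order the placeholders appear in QUERY."""
    return (category, to_pgvector_literal(query_vector), k)

params = build_query_params("footwear", [0.85, 0.9, 0.05], 5)
print(QUERY)
print(params)
```

With a PostgreSQL driver that uses %s placeholders, such as psycopg, the pair would be passed as cursor.execute(QUERY, params).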