Retrieval-Augmented Generation is the recommended pattern for grounding LLM output in your own data. Every blog post about it leads with a vector database. Most of our production RAG systems don’t use one, and they work fine. Here’s how we decide.
What RAG actually needs
RAG needs two things: a way to find the right document for a given query, and a way to pass that document into the LLM’s context window. Vector embeddings are one way to do the first thing. They’re not the only way.
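To make those two pieces concrete, here is a minimal sketch in Python. Everything in it is illustrative rather than taken from a real system: the `Retriever` type, `keyword_retriever`, `build_prompt`, and the `llm` callable are hypothetical names, and the word-overlap scoring is only a stand-in for whatever retrieval you actually use.

```python
from typing import Callable, Optional

# Hypothetical type: a retriever maps a query to the best document's text, or None.
Retriever = Callable[[str], Optional[str]]


def keyword_retriever(docs: dict[str, str]) -> Retriever:
    """Piece one: find a document. Here, the one sharing the most words with the query."""
    def retrieve(query: str) -> Optional[str]:
        terms = set(query.lower().split())
        best_id, best_score = None, 0
        for doc_id, text in docs.items():
            score = len(terms & set(text.lower().split()))
            if score > best_score:
                best_id, best_score = doc_id, score
        return docs[best_id] if best_id is not None else None
    return retrieve


def build_prompt(query: str, document: Optional[str]) -> str:
    """Piece two: put the retrieved document into the context window."""
    context = document if document is not None else "(no matching document found)"
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


def answer(query: str, retrieve: Retriever, llm: Callable[[str], str]) -> str:
    # `llm` is any function from prompt text to completion text (an assumption).
    return llm(build_prompt(query, retrieve(query)))
```

Swapping `keyword_retriever` for an embedding lookup changes nothing else in this sketch, which is the point: the retriever is a replaceable part.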
When a vector DB earns its keep
Vector search wins when you’ve got a large corpus, unstructured natural-language queries, and no usable keyword signal. A customer support knowledge base with 10,000 articles and free-form user questions is the textbook case. Pinecone, Qdrant, and pgvector are all valid choices.
When it doesn’t
Most client RAG systems we ship have a few hundred documents, a small number of well-defined query types, and structured metadata we can filter on. Full-text search on Postgres with sensible indexes hits the right document in milliseconds, every time, no embeddings required. The query is cheaper, the system is simpler, and the team can debug it without learning a new query language.
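As a rough sketch of what that looks like, the Python below holds the SQL for a Postgres full-text setup and builds a parameterised search query. The `documents` table, its `title`, `body`, and `doc_type` columns, and the `build_search` helper are assumptions made for illustration. The Postgres functions (`to_tsvector`, `websearch_to_tsquery`, `ts_rank`) and the GIN index are standard, though generated columns need Postgres 12 or later.

```python
# Assumed schema: documents(id, title, body, doc_type). Adjust to your own tables.
SETUP_SQL = """
ALTER TABLE documents
    ADD COLUMN IF NOT EXISTS search tsvector
    GENERATED ALWAYS AS (
        to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
    ) STORED;
CREATE INDEX IF NOT EXISTS documents_search_idx ON documents USING GIN (search);
"""

# Filter on structured metadata first, match on text, rank what is left.
SEARCH_SQL = """
SELECT id, title, ts_rank(search, query) AS rank
FROM documents, websearch_to_tsquery('english', %(q)s) AS query
WHERE search @@ query
  AND doc_type = %(doc_type)s
ORDER BY rank DESC
LIMIT %(limit)s;
"""


def build_search(query: str, doc_type: str, limit: int = 10) -> tuple[str, dict]:
    """Return the SQL and named parameters, ready for a driver's execute()."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return SEARCH_SQL, {"q": query, "doc_type": doc_type, "limit": limit}


# With a psycopg cursor (not imported here), usage would look like:
#     sql, params = build_search("reset password", "how-to")
#     cur.execute(sql, params)
```

Passing the user’s text as a parameter rather than formatting it into the SQL keeps the query safe from injection, and `websearch_to_tsquery` tolerates free-form input without raising syntax errors.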
The hybrid pattern
For the middle case, we use both. Postgres full-text search narrows the candidate set to ten documents, then a small embedding-based rerank picks among them if we need a more nuanced match. It’s a 20-line addition, not a new piece of infrastructure.
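The rerank step might look something like the sketch below. It assumes the candidates have already come back from full-text search as `(doc_id, text)` pairs, and that `embed` is whatever function turns text into a vector in your stack. Both the `rerank` name and the `embed` signature are hypothetical, not a specific library’s API.

```python
import math
from typing import Callable, Sequence

# Hypothetical signature: any function from text to a fixed-length vector.
Embed = Callable[[str], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(
    query: str,
    candidates: list[tuple[str, str]],
    embed: Embed,
    top_k: int = 3,
) -> list[str]:
    """Order the shortlisted (doc_id, text) pairs by similarity to the query."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(text)), doc_id) for doc_id, text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]
```

With only ten candidates there is nothing to index: the embeddings are computed on the fly and compared in plain Python, so no vector store is involved.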
The test we run before adding a vector DB
Build the simplest possible retrieval (full-text search, regex, even hand-written rules) and measure how often it returns the right document. If it’s above 90%, ship it. If it’s below 70%, you probably need embeddings. The 70–90% gap is where the engineering judgement lives.
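A small harness for that test could look like this. It is a sketch under a few assumptions: you have a labelled set of `(query, expected_doc_id)` pairs, your retriever returns a single document id, and the function names `hit_rate` and `recommend` are made up for the example. The 90% and 70% thresholds are the ones given above.

```python
from typing import Callable, Optional


def hit_rate(
    retrieve: Callable[[str], Optional[str]],
    labelled: list[tuple[str, str]],
) -> float:
    """Fraction of labelled queries for which the retriever returns the expected id."""
    if not labelled:
        raise ValueError("need at least one labelled query")
    hits = sum(1 for query, expected in labelled if retrieve(query) == expected)
    return hits / len(labelled)


def recommend(rate: float, ship_above: float = 0.90, embed_below: float = 0.70) -> str:
    """Map a hit rate onto the three outcomes described in the text."""
    if rate > ship_above:
        return "ship the simple retriever"
    if rate < embed_below:
        return "probably need embeddings"
    return "judgement call"
```

The labelled set matters more than the code: a few dozen real queries with known answers says more about the retriever than any synthetic benchmark.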