Building production RAG without a vector DB obsession

Retrieval-Augmented Generation (RAG) is a widely recommended pattern for grounding LLM output in your own data. Almost every blog post about it leads with a vector database. Most of our production RAG systems don’t use one, and they work fine. Here’s how we decide.

What RAG actually needs

RAG needs two things: a way to find the right document for a given query, and a way to pass that document into the LLM’s context window. Vector embeddings are one way to do the first thing. They’re not the only way.
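As a rough sketch of that split, assuming nothing about how retrieval is implemented: the retriever below is any function from a query to a list of documents, and the prompt builder only pastes whatever it returns into the context. The names Document, Retriever, build_prompt, and keyword_retriever are illustrative and not taken from any library, and the prompt wording is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    doc_id: str
    text: str


# Any function from a query string to ranked documents counts as a retriever.
Retriever = Callable[[str], List[Document]]


def build_prompt(query: str, retrieve: Retriever, max_docs: int = 3) -> str:
    """Fetch documents for the query and place them in the prompt context."""
    docs = retrieve(query)[:max_docs]
    context = "\n\n".join(f"[{d.doc_id}]\n{d.text}" for d in docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


def keyword_retriever(corpus: List[Document]) -> Retriever:
    """A deliberately naive retriever: rank by number of shared lowercase words."""
    def retrieve(query: str) -> List[Document]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(d.text.lower().split())), d) for d in corpus]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        # Drop documents that share no words with the query.
        return [d for score, d in ranked if score > 0]
    return retrieve
```

Swapping keyword_retriever for a full-text or embedding-based retriever leaves build_prompt untouched, which is the point: the LLM side does not care how the document was found.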

When a vector DB earns its keep

Vector search wins when you’ve got a large corpus, unstructured natural-language queries, and no usable keyword signal, meaning users rarely phrase things the way the documents do. A customer support knowledge base with 10,000 articles and free-form user questions is the textbook case. Pinecone, Qdrant, and pgvector are all valid choices.

When it doesn’t

Most client RAG systems we ship have a few hundred documents, a small number of well-defined query types, and structured metadata we can filter on. For those, Postgres full-text search with sensible indexes (a GIN index on a tsvector column, for example) consistently finds the right document in milliseconds, no embeddings required. The query is cheaper, the system is simpler, and the team can debug it without learning a new query language.
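A minimal sketch of what such a query might look like, assuming a hypothetical documents table with a doc_type metadata column and a generated tsvector column named search_vector. The table, column, and function names are ours for illustration; websearch_to_tsquery, ts_rank, and the @@ operator are standard Postgres (websearch_to_tsquery needs version 11 or later).

```python
from typing import Any, List, Optional, Tuple

# Hypothetical schema, shown for context only (generated columns need Postgres 12+):
#   CREATE TABLE documents (
#       id bigserial PRIMARY KEY,
#       doc_type text NOT NULL,
#       title text NOT NULL,
#       body text NOT NULL,
#       search_vector tsvector GENERATED ALWAYS AS
#           (to_tsvector('english', title || ' ' || body)) STORED
#   );
#   CREATE INDEX documents_search_idx ON documents USING GIN (search_vector);


def build_search_query(
    query: str, doc_type: Optional[str] = None, limit: int = 10
) -> Tuple[str, Tuple[Any, ...]]:
    """Return (sql, params) for a ranked full-text search with an optional metadata filter."""
    where = ["search_vector @@ q"]
    params: List[Any] = [query]
    if doc_type is not None:
        where.append("doc_type = %s")
        params.append(doc_type)
    params.append(limit)
    sql = (
        "SELECT id, title, ts_rank(search_vector, q) AS rank "
        "FROM documents, websearch_to_tsquery('english', %s) AS q "
        "WHERE " + " AND ".join(where) + " "
        "ORDER BY rank DESC LIMIT %s"
    )
    return sql, tuple(params)


def search(conn, query: str, doc_type: Optional[str] = None, limit: int = 10):
    """Run the search on a DB-API connection that uses %s placeholders (e.g. psycopg)."""
    sql, params = build_search_query(query, doc_type, limit)
    with conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchall()
```

Keeping the SQL builder separate from execution means the query can be logged, inspected, and unit-tested without a database, which helps with the debugging argument made above.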

The hybrid pattern

For the middle case, we use both: Postgres full-text search narrows the candidate set to ten documents, then a small embedding-based rerank picks among them if we need a more nuanced match. It’s a 20-line addition, not a new piece of infrastructure.
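The rerank step could look roughly like the sketch below. It assumes the candidates have already come back from full-text search as (doc_id, text) pairs, and that embed is a hypothetical callable wrapping whatever embedding model you use; neither name comes from a specific library. Cosine similarity is one common scoring choice, not the only one.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(
    query: str,
    candidates: List[Tuple[str, str]],  # (doc_id, text) pairs from full-text search
    embed: Callable[[str], Vector],     # hypothetical: wraps your embedding model
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Order the candidates by embedding similarity to the query."""
    q_vec = embed(query)
    scored = [(doc_id, cosine(q_vec, embed(text))) for doc_id, text in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With only ten candidates per query, embedding them on the fly or caching the vectors in an ordinary column is usually enough, so no dedicated vector index is involved.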

The test we run before adding a vector DB

Build the simplest possible retrieval (full-text search, regex, even hand-written rules) and measure how often it returns the right document on a set of real queries with known answers. If it’s above 90%, ship it. If it’s below 70%, you probably need embeddings. The 70–90% gap is where the engineering judgement lives.
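One way to run that test, sketched under the assumption that you have a small hand-labeled list of (query, expected document id) pairs and a retriever that returns the id of its top result. The function names and the treatment of exactly 90% or 70% as part of the middle band are our choices for illustration.

```python
from typing import Callable, List, Optional, Tuple


def hit_rate(
    retrieve: Callable[[str], Optional[str]],  # returns the top document id, or None
    labeled: List[Tuple[str, str]],            # (query, expected document id)
) -> float:
    """Fraction of labeled queries whose top result is the expected document."""
    if not labeled:
        raise ValueError("need at least one labeled query")
    hits = sum(1 for query, expected in labeled if retrieve(query) == expected)
    return hits / len(labeled)


def recommend(rate: float, ship_at: float = 0.90, embed_below: float = 0.70) -> str:
    """Map a hit rate onto the three bands: ship, add embeddings, or judgement call."""
    if rate > ship_at:
        return "ship the simple retriever"
    if rate < embed_below:
        return "probably needs embeddings"
    return "judgement call"
```

For example, a hand-written rule table can be passed in directly as the retriever, as in hit_rate(rules.get, labeled) where rules is a dict from query to document id.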
