RAG Pipeline Best Practices with UnSearch
Optimize your retrieval-augmented generation pipeline with real-time web search, smart content extraction, and relevance filtering.
Why Real-Time Search for RAG?
Static knowledge bases go stale. By combining your vector store with real-time web search, your RAG pipeline always has access to the latest information. UnSearch makes this easy with a single API call.
Architecture Overview
A typical RAG pipeline with UnSearch:
1. User submits a query
2. The query goes to both your vector store and the UnSearch Search API
3. UnSearch returns relevant web results with scraped content
4. Results are chunked and ranked alongside vector store results
5. Top chunks are passed to the LLM as context
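The merge-and-rank steps above can be sketched as a small orchestration function. The retriever callables and the `{"text", "score"}` result shape are illustrative assumptions, not the UnSearch client API:

```python
from typing import Callable

def retrieve_context(
    query: str,
    vector_search: Callable[[str], list[dict]],
    web_search: Callable[[str], list[dict]],
    top_k: int = 5,
) -> list[str]:
    """Merge vector-store and web results, rank by score, keep the top_k chunks.

    Assumes each retriever returns dicts like {"text": str, "score": float};
    that shape is a sketch, not a documented UnSearch response schema.
    """
    candidates = vector_search(query) + web_search(query)
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    return [c["text"] for c in ranked[:top_k]]
```

In practice the two retrievers can run concurrently, since neither depends on the other's output.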
Best Practice 1: Use Scrape Mode
Enable scrape_content in your search requests to get full page content, not just snippets. This gives your LLM much richer context to work with.
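As a sketch, a search request with scraping enabled might be built like this. Only `scrape_content` comes from the text above; the endpoint URL and the other field names are assumptions for illustration:

```python
import json

# Hypothetical request payload -- scrape_content is the documented flag here;
# "query", "max_results", and the endpoint are illustrative assumptions.
payload = {
    "query": "latest PostgreSQL vacuum tuning advice",
    "scrape_content": True,  # return full page content, not just snippets
    "max_results": 5,
}

# e.g. POST this to your UnSearch search endpoint:
# requests.post("https://api.unsearch.example/v1/search", json=payload)
body = json.dumps(payload)
```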
Best Practice 2: Multi-Engine Search
Configure multiple search engines for broader coverage. Academic queries benefit from Google Scholar and arXiv, while news queries work best with dedicated news engines.
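One way to apply this is a small routing table from query category to engine list. The engine identifiers below are illustrative guesses at what a multi-engine configuration could look like, not documented UnSearch values:

```python
# Hypothetical category-to-engine routing; engine names are assumptions.
ENGINE_MAP = {
    "academic": ["google_scholar", "arxiv"],
    "news": ["news"],
    "general": ["default"],
}

def engines_for(category: str) -> list[str]:
    """Pick search engines for a query category, falling back to general."""
    return ENGINE_MAP.get(category, ENGINE_MAP["general"])
```

A lightweight classifier (or even keyword heuristics) can assign the category before each search call.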
Best Practice 3: Relevance Filtering
Use the relevance scores returned by UnSearch to filter out low-quality results before they reach your LLM. This reduces noise and improves answer quality.
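A minimal filtering pass, assuming each result carries a `relevance` score in [0, 1] (the field name and threshold are illustrative, not a documented schema):

```python
def filter_by_relevance(results: list[dict], min_score: float = 0.5) -> list[dict]:
    """Drop results below the relevance threshold before building LLM context.

    Results missing a "relevance" field are treated as score 0 and dropped,
    which is a conservative assumption for this sketch.
    """
    return [r for r in results if r.get("relevance", 0.0) >= min_score]
```

Tune `min_score` against your own evaluation set; too high a threshold starves the LLM of context, too low reintroduces noise.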
Best Practice 4: Caching
For frequently asked questions, cache UnSearch results to reduce latency and API usage. Set a TTL based on how time-sensitive your domain is — news might need 15-minute TTLs, while technical documentation can be cached for hours.
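A minimal in-process TTL cache is enough to sketch the idea (production systems would more likely use Redis or similar; everything here is an illustrative sketch):

```python
import time

class TTLCache:
    """Minimal time-based cache for search results, keyed by query string."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        """Return the cached value, or None if missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: evict and treat as a miss
            return None
        return value

    def set(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic(), value)
```

Per the guidance above, a news deployment might use `TTLCache(15 * 60)` while technical documentation could use `TTLCache(6 * 3600)`.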
Measuring Quality
Track these metrics to evaluate your RAG pipeline: answer accuracy, source relevance, latency (p50/p95), and cost per query. UnSearch's usage dashboard helps you monitor the search component.
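The p50/p95 latencies mentioned above can be computed from raw samples with a simple nearest-rank percentile helper:

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of latency samples (e.g. pct=50 or pct=95).

    Returns the smallest sample with at least pct% of samples at or below it.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

Track these per pipeline stage (vector search, web search, LLM generation) so a latency regression can be attributed to the right component.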