Self-hosted RAG infrastructure with MCP Server support
FastAPI • Redis / Qdrant / PostgreSQL • Async • Embedding-agnostic • MCP Ready
Aquiles-RAG is a production-ready RAG (Retrieval-Augmented Generation) API server that brings high-performance vector search to your applications. Choose your backend (Redis, Qdrant, or PostgreSQL), connect your embedding model, and start building intelligent search systems in minutes.
| Challenge | Aquiles-RAG Solution |
|---|---|
| Expensive vector databases | Use the Redis, Qdrant, or PostgreSQL you already have |
| Data leaves your infrastructure | Everything runs on your servers |
| Complex RAG setup | Interactive wizard configures everything |
| Slow integrations | Async clients, batch operations, optimized pipelines |
| Vendor lock-in | Switch backends without changing code |
- Backend Flexibility - Redis HNSW, Qdrant, or PostgreSQL pgvector
- High Performance - Async operations, batch processing, optimized search
- MCP Server Built-in - Native Model Context Protocol support for AI assistants
- Interactive Setup - CLI wizard configures your entire stack
- Sync & Async Clients - Python and TypeScript/JavaScript SDKs included
- Optional Re-ranking - Improve results with semantic re-scoring
Install the package:

```bash
pip install aquiles-rag
```

Configure your vector database in seconds:

```bash
aquiles-rag configs
```

The wizard guides you through:
- Backend selection (Redis, Qdrant, or PostgreSQL)
- Connection settings (host, port, credentials)
- TLS/gRPC options
- Optional re-ranker configuration
Start the server:

```bash
aquiles-rag serve --host "0.0.0.0" --port 5500
```

Then connect with the Python client:

```python
from aquiles.client import AquilesRAG

client = AquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")

# Create index
client.create_index("documents", embeddings_dim=768, dtype="FLOAT32")

# Store a document with your embedding function
def get_embedding(text):
    return your_embedding_model.encode(text)

client.send_rag(
    embedding_func=get_embedding,
    index="documents",
    name_chunk="intro",
    raw_text="Your document text here..."
)

# Query
results = client.query("documents", query_embedding, top_k=5)
print(results)
```

That's it! You now have a working RAG system.
| Backend | Features | Best For |
|---|---|---|
| Redis | HNSW indexing, fast in-memory search | Speed-critical applications |
| Qdrant | HTTP/gRPC, collections, filters | Scalable production systems |
| PostgreSQL | pgvector extension, SQL integration | Existing Postgres infrastructure |
All backends support:
- Vector similarity search (cosine, inner product)
- Metadata filtering
- Batch operations
- Optional re-ranking
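For intuition on the two similarity metrics above: cosine similarity is just the inner product of unit-normalized vectors, so the two produce the same ranking once embeddings are normalized. A quick standalone illustration (plain Python, no Aquiles-RAG dependency):

```python
import math

def dot(a, b):
    # Inner product of two vectors.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine = inner product after dividing out both magnitudes.
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    return dot(a, b) / (norm_a * norm_b)

a, b = [1.0, 2.0, 2.0], [2.0, 4.0, 4.0]
print(dot(a, b))                # 18.0
print(cosine_similarity(a, b))  # 1.0 -- b is a scaled copy of a, same direction
```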
Aquiles-RAG includes a built-in Model Context Protocol server for seamless AI assistant integration.
```bash
aquiles-rag mcp-serve --host "0.0.0.0" --port 5500 --transport "sse"
```

Example with the OpenAI Agents SDK:

```python
from agents import Agent, Runner
from agents.mcp import MCPServerSse

# Connect to MCP server
mcp_server = MCPServerSse({
    "url": "http://localhost:5500/sse",
    "headers": {"X-API-Key": "YOUR_API_KEY"}
})
await mcp_server.connect()

# Create agent with RAG tools
agent = Agent(
    name="RAG Assistant",
    instructions="You can store and query documents using the vector database.",
    mcp_servers=[mcp_server],
    model="gpt-4"
)

# The agent now has access to:
# - create_index
# - send_info (store documents)
# - query_rag (semantic search)
# - list_indexes
# - delete_index
result = await Runner.run(agent, "Store this document and find similar content")
```

MCP Tools Available:
- Index management (create, list, delete)
- Document ingestion with automatic chunking
- Semantic search with configurable parameters
- Metadata filtering
The async Python client mirrors the sync API:

```python
import asyncio

from aquiles.client import AsyncAquilesRAG

client = AsyncAquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")

async def main():
    # Create index
    await client.create_index("docs", embeddings_dim=1536)

    # Store documents (parallel chunking)
    await client.send_rag(
        embedding_func=async_get_embedding,
        index="docs",
        name_chunk="document_1",
        raw_text=long_text,
        metadata={
            "author": "John Doe",
            "source": "documentation"
        }
    )

    # Query
    results = await client.query("docs", query_embedding, top_k=5)
    print(results)

asyncio.run(main())
```

Install the TypeScript/JavaScript client:

```bash
npm install @aquiles-ai/aquiles-rag-client
```

```typescript
import { AsyncAquilesRAG } from '@aquiles-ai/aquiles-rag-client';
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function getEmbedding(text: string): Promise<number[]> {
  const resp = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return resp.data[0].embedding;
}

const client = new AsyncAquilesRAG({
  host: 'http://127.0.0.1:5500',
  apiKey: 'your-api-key',
});

// Create index (1536 dimensions for text-embedding-3-small)
await client.createIndex('my_docs', 1536, 'FLOAT32');

// Store document
await client.sendRAG(
  getEmbedding,
  'my_docs',
  'doc_1',
  'Your document text...',
  {
    embeddingModel: 'text-embedding-3-small',
    metadata: { author: 'John Doe' }
  }
);

// Query
const queryEmb = await getEmbedding('What is this about?');
const results = await client.query('my_docs', queryEmb, { topK: 5 });
console.log(results);
```

Improve search results with semantic re-scoring:
```bash
# Enable during setup wizard
aquiles-rag configs
```

Re-ranking refines results after vector search by scoring (query, document) pairs for better relevance.
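As a rough illustration of the mechanics (with a toy lexical scorer standing in for the semantic model a real re-ranker would use; `rerank` and `overlap_score` are illustrative, not part of the Aquiles-RAG API):

```python
def rerank(query, candidates, score_fn):
    # Score each (query, document) pair and return candidates best-first.
    return sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)

def overlap_score(query, doc):
    # Toy lexical scorer: fraction of query tokens present in the document.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

hits = ["cats are mammals", "the stock market fell", "dogs and cats play"]
print(rerank("do cats play", hits, overlap_score))
# ['dogs and cats play', 'cats are mammals', 'the stock market fell']
```

In production the scorer is a semantic model, so documents that share meaning (not just tokens) with the query rise to the top.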
Access the interactive UI:
http://localhost:5500/ui
Features:
- Test index creation and queries
- Inspect live configurations
- Protected Swagger UI documentation
- Real-time request/response monitoring
```
┌──────────────────────────────────────────────────────────────┐
│                           Clients                            │
│    HTTP/HTTPS • Python SDK • TypeScript SDK • MCP Server     │
└─────────────────────────────┬────────────────────────────────┘
                              │
┌─────────────────────────────▼────────────────────────────────┐
│                        FastAPI Server                        │
│  • Request validation                                        │
│  • Business logic orchestration                              │
│  • Optional re-ranking                                       │
└─────────────────────────────┬────────────────────────────────┘
                              │
┌─────────────────────────────▼────────────────────────────────┐
│                         Vector Store                         │
│    Redis HNSW • Qdrant Collections • PostgreSQL pgvector     │
└──────────────────────────────────────────────────────────────┘
```
Flow:
- Client sends embedding + query parameters
- Server validates and routes to vector store
- Vector store returns top-k candidates
- Optional re-ranker refines results
- Formatted response returned to client
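The flow above can be sketched as a small pipeline. This is illustrative only: `run_query`, the stub store, and the response shape are assumptions, not the server's actual internals:

```python
def run_query(embedding, top_k, store, reranker=None):
    # 1) validate the request
    if not embedding:
        raise ValueError("embedding must be non-empty")
    # 2-3) route to the vector store, which returns top-k candidates
    candidates = store(embedding, top_k)
    # 4) optionally refine with a re-ranker
    if reranker is not None:
        candidates = reranker(embedding, candidates)
    # 5) format the response for the client
    return {"results": candidates, "total": len(candidates)}

# Stub store returning canned (chunk, score) hits:
fake_store = lambda emb, k: [("doc1", 0.91), ("doc2", 0.87)][:k]
print(run_query([0.1, 0.2], 2, fake_store))
```

Swapping `store` is all it takes to target a different backend, which is why client code stays unchanged across Redis, Qdrant, and PostgreSQL.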
| Who | What |
|---|---|
| AI Startups | Build RAG features without vendor costs |
| Developers | Prototype semantic search quickly |
| Enterprises | Private, scalable document search |
| Researchers | Experiment with embeddings and retrieval |
- Python 3.9+
- One of: Redis, Qdrant, or PostgreSQL with pgvector
- pip or uv
Quick Redis Setup (Docker):
```bash
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack-server:latest
```

PostgreSQL Note: Aquiles-RAG doesn't run automatic migrations. Create the pgvector extension and required tables manually before use.
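Since Aquiles-RAG won't create these objects for you, a setup script might look like the sketch below. The table name, columns, and 768-dim embedding column are illustrative assumptions; match them to your deployment before running:

```python
# Hypothetical pgvector bootstrap -- adjust names and dimensions to your setup.
SETUP_SQL = [
    "CREATE EXTENSION IF NOT EXISTS vector;",
    """
    CREATE TABLE IF NOT EXISTS documents (
        id BIGSERIAL PRIMARY KEY,
        name_chunk TEXT NOT NULL,
        raw_text TEXT NOT NULL,
        metadata JSONB DEFAULT '{}',
        embedding vector(768)
    );
    """,
    # HNSW index for cosine search (requires pgvector >= 0.5).
    "CREATE INDEX IF NOT EXISTS documents_embedding_idx "
    "ON documents USING hnsw (embedding vector_cosine_ops);",
]

def main(dsn: str) -> None:
    import psycopg  # requires `pip install psycopg[binary]`
    with psycopg.connect(dsn) as conn, conn.cursor() as cur:
        for stmt in SETUP_SQL:
            cur.execute(stmt)
        conn.commit()

if __name__ == "__main__":
    main("postgresql://user:pass@localhost:5432/aquiles")
```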
- FastAPI - High-performance async API framework
- Redis / Qdrant / PostgreSQL - Vector storage backends
- NumPy - Efficient array operations
- Pydantic - Request/response validation
- HTTPX - Async HTTP client
- Click - CLI framework
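To illustrate where Pydantic fits, here is a sketch of a request model matching the create-index payload shown in the API examples below; the actual model names and constraints inside Aquiles-RAG may differ:

```python
from pydantic import BaseModel, Field

# Hypothetical model mirroring the /create/index JSON payload.
class CreateIndexRequest(BaseModel):
    indexname: str
    embeddings_dim: int = Field(gt=0)  # reject zero/negative dimensions
    dtype: str = "FLOAT32"

req = CreateIndexRequest(indexname="documents", embeddings_dim=768)
print(req.model_dump())
# {'indexname': 'documents', 'embeddings_dim': 768, 'dtype': 'FLOAT32'}
```

FastAPI uses models like this to reject malformed payloads with a 422 before any backend work happens.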
Create an index:

```bash
curl -X POST http://localhost:5500/create/index \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "indexname": "documents",
    "embeddings_dim": 768,
    "dtype": "FLOAT32"
  }'
```

Store a chunk:

```bash
curl -X POST http://localhost:5500/rag/create \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "index": "documents",
    "name_chunk": "doc1_part1",
    "raw_text": "Document content...",
    "embeddings": [0.12, 0.34, ...]
  }'
```

Query:

```bash
curl -X POST http://localhost:5500/rag/query-rag \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "index": "documents",
    "embeddings": [0.78, 0.90, ...],
    "top_k": 5,
    "cosine_distance_threshold": 0.6
  }'
```

Redis:
- Fast in-memory HNSW indexing
- Full metrics via `/status/ram`
- Supports HASH storage with COSINE search
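Polling that metrics endpoint takes only a few lines. `ram_status` is a hypothetical helper and the response schema is not documented here, so treat the returned JSON as opaque:

```python
import json
import urllib.request

def status_url(host: str) -> str:
    # Builds the documented /status/ram endpoint URL.
    return f"{host.rstrip('/')}/status/ram"

def ram_status(host: str, api_key: str) -> dict:
    req = urllib.request.Request(status_url(host), headers={"X-API-Key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Requires a running Aquiles-RAG server backed by Redis.
    print(ram_status("http://localhost:5500", "YOUR_API_KEY"))
```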
Qdrant:
- HTTP or gRPC connections
- Collection-based organization
- Limited metrics compared to Redis
PostgreSQL:
- Requires manual pgvector setup
- No automatic migrations
- SQL-native filtering and joins
- Check Postgres monitoring for metrics
We welcome contributions! See the test suite in `test/` for examples:
- Client SDK tests
- API endpoint tests
- Deployment validation
⭐ Star this project • Report issues
Built with ❤️ for the AI community
