A lightweight proxy that sits between your app and any LLM provider. Every request scored, every hallucination flagged, every cost tracked — in real time, with zero changes to your model calls.
Features
Drop-in proxy, no SDK changes. Plug in once, observe everything.
OpenAI-compatible endpoint. Route any provider — OpenAI, Anthropic, Groq, Gemini, Grok, Mistral, Together, NVIDIA, Cohere — through a single base URL.
Automatically compares every response against your source documents. Flags contradictions, unsupported claims, and grounding failures before they reach users.
ML scorer evaluates every response for hallucination risk and grounding quality. Cross-model verification adds a second opinion from a configurable secondary LLM.
Track multi-turn agent sessions end-to-end. Group traces by session ID, visualise decision chains, and replay any request on the dashboard.
Token usage and USD cost broken down per feature, per model, per day. Set per-feature alert thresholds and receive webhook notifications when budgets are exceeded.
Detect and redact personally identifiable information in prompts before they leave your infrastructure. Configurable per feature, zero latency overhead.
How it works
Change OPENAI_BASE_URL to localhost:8080. The gateway forwards every request to the real provider — your existing code is untouched.
Every response is asynchronously evaluated: hallucination risk, RAG grounding, PII detection, cross-model agreement. No added latency on the hot path.
Open localhost:3000 to see traces, warnings, costs, and session replays. Configure thresholds and webhook alerts per feature.
Integration
Set the base URL and send your first request. Works with any OpenAI-compatible client — Python, Node, Go, curl. Pass X-Feature-Name to enable per-feature controls.
from openai import OpenAI # only change: point base_url at ajah client = OpenAI( api_key="your-openai-key", base_url="http://localhost:8080/openai/v1", ) response = client.chat.completions.create( model="gpt-4o", messages=[{ "role": "user", "content": "What is the capital of France?", }], extra_headers={ "X-Feature-Name": "fact-check", "X-Session-ID": "user-session-42", }, ) # response is identical — ajah scores async print(response.choices[0].message.content)
Works with every major provider
Open source, self-hosted, MIT licensed. No data leaves your network.