Drop-in proxy

The proxy records each AI call and forwards it to the real provider — so you get request-level attribution without touching application logic. You change one thing: the provider SDK's base URL.

Fail-open guarantee

The proxy runs on a hard latency budget. If Argmin is slow or unreachable, the request is forwarded to the provider anyway. The proxy can never block or fail your production traffic.
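
The fail-open behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Argmin's implementation: the names `forward_fail_open`, `record`, `forward`, and the 50 ms budget are all assumptions made for the example.

```python
import concurrent.futures
import time

# Hypothetical budget for the recording step; the real value is not documented here.
RECORD_BUDGET_SECONDS = 0.05

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def forward_fail_open(request, record, forward, budget=RECORD_BUDGET_SECONDS):
    """Forward `request` to the provider even if `record` is slow or raises.

    Returns (provider_response, recorded), where `recorded` says whether the
    recording step finished successfully inside the budget.
    """
    future = _executor.submit(record, request)
    try:
        future.result(timeout=budget)  # wait at most `budget` seconds
        recorded = True
    except Exception:
        # Timeout or recorder failure: give up on recording, never on the request.
        recorded = False
    return forward(request), recorded


def slow_recorder(request):
    """Stand-in for an attribution backend that is too slow."""
    time.sleep(0.3)


def broken_recorder(request):
    """Stand-in for an attribution backend that is unreachable."""
    raise ConnectionError("attribution backend unreachable")


def echo_provider(request):
    """Stand-in for the real provider call."""
    return {"echo": request}
```

In this sketch the recording step is the only part with a deadline; the provider call runs whether or not recording succeeded, which is the property the guarantee describes.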

How it works

graph LR
  App[Your app] -->|base_url = Argmin proxy| Px[Argmin proxy]
  Px -->|forwards| Prov[OpenAI / Anthropic]
  Px -.->|emits InvocationEvent| Pipe[Attribution pipeline]
  Prov -->|response| Px --> App

The proxy passes your request through unchanged, including your provider API key, which Argmin does not store. It captures metadata (model, tokens, latency, cost) and emits an InvocationEvent. Prompt and response content is never captured.
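
To make the metadata-only capture concrete, here is a hedged sketch of what such an event might look like. The field names, the `build_event` helper, and the per-1k-token prices are assumptions for illustration; the actual InvocationEvent schema is not specified on this page.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InvocationEvent:
    """Hypothetical event shape: metadata only, no prompt or response text."""
    provider: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cost_usd: float


def build_event(provider, request_body, usage, latency_ms,
                usd_per_1k_input, usd_per_1k_output):
    """Build an event from request metadata; `messages` is deliberately ignored."""
    cost = (usage["input_tokens"] / 1000 * usd_per_1k_input
            + usage["output_tokens"] / 1000 * usd_per_1k_output)
    return InvocationEvent(
        provider=provider,
        model=request_body["model"],  # the only field read from the request body
        input_tokens=usage["input_tokens"],
        output_tokens=usage["output_tokens"],
        latency_ms=latency_ms,
        cost_usd=cost,
    )


example_event = build_event(
    "openai",
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "secret"}]},
    {"input_tokens": 1000, "output_tokens": 500},
    latency_ms=120.0,
    usd_per_1k_input=0.5,   # made-up price
    usd_per_1k_output=1.0,  # made-up price
)
```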

Endpoints

Paths are relative to your Argmin instance host, which is provided during onboarding:

Provider     Proxy base path
OpenAI       /api/v1/proxy/openai/v1
Anthropic    /api/v1/proxy/anthropic/v1
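
If you configure several services, a small helper can keep the base URLs consistent. The sketch below is illustrative only: `proxy_base_url` is a hypothetical helper, and `argmin.example.com` is a placeholder host, not a real instance.

```python
# Proxy base paths as listed in the table above.
PROXY_PATHS = {
    "openai": "/api/v1/proxy/openai/v1",
    "anthropic": "/api/v1/proxy/anthropic/v1",
}


def proxy_base_url(instance_host, provider):
    """Join an Argmin instance host with the provider's proxy base path."""
    host = instance_host.strip().rstrip("/")
    if not host.startswith(("http://", "https://")):
        host = "https://" + host  # assume HTTPS when no scheme is given
    return host + PROXY_PATHS[provider]
```

The returned string is what you would pass as `base_url` to the provider SDK client.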

OpenAI example

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-argmin-instance>/api/v1/proxy/openai/v1",
    api_key="<your OpenAI key>",   # forwarded as-is; not stored by Argmin
)

client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "…"}],
)

curl https://<your-argmin-instance>/api/v1/proxy/openai/v1/chat/completions \
  -H "Authorization: Bearer <your OpenAI key>" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"…"}]}'

Anthropic example

from anthropic import Anthropic

client = Anthropic(
    base_url="https://<your-argmin-instance>/api/v1/proxy/anthropic/v1",
    api_key="<your Anthropic key>",
)

client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "…"}],
)

curl https://<your-argmin-instance>/api/v1/proxy/anthropic/v1/messages \
  -H "x-api-key: <your Anthropic key>" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":1024,"messages":[{"role":"user","content":"…"}]}'

Attributing the call

To attribute calls more finely than "this API key", send identity or service hints in headers that your platform team configures (for example, an identity header or your existing trace headers). Ask your onboarding contact for the header convention enabled on your instance.
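
As a rough sketch of how such headers could be attached: the header names `X-Example-Identity` and `X-Example-Service` below are placeholders invented for this example, not Argmin's convention. Use whatever names your instance is configured for.

```python
def attribution_headers(identity=None, service=None):
    """Build attribution headers, skipping any hint that is not set.

    Header names are hypothetical placeholders; substitute the convention
    enabled on your Argmin instance.
    """
    headers = {}
    if identity:
        headers["X-Example-Identity"] = identity
    if service:
        headers["X-Example-Service"] = service
    return headers


# Both provider SDKs accept a `default_headers` argument on the client, e.g.:
#
#   client = OpenAI(
#       base_url="https://<your-argmin-instance>/api/v1/proxy/openai/v1",
#       api_key="<your OpenAI key>",
#       default_headers=attribution_headers("alice@example.com", "search-api"),
#   )
```

Setting the headers once on the client keeps individual call sites unchanged, which matches the drop-in intent of the proxy.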

Streaming

Streaming responses (Server-Sent Events) are supported and forwarded incrementally; token counts are derived from the stream. Bedrock's binary event stream is partially supported — confirm coverage with your onboarding contact if you rely on it.
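
One way to picture "forwarded incrementally, token counts derived from the stream" is a relay that yields each Server-Sent Events line untouched while watching for usage data. This is a simplified sketch under stated assumptions: `relay_sse` is a hypothetical name, and it only looks for a top-level `usage` object in each `data:` payload (as in OpenAI-style chunks). Other providers place usage elsewhere, and the proxy's real logic is not documented here.

```python
import json


def relay_sse(lines, usage):
    """Yield each SSE line unchanged; copy any top-level `usage` into `usage`."""
    for line in lines:
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":
                try:
                    chunk = json.loads(payload)
                except json.JSONDecodeError:
                    chunk = None  # not JSON: forward it anyway, record nothing
                if isinstance(chunk, dict) and isinstance(chunk.get("usage"), dict):
                    usage.update(chunk["usage"])
        yield line  # forwarded immediately, before the next line is read


sample_stream = [
    'data: {"choices": [{"delta": {"content": "Hi"}}], "usage": null}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 3, "completion_tokens": 2, "total_tokens": 5}}',
    "",
    "data: [DONE]",
]
```

Because the function is a generator, the caller receives each line as soon as it arrives; usage is only complete once the stream has been fully consumed.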

Confirm events are landing