
How Argmin works

You don't need this page to onboard — but it explains what happens to your data after you connect, and why the integration is shaped the way it is.

The attribution graph

Argmin is the enterprise system of record for AI consumption. It takes every AI inference signal it can see and links it across six layers into a single graph:

graph TD
  R[AI inference request] --> S[Service]
  S --> C[Code ownership]
  C --> I[Identity]
  I --> O[Org hierarchy]
  O --> B[Budget / cost center]

The result answers the question finance and engineering both ask: which team, service, person, and budget is responsible for a given slice of AI spend? Every attribution carries a confidence score, capped at 0.95; Argmin never asserts certainty.
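As a minimal sketch of how a capped confidence score could travel along the six-layer chain, the example below multiplies per-link confidences and clamps the result at 0.95. Only the cap comes from the text above. The `Link` type, the `attribute` function, the multiplication rule, and all sample values are hypothetical and are not Argmin's actual scoring method.

```python
from __future__ import annotations

from dataclasses import dataclass

CONFIDENCE_CAP = 0.95  # from the text: no attribution is reported above 0.95


@dataclass(frozen=True)
class Link:
    """One hop in the chain (hypothetical type, for illustration only)."""
    layer: str         # e.g. "service", "code_ownership", "identity"
    target: str        # what the previous layer resolved to
    confidence: float  # 0.0 to 1.0, how sure this single hop is


def attribute(links: list[Link]) -> tuple[dict[str, str], float]:
    """Walk the chain and return the resolved layers plus a capped score.

    Assumption: hop confidences combine by multiplication.
    """
    resolved: dict[str, str] = {}
    score = 1.0
    for link in links:
        resolved[link.layer] = link.target
        score *= link.confidence
    return resolved, min(score, CONFIDENCE_CAP)


# Hypothetical chain for one inference request; every hop is a perfect match.
chain = [
    Link("service", "checkout-api", 1.0),
    Link("code_ownership", "team-payments", 1.0),
    Link("identity", "alice@example.com", 1.0),
    Link("org_hierarchy", "Payments Engineering", 1.0),
    Link("budget", "CC-1234", 1.0),
]
resolved, confidence = attribute(chain)
```

Even when every hop matches perfectly, the reported confidence here is 0.95 rather than 1.0, which mirrors the "never asserts certainty" rule.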

Where the data comes from

Source | Gives Argmin
Cloud connector (onboarding) | Billed cost (CUR / BigQuery billing export / Cost Management), provider-native usage logs (Bedrock, Vertex AI, Azure OpenAI), and identity inventory.
Ingestion API / proxy (optional) | Request-level events: per-call tokens, cost, latency, and the identity/service that made the call.

The connector alone produces account/team-level attribution. Adding events gives you per-request resolution.
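To make the difference in resolution concrete, the sketch below rolls up two hypothetical inputs: billing rows, which carry cost only per account, and request-level events, which also name the calling service and identity. The field names, the `rollup` helper, and all figures are invented for illustration and are not Argmin's schema.

```python
from collections import defaultdict

# Hypothetical billing export rows: cost is known only per cloud account.
billing_rows = [
    {"account": "acct-ml-prod", "cost_usd": 120.0},
]

# Hypothetical request-level events: each call names its service and identity.
events = [
    {"account": "acct-ml-prod", "service": "search", "identity": "svc-search", "cost_usd": 70.0},
    {"account": "acct-ml-prod", "service": "chat", "identity": "alice", "cost_usd": 30.0},
    {"account": "acct-ml-prod", "service": "chat", "identity": "bob", "cost_usd": 20.0},
]


def rollup(rows, key):
    """Sum cost_usd grouped by the given field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["cost_usd"]
    return dict(totals)


by_account = rollup(billing_rows, "account")  # coarsest view: connector only
by_service = rollup(events, "service")        # finer view: needs events
by_identity = rollup(events, "identity")      # finest view: needs events
```

With billing rows alone, the only available breakdown is by account. The per-service and per-identity splits exist only because the events carry those fields.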

Deployment model

Argmin runs as a dedicated data plane inside your cloud trust boundary (see CUSTOMER_DEPLOYMENT.md). This preserves three product invariants:

  1. Your operational data stays in your trust boundary.
  2. Integrations stay read-only. Argmin reads cost/usage/identity; it never writes back.
  3. The decision-time interceptor fails open — it never depends on an external control plane to let your traffic through.

Argmin's own dev / staging / production environments exist for development, certification, and release rehearsal — not for holding your data.

Fail-open, always

The strictest rule in the platform: no code path may block customer production traffic. The gateway interceptor runs on a 50 ms end-to-end latency budget enforced with a hardware-level timeout, not application logic. If any part of Argmin is slow or unavailable, your request proceeds and attribution is reconciled later.
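The fail-open behavior can be illustrated with a small application-level sketch. This is not how Argmin enforces the budget, since the text says enforcement uses a hardware-level timeout. The sketch only shows the contract: the request is always forwarded, and attribution is attached only if it arrives within 50 ms. The `intercept` function, the `attribute` callable, and the returned dictionary shape are hypothetical.

```python
import concurrent.futures
import time

LATENCY_BUDGET_S = 0.050  # the 50 ms budget from the text

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def intercept(request, attribute, budget_s=LATENCY_BUDGET_S):
    """Always forward the request; attach attribution only if it is ready in time."""
    future = _pool.submit(attribute, request)
    try:
        attribution = future.result(timeout=budget_s)
        status = "attributed"
    except Exception:  # timeout, or any failure raised inside attribute()
        attribution = None
        status = "deferred"  # left for later reconciliation, out of band
    return {"forward": True, "status": status, "attribution": attribution}


def slow_attribute(request):
    """Simulates a dependency that exceeds the latency budget."""
    time.sleep(0.2)
    return {"team": "payments"}


result = intercept({"model": "example-model"}, slow_attribute)
```

In this sketch `forward` is `True` on every path, so a slow or failing attribution step can only change the `status`, never whether the traffic goes through.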

No content capture

The core pipeline never stores prompt or completion text. Attribution is built from metadata — model, tokens, cost, identity, timing — which is why the ingestion schema has no content field at all.
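A metadata-only event could look like the sketch below. The field list follows the categories named above (model, tokens, cost, identity, timing), but the exact field names, the `InferenceEvent` type, and the `to_event` helper are assumptions, not the real ingestion schema. The point is structural: with no content field in the type, prompt or completion text has nowhere to land.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class InferenceEvent:
    """Hypothetical metadata-only event; note there is no prompt/completion field."""
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    identity: str
    service: str
    latency_ms: float
    timestamp: str  # ISO 8601 string, kept as text for simplicity


ALLOWED_FIELDS = {f.name for f in fields(InferenceEvent)}


def to_event(raw: dict) -> InferenceEvent:
    """Build an event from a raw payload, dropping any key the schema lacks.

    Assumption: unknown keys are silently discarded rather than rejected.
    A missing required key raises TypeError.
    """
    return InferenceEvent(**{k: v for k, v in raw.items() if k in ALLOWED_FIELDS})


# Hypothetical payload that mistakenly includes prompt text.
raw = {
    "model": "example-model",
    "input_tokens": 812,
    "output_tokens": 164,
    "cost_usd": 0.0123,
    "identity": "alice@example.com",
    "service": "checkout-api",
    "latency_ms": 431.0,
    "timestamp": "2024-01-01T00:00:00Z",
    "prompt": "text that must never be stored",
}
event = to_event(raw)
```

Here the stray `prompt` key is discarded before the event exists, so nothing downstream can persist it.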

The trust & security model in full