
trace.ai SDK

Two lines of code to start tracing every LLM call in your application — tokens, latency, cost, anomaly scores, and AI-powered analysis.

Quick start

import { Tracer } from '@trace-ai/sdk'
import Anthropic from '@anthropic-ai/sdk'

const tracer = new Tracer({ apiKey: 'trace_...' })
const anthropic = tracer.wrapAnthropic(new Anthropic())

// Use exactly like the normal Anthropic client
const response = await anthropic.messages.create({
  model: 'claude-haiku-4-5-20251001',
  max_tokens: 256,
  messages: [{ role: 'user', content: 'Hello!' }],
})

// Every call is now automatically traced in your dashboard

Note: Find your API key in the trace.ai dashboard under Settings → API Key for each project.

Core concepts

trace.ai organises your LLM activity into three levels:

Project: An isolated workspace with its own API key, dashboard, and alert configuration. One API key = one project.

Run: A single end-to-end execution of your AI workflow, e.g. one user request handled by a multi-step pipeline. Each run has a unique run_id that groups its steps together.

Step: A single LLM call within a run. Steps are ordered by step_index and named with _trace: { stepName }. Each step captures the model, tokens, latency, cost, and output.
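To make the hierarchy concrete, the sketch below groups step records into runs the way the dashboard presents them: by run_id, then ordered by step_index. The `StepRecord` type and `groupIntoRuns` function are hypothetical illustrations built from a subset of the fields in the Manual ingest section; they are not part of the SDK.

```typescript
// Hypothetical shape of one traced step (a subset of the ingest fields).
interface StepRecord {
  run_id: string
  step_name: string
  step_index: number
  model: string
  latency_ms: number
}

// Group a project's steps by run_id, then order each run by step_index.
function groupIntoRuns(steps: StepRecord[]): Map<string, StepRecord[]> {
  const runs = new Map<string, StepRecord[]>()
  for (const step of steps) {
    const existing = runs.get(step.run_id)
    if (existing) existing.push(step)
    else runs.set(step.run_id, [step])
  }
  // Steps may arrive out of order; step_index defines the order within a run.
  runs.forEach((runSteps) => runSteps.sort((a, b) => a.step_index - b.step_index))
  return runs
}
```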

Installation

npm install @trace-ai/sdk

The SDK is a thin wrapper — no background processes, no native dependencies. It works in Node.js 18+ and any runtime with the Fetch API.

new Tracer(config)

The entry point. Create one instance per application (or per isolated environment).

const tracer = new Tracer({
  apiKey: 'trace_...',   // required — your project API key
  apiUrl: '...',         // optional — override for self-hosting / local dev
  runId:  '...',         // optional — provide your own run ID
})

Option	Type	Description
apiKey	string	Your project API key. Required.
apiUrl	string?	Custom ingest URL. Defaults to trace-ai servers.
runId	string?	Override the auto-generated run ID for this tracer.
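Hardcoding the key, as in the examples above, is fine for a quick test, but in practice you will usually read it from the environment. A minimal sketch, assuming the hypothetical variable names `TRACE_AI_API_KEY` and `TRACE_AI_API_URL` (these are not SDK conventions); the returned object has the same shape as the options table.

```typescript
// Same shape as the options accepted by new Tracer(config).
interface TracerConfig {
  apiKey: string
  apiUrl?: string
  runId?: string
}

// Build a Tracer config from an environment-like record.
// TRACE_AI_API_KEY / TRACE_AI_API_URL are hypothetical names chosen for this sketch.
function tracerConfigFromEnv(env: Record<string, string | undefined>): TracerConfig {
  const apiKey = env.TRACE_AI_API_KEY
  if (!apiKey) {
    // Fail fast at startup instead of sending unauthenticated traces later.
    throw new Error('TRACE_AI_API_KEY is not set')
  }
  const config: TracerConfig = { apiKey }
  // Only override the ingest URL when one is provided (self-hosting / local dev).
  if (env.TRACE_AI_API_URL) config.apiUrl = env.TRACE_AI_API_URL
  return config
}
```

In Node.js you would call `tracerConfigFromEnv(process.env)` and pass the result to `new Tracer(...)`.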

tracer.wrapAnthropic(client)

Returns a drop-in replacement for the Anthropic client. It intercepts every messages.create() call, forwards it to the real SDK unchanged, and automatically ingests the trace after the response returns.

import Anthropic from '@anthropic-ai/sdk'

const anthropic = tracer.wrapAnthropic(new Anthropic())

// Use it exactly like the original client — all params still work
const res = await anthropic.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 512,
  messages: [{ role: 'user', content: 'Summarise this document...' }],
})

Tip: The original Anthropic client is not modified. You can keep a reference to both: the wrapped client for traced calls and the original for anything you don't want traced.
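Conceptually, the wrapper forwards each call untouched, times it, and hands a trace record to an ingest function without making the caller wait. The sketch below illustrates that interception pattern with a hypothetical `traceCall` helper; it is not the SDK's actual implementation, and `TraceRecord` covers only a few of the ingested fields.

```typescript
// Subset of the fields ingested for each call.
interface TraceRecord {
  latency_ms: number
  status_success: boolean
  error?: string
}

// Wrap an async function: forward its arguments unchanged, time the call,
// and report a TraceRecord to `ingest` without delaying the caller.
function traceCall<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  ingest: (record: TraceRecord) => Promise<void>,
): (...args: A) => Promise<R> {
  // Fire-and-forget: run ingest on a later microtask and swallow its errors,
  // so tracing problems never surface in application code.
  const report = (record: TraceRecord): void => {
    Promise.resolve()
      .then(() => ingest(record))
      .catch(() => {})
  }

  return async (...args: A): Promise<R> => {
    const start = Date.now()
    try {
      const result = await fn(...args)
      report({ latency_ms: Date.now() - start, status_success: true })
      return result // the caller gets the original response untouched
    } catch (err) {
      report({
        latency_ms: Date.now() - start,
        status_success: false,
        error: err instanceof Error ? err.message : String(err),
      })
      throw err // the original error still reaches the caller
    }
  }
}
```

The key design choice is that the response (or error) is returned to the caller exactly as the underlying function produced it; tracing happens off to the side.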

run()

This is the key concept for multi-step pipelines. Calling anthropic.run() creates a TracedRun, a fresh execution context with its own unique run_id. Every call you make through that run is grouped under the same run in the dashboard.

Warning: Without run(), all calls share the tracer's single runId and appear as one long run. For multi-step workflows, always call run() at the start of each user request.

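import type { TracedMessageParams } from '@trace-ai/sdk' // assumed to be exported by the SDK

// `anthropic` is the wrapped client returned by tracer.wrapAnthropic(...)
async function handleRequest(userMessage: string) {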
  // Create a new run for this request — fresh run_id, step_index resets to 0
  const run = anthropic.run()

  // Step 1 — run_id: "a3f9...", step_index: 0
  const c1 = await run.messages.create({
    model: 'claude-haiku-4-5-20251001',
    max_tokens: 16,
    messages: [{ role: 'user', content: `Classify: "${userMessage}"` }],
    _trace: { stepName: 'classify-intent' },
  } as TracedMessageParams)

  // Step 2 — same run_id: "a3f9...", step_index: 1
  const c2 = await run.messages.create({
    model: 'claude-haiku-4-5-20251001',
    max_tokens: 512,
    messages: [{ role: 'user', content: userMessage }],
    _trace: { stepName: 'generate-reply' },
  } as TracedMessageParams)

  // run.runId — the shared ID for both steps above
  console.log('run:', run.runId)
}

Each call to anthropic.run() creates a completely independent run. Parallel requests each get their own run_id — they never interfere.
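The run semantics can be modelled as a small context object: a fresh ID per run and a step counter that starts at 0. `RunContext` below is a hypothetical stand-in for TracedRun, written only to illustrate the behaviour. It uses a simple random hex ID where real code would more likely use `crypto.randomUUID()`, and its `step_N` fallback assumes auto-names are 1-based, matching the `step_1, step_2` naming described under Naming steps.

```typescript
// Hypothetical stand-in for TracedRun: one ID per run, ordered step indices.
class RunContext {
  readonly runId: string
  private nextIndex = 0 // step_index starts at 0 for every new run

  constructor(runId: string = RunContext.randomId()) {
    this.runId = runId
  }

  // Reserve the next step slot. Without a stepName, fall back to an
  // auto-name (assumed 1-based: step_1, step_2, ...).
  nextStep(stepName?: string): { run_id: string; step_index: number; step_name: string } {
    const step_index = this.nextIndex++
    return {
      run_id: this.runId,
      step_index,
      step_name: stepName ?? `step_${step_index + 1}`,
    }
  }

  // Illustrative 32-hex-character ID; prefer crypto.randomUUID() in real code.
  private static randomId(): string {
    let id = ''
    for (let i = 0; i < 32; i++) id += Math.floor(Math.random() * 16).toString(16)
    return id
  }
}
```

Because each context keeps its own counter, two requests handled in parallel never share a run_id or interleave their step indices.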

Streaming

messages.stream() is fully supported on both the wrapped client and TracedRun. Tokens and latency are captured after the stream ends — zero impact on streaming latency.

const run = anthropic.run()
const stream = run.messages.stream({
  model: 'claude-haiku-4-5-20251001',
  max_tokens: 512,
  messages: [{ role: 'user', content: 'Tell me a story.' }],
  _trace: { stepName: 'story' },
})

for await (const event of stream) {
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    process.stdout.write(event.delta.text)
  }
}
// trace is ingested automatically once the stream completes

Note: The stream is returned immediately and passed through unchanged. Ingestion happens via finalMessage() as a fire-and-forget side effect, so your streaming latency is unaffected.
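The pass-through behaviour can be sketched as an async generator that yields every event unchanged and only reports once the consumer has drained the stream. The simplified `TextDeltaEvent` shape and the `onComplete` callback are assumptions for illustration; per the note above, the SDK itself relies on the Anthropic stream's `finalMessage()` rather than a helper like this.

```typescript
// Simplified stand-in for streamed events (text deltas only).
interface TextDeltaEvent {
  type: 'text_delta'
  text: string
}

// Yield every event untouched; after the source ends, report the totals
// without awaiting the report, so the consumer is never slowed down.
async function* passThrough(
  source: AsyncIterable<TextDeltaEvent>,
  onComplete: (summary: { text: string; latency_ms: number }) => Promise<void>,
): AsyncGenerator<TextDeltaEvent> {
  const start = Date.now()
  let text = ''
  for await (const event of source) {
    text += event.text
    yield event // forwarded as-is, with no buffering
  }
  // Fire-and-forget: reporting errors are swallowed.
  Promise.resolve()
    .then(() => onComplete({ text, latency_ms: Date.now() - start }))
    .catch(() => {})
}
```

In this sketch the report only fires when the consumer reads the stream to the end; a consumer that breaks out early never triggers `onComplete`.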

Naming steps

Add _trace: { stepName: '...' } to any messages.create() call to give the step a human-readable name. Without it, steps are auto-named step_1, step_2, etc.

// Named steps appear in the dashboard and AI analysis reports
await run.messages.create({
  model: 'claude-haiku-4-5-20251001',
  max_tokens: 64,
  messages: [{ role: 'user', content: 'List every person and company mentioned in: ...' }],
  _trace: { stepName: 'extract-entities' },  // ← name it
} as TracedMessageParams)

Tip: Use descriptive, consistent step names. The anomaly engine and AI analysis both reference steps by name, so good names make root cause reports much more actionable.

Manual ingest

For steps outside of the Anthropic client (external API calls, custom model endpoints, pre-computed results), use tracer.ingest() directly.

await tracer.ingest({
  run_id:        'my-run-id',      // group with other steps
  step_name:     'fetch-context',
  step_index:    1,
  model:         'custom-model',
  prompt:        'What is the user asking?',
  input_tokens:  120,
  output_tokens: 48,
  total_tokens:  168,
  latency_ms:    340,
  cost:          0.0014,
  status_success: true,
  output_code:   'The user wants a refund.',
})

Field	Description
run_id	Groups steps into a single run. Use run.runId from a TracedRun, or any UUID.
step_name	Human-readable name for this step. Shown in the dashboard and analysis.
step_index	Order within the run. Steps are sorted by this in the run graph.
model	Model identifier string, e.g. "claude-haiku-4-5-20251001".
prompt	The prompt sent to the model. For chat, use JSON.stringify({ system, messages }).
input_tokens	Input token count as reported by the model.
output_tokens	Output token count as reported by the model.
total_tokens	Should equal input + output. Mismatch triggers anomaly code 1007.
latency_ms	Wall-clock time from request start to response received.
cost	USD cost for this call. Use tracer cost helpers or compute manually.
status_success	true if the call completed normally, false if it errored.
output_code	The model's response text. Used by the anomaly engine for shape analysis.
error	Error message string. Required when status_success is false.
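When you time an external call yourself, it is easy to leave `total_tokens` or the error fields inconsistent. Below is a minimal sketch of a hypothetical `buildIngestPayload` helper (not an SDK function) that derives `total_tokens` from its parts, so anomaly code 1007 is not triggered by accident, computes `cost` from per-million-token prices you supply, and enforces `error` when `status_success` is false. The prices are parameters, not real rates.

```typescript
// Fields accepted by tracer.ingest(), as listed in the table above.
interface IngestPayload {
  run_id: string
  step_name: string
  step_index: number
  model: string
  prompt: string
  input_tokens: number
  output_tokens: number
  total_tokens: number
  latency_ms: number
  cost: number
  status_success: boolean
  output_code: string
  error?: string
}

// USD per million tokens; supply your provider's current rates.
interface TokenPrices {
  inputPerMillion: number
  outputPerMillion: number
}

function buildIngestPayload(
  base: Omit<IngestPayload, 'total_tokens' | 'cost'>,
  prices: TokenPrices,
): IngestPayload {
  if (!base.status_success && !base.error) {
    throw new Error('error is required when status_success is false')
  }
  const cost =
    (base.input_tokens * prices.inputPerMillion + base.output_tokens * prices.outputPerMillion) / 1e6
  return {
    ...base,
    total_tokens: base.input_tokens + base.output_tokens, // always consistent, so no 1007
    cost,
  }
}
```

Measure `latency_ms` as `Date.now()` after the external call minus `Date.now()` before it, then pass the result to `tracer.ingest()`.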

Anomaly detection

Every ingested call is automatically scored by a 4-layer engine in the background. No configuration required — it runs on every call with zero overhead to your application.

L1: Hard failures

status_success=false, error present, token accounting mismatch (total ≠ input+output), zero output with non-empty error.

L2: Format violations

Prompt asked for JSON but output isn't valid JSON. Prompt asked for yes/no but output is prose. Enum step returned a non-enumerated value.

L3: Shape fingerprinting

Output shape doesn't match what the prompt asked for. Unbalanced brackets. Named JSON keys missing from the output. Word count violations.
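To give a feel for what the format and shape layers look for, here is a simplified sketch of two such checks: whether output that should be JSON actually parses (L2-style), and whether brackets are balanced (L3-style). These hypothetical functions are illustrations only; the engine's real rules and condition codes are not reproduced here.

```typescript
// L2-style check: does the output parse as JSON?
function isValidJson(output: string): boolean {
  try {
    JSON.parse(output)
    return true
  } catch {
    return false
  }
}

// L3-style check: are (), [] and {} balanced and properly nested?
// Brackets inside double-quoted string literals are ignored.
function bracketsBalanced(output: string): boolean {
  const openerFor: Record<string, string> = { ')': '(', ']': '[', '}': '{' }
  const stack: string[] = []
  let inString = false
  for (let i = 0; i < output.length; i++) {
    const ch = output[i]
    if (inString) {
      if (ch === '\\') i++ // skip the escaped character
      else if (ch === '"') inString = false
      continue
    }
    if (ch === '"') inString = true
    else if (ch === '(' || ch === '[' || ch === '{') stack.push(ch)
    else if (ch in openerFor && stack.pop() !== openerFor[ch]) return false
  }
  return stack.length === 0 && !inString
}
```

An unclosed bracket such as `{"entities": ["a", "b"]` fails both checks, which is the kind of output the example analysis below traces back as a root cause.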

L4: Numeric anomalies

Latency spikes, cost outliers, token ratio drift, stall patterns. Thresholds adapt to your project's baseline using the p95 of recent calls, so a project with consistently fast calls gets a tighter limit than one with variable latency.
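The adaptive limit can be illustrated with a nearest-rank p95 over recent values that falls back to a static default until there is enough history. The 30-call minimum comes from this page; the nearest-rank method and the `adaptiveThreshold` name are assumptions for illustration, since trace.ai's exact calculation is not documented here.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples <= it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('no samples')
  const sorted = [...values].sort((a, b) => a - b)
  const rank = Math.ceil((p * sorted.length) / 100)
  return sorted[Math.max(rank, 1) - 1]
}

// Use the project's own p95 once there is enough history, else a static default.
function adaptiveThreshold(recent: number[], staticDefault: number, minCalls = 30): number {
  return recent.length >= minCalls ? percentile(recent, 95) : staticDefault
}
```

For example, with 100 recent latencies of 1 to 100 ms the nearest-rank p95 is 95 ms, whereas a project with only two calls still uses the static default.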

Scores accumulate across layers. A single L1 hit (100 pts) is immediately critical, while L4 conditions score 10–25 pts each, so several must fire before a call crosses the threshold. The adaptive L4 limits described above switch from static defaults to the p95 of recent latency, token usage, and cost once a project has 30+ calls. You can also override them manually in Settings → L4 anomaly thresholds.
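As a worked example of how points add up, assume the 100-point critical threshold (the same one used by Sentry's Critical only level), an L1 condition worth 100 pts, and L4 conditions in the documented 10 to 25 pt range. The condition names and exact point values below are hypothetical.

```typescript
// Hypothetical condition record; real condition names and codes differ.
interface ConditionHit {
  layer: 'L1' | 'L2' | 'L3' | 'L4'
  name: string
  points: number
}

// Total score at or above which a call counts as critical.
const CRITICAL_THRESHOLD = 100

// Sum points across all layers and compare against the threshold.
function scoreCall(hits: ConditionHit[]): { total: number; critical: boolean } {
  const total = hits.reduce((sum, hit) => sum + hit.points, 0)
  return { total, critical: total >= CRITICAL_THRESHOLD }
}

// Two L4 conditions (25 + 20 = 45 pts): flagged, but not critical.
const twoL4 = scoreCall([
  { layer: 'L4', name: 'latency-spike', points: 25 },
  { layer: 'L4', name: 'cost-outlier', points: 20 },
])
console.log(twoL4) // { total: 45, critical: false }

// A single L1 hard failure reaches 100 pts on its own.
const oneL1 = scoreCall([{ layer: 'L1', name: 'status-failed', points: 100 }])
console.log(oneL1) // { total: 100, critical: true }
```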

AI analysis

Open any run in the dashboard and click ✦ Analyze Run. trace.ai sends the full run context — every step, every anomaly score, every condition code — to claude-sonnet-4-6 and returns a structured report:

Example output

Summary

The pipeline failed at generate-response, but the run completed 2 of 3 steps before crashing. Total anomaly score: 295pts across 3 steps.

Root cause

parse-request returned malformed JSON (unclosed bracket). This propagated into enrich-context causing a stall, then crashed generate-response with a null-reference error when it attempted to read the entity list.

Recommendations

— Add JSON.parse validation after parse-request before passing output downstream
— Add a retry with exponential backoff on enrich-context when input is null
— Set a latency budget on enrich-context (currently 6.4s with 3 output tokens)

Analysis cost is tracked per project in the USAGE table and will appear in your billing dashboard.

Integrations

trace.ai can push anomaly alerts to your existing tooling. Both integrations are configured per project in Settings; no code changes are needed.

Slack

Paste a Slack Incoming Webhook URL into your project settings. trace.ai will post to that channel when:

Step error: Any call where status_success is false fires an immediate alert with the step name, model, error message, and run ID.

Error rate spike: If more than N% of the last M calls fail, a rate alert fires. Both thresholds are configurable (default: 25% over 20 calls).

Budget exceeded: When monthly AI analysis spend crosses your configured budget (set in Settings → Monthly budget), an alert fires at most once per hour.
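The error rate rule is a sliding window over recent calls. A minimal sketch using the documented defaults (alert when more than 25% of the last 20 calls failed); the `ErrorRateMonitor` class is hypothetical, and waiting for a full window before evaluating is an assumption of this sketch, not documented trace.ai behaviour.

```typescript
// Sliding-window failure-rate check over the most recent calls.
class ErrorRateMonitor {
  private readonly windowSize: number // M: how many recent calls to consider
  private readonly maxFailureRate: number // N%, expressed as a fraction
  private readonly failedFlags: boolean[] = [] // true = that call failed

  constructor(windowSize = 20, maxFailureRate = 0.25) {
    this.windowSize = windowSize
    this.maxFailureRate = maxFailureRate
  }

  // Record one call's outcome; returns true when an alert should fire.
  record(statusSuccess: boolean): boolean {
    this.failedFlags.push(!statusSuccess)
    if (this.failedFlags.length > this.windowSize) this.failedFlags.shift()
    // Assumption: no verdict until the window holds M calls.
    if (this.failedFlags.length < this.windowSize) return false
    const failures = this.failedFlags.filter((failed) => failed).length
    return failures / this.windowSize > this.maxFailureRate
  }
}
```

With the defaults, 5 failures in the last 20 calls (exactly 25%) does not alert, because the rule is "more than N%"; a sixth failure in the window does.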

// No code needed — configure in Settings → Integrations
// Test your webhook with the "Send test ping" button

Tip: You can toggle Alert on error and set your own rate threshold in the Settings tab. Use the test button to confirm delivery before going live.

Sentry

Add your Sentry project DSN in Settings and trace.ai sends two types of data to Sentry — completely isolated from your own backend's Sentry client:

Performance transactions: Every LLM call becomes a Sentry transaction named after its step. Latency, tokens, cost, and anomaly score appear as measurements. All steps in the same run share a trace_id, so Sentry's distributed trace view reconstructs your full pipeline as a waterfall.

Anomaly events: When a call crosses the anomaly threshold, a separate error event fires into your Sentry Issues feed. Repeated failures on the same step are fingerprinted into a single issue instead of opening a new issue each time.

// No code needed — paste your DSN in Settings → Integrations
// DSN format: https://<key>@<org>.ingest.sentry.io/<project>
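Before pasting a DSN, you can sanity-check that it matches the format above. A small sketch using the standard `URL` class; `parseSentryDsn` is a hypothetical helper that checks structure only and cannot tell whether the key is valid.

```typescript
interface SentryDsnParts {
  publicKey: string
  host: string
  projectId: string
}

// Split https://<key>@<host>/<project> into its parts; null if the shape is wrong.
function parseSentryDsn(dsn: string): SentryDsnParts | null {
  let url: URL
  try {
    url = new URL(dsn.trim())
  } catch {
    return null // not a URL at all
  }
  // The project ID is the last non-empty path segment.
  const projectId = url.pathname.split('/').filter((part) => part.length > 0).pop()
  if (url.protocol !== 'https:' || url.username === '' || projectId === undefined) {
    return null
  }
  return { publicKey: url.username, host: url.host, projectId }
}
```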

Where to find your data in Sentry:

Explore → Traces: All LLM calls as transactions. Click any row to see the span waterfall: root span op:ai.inference, child span op:ai.model.invoke with gen_ai.usage.* attributes.

Issues: Anomaly events grouped by step name. Each issue shows the full condition breakdown, anomaly score, and a link to the run.

Choose an alert level to control which anomalies reach Sentry Issues (performance transactions are sent whenever a DSN is set, unless the level is Off):

Critical only: Anomaly events fire when the total score is ≥ 100 pts (any L1 condition, or accumulated L2–L4 points). Sent at error level.

Warning + critical: Fires for any anomaly hit, even below the threshold. Warnings are sent at warning level, criticals at error level.

Off: Disables all Sentry output, both performance transactions and anomaly events. The DSN is saved but nothing is sent.
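For anomaly events, the three alert levels amount to a small routing rule from anomaly score to Sentry event level. A sketch of that rule as described above, with hypothetical type and function names; it covers anomaly events only, and `'warning'` and `'error'` correspond to Sentry's event severity levels.

```typescript
type AlertLevel = 'critical-only' | 'warning-and-critical' | 'off'
type SentryLevel = 'warning' | 'error'

// Score at or above which an anomaly is critical.
const CRITICAL_SCORE = 100

// Returns the Sentry level for an anomaly event, or null when nothing is sent.
function sentryEventLevel(alertLevel: AlertLevel, score: number): SentryLevel | null {
  if (alertLevel === 'off' || score <= 0) return null // Off, or no anomaly hit at all
  if (score >= CRITICAL_SCORE) return 'error' // critical under both active levels
  // Sub-threshold hits are only sent under "Warning + critical".
  return alertLevel === 'warning-and-critical' ? 'warning' : null
}
```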

Note: Performance spans follow OpenTelemetry GenAI semantic conventions (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.system: "anthropic"), so they are compatible with Sentry's native AI monitoring features.
