trace.ai SDK
Two lines of code to start tracing every LLM call in your application — tokens, latency, cost, anomaly scores, and AI-powered analysis.
Quick start
import { Tracer } from '@trace-ai/sdk'
import Anthropic from '@anthropic-ai/sdk'
const tracer = new Tracer({ apiKey: 'trace_...' })
const anthropic = tracer.wrapAnthropic(new Anthropic())
// Use exactly like the normal Anthropic client
const response = await anthropic.messages.create({
  model: 'claude-haiku-4-5-20251001',
  max_tokens: 256,
  messages: [{ role: 'user', content: 'Hello!' }],
})
// Every call is now automatically traced in your dashboard
Core concepts
trace.ai organises your LLM activity into three levels:
Installation
npm install @trace-ai/sdk
The SDK is a thin wrapper — no background processes, no native dependencies. It works in Node.js 18+ and any runtime with the Fetch API.
new Tracer(config)
The entry point. Create one instance per application (or per isolated environment).
const tracer = new Tracer({
  apiKey: 'trace_...', // required: your project API key
  apiUrl: '...',       // optional: override for self-hosting / local dev
  runId: '...',        // optional: provide your own run ID
})

| Option | Type | Description |
|---|---|---|
| apiKey | string | Your project API key. Required. |
| apiUrl | string? | Custom ingest URL. Defaults to trace-ai servers. |
| runId | string? | Override the auto-generated run ID for this tracer. |
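
One way to keep the API key out of source control is to build the config from environment variables. The sketch below is only an illustration: the TracerConfig shape mirrors the options table above, while the variable names TRACE_AI_API_KEY, TRACE_AI_API_URL, and TRACE_AI_RUN_ID and the helper loadTracerConfig are hypothetical and not part of the SDK.

```typescript
// Mirrors the options table above; this local type is an assumption,
// not the SDK's exported type.
interface TracerConfig {
  apiKey: string
  apiUrl?: string
  runId?: string
}

// Hypothetical helper: builds a config from an env-like object.
// The variable names are illustrative, not defined by the SDK.
function loadTracerConfig(env: Record<string, string | undefined>): TracerConfig {
  const apiKey = env.TRACE_AI_API_KEY
  if (!apiKey) {
    throw new Error('TRACE_AI_API_KEY is required')
  }
  const config: TracerConfig = { apiKey }
  if (env.TRACE_AI_API_URL) config.apiUrl = env.TRACE_AI_API_URL
  if (env.TRACE_AI_RUN_ID) config.runId = env.TRACE_AI_RUN_ID
  return config
}
```

The result could then be passed to the constructor, for example new Tracer(loadTracerConfig(process.env)).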
wrapAnthropic(client)
Returns a drop-in replacement for the Anthropic client. It intercepts every messages.create() call, forwards it to the real SDK unchanged, and automatically ingests the trace after the response returns.
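
The interception described above follows a common wrapper pattern. The sketch below is not the SDK's implementation; it is a minimal illustration in which traceCalls, TraceRecord, and the sink callback are hypothetical names. It forwards the call unchanged, measures wall-clock latency, and hands a record to the sink without waiting for it.

```typescript
// Hypothetical record shape, loosely based on the ingest fields in this document.
interface TraceRecord {
  latency_ms: number
  status_success: boolean
  error?: string
}

// Wraps any async function so each call is timed and reported to `sink`.
// The caller still receives the original result or the original error.
function traceCalls<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  sink: (record: TraceRecord) => unknown,
): (...args: A) => Promise<R> {
  // Report without awaiting, and swallow sink failures so that
  // tracing can never break the wrapped call.
  const report = (record: TraceRecord): void => {
    try {
      const pending: unknown = sink(record)
      if (pending instanceof Promise) pending.catch(() => {})
    } catch (e) {
      // Ignore synchronous sink failures.
    }
  }
  return async (...args: A): Promise<R> => {
    const start = Date.now()
    try {
      const result = await fn(...args)
      report({ latency_ms: Date.now() - start, status_success: true })
      return result
    } catch (err) {
      report({
        latency_ms: Date.now() - start,
        status_success: false,
        error: err instanceof Error ? err.message : String(err),
      })
      throw err
    }
  }
}
```

The design choice worth noting is that the sink is never awaited, so a slow or failing trace upload cannot delay or fail the model call itself.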
import Anthropic from '@anthropic-ai/sdk'
const anthropic = tracer.wrapAnthropic(new Anthropic())
// Use it exactly like the original client — all params still work
const res = await anthropic.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 512,
  messages: [{ role: 'user', content: 'Summarise this document...' }],
})
run()
This is the key concept for multi-step pipelines. Calling anthropic.run() creates a TracedRun — a fresh execution context with its own unique run_id. Every step you call on that run is grouped together in the dashboard under the same run.
Without run(), all calls share the tracer's single runId and appear as one long run. For multi-step workflows, always call run() at the start of each user request.
async function handleRequest(userMessage: string) {
  // Create a new run for this request: fresh run_id, step_index resets to 0
  const run = anthropic.run()
  // Step 1 (run_id: "a3f9...", step_index: 0)
  const c1 = await run.messages.create({
    model: 'claude-haiku-4-5-20251001',
    max_tokens: 16,
    messages: [{ role: 'user', content: `Classify: "${userMessage}"` }],
    _trace: { stepName: 'classify-intent' },
  } as TracedMessageParams)
  // Step 2 (same run_id: "a3f9...", step_index: 1)
  const c2 = await run.messages.create({
    model: 'claude-haiku-4-5-20251001',
    max_tokens: 512,
    messages: [{ role: 'user', content: userMessage }],
    _trace: { stepName: 'generate-reply' },
  } as TracedMessageParams)
  // run.runId is the shared ID for both steps above
  console.log('run:', run.runId)
}
Each call to anthropic.run() creates a completely independent run. Parallel requests each get their own run_id, so they never interfere.
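
To make the grouping concrete, here is a toy model of the bookkeeping a run performs. It is not the SDK's TracedRun: ToyRun, its counter-based IDs, and the rule that an unnamed step at index 0 is called step_1 are assumptions made for illustration.

```typescript
interface StepRecord {
  run_id: string
  step_index: number
  step_name: string
}

// Placeholder ID source; the SDK's real run IDs look different.
let runCounter = 0

// Toy stand-in for a run: one ID, one step counter that starts at 0.
class ToyRun {
  readonly runId: string
  private nextIndex = 0

  constructor() {
    runCounter += 1
    this.runId = `run-${runCounter}`
  }

  // Records one step; unnamed steps fall back to step_1, step_2, ...
  // (assumed here to be step_index + 1).
  step(stepName?: string): StepRecord {
    const step_index = this.nextIndex
    this.nextIndex += 1
    return {
      run_id: this.runId,
      step_index,
      step_name: stepName ?? `step_${step_index + 1}`,
    }
  }
}

// Two requests handled side by side: each has its own run and its own counter.
const runA = new ToyRun()
const runB = new ToyRun()
const a0 = runA.step('classify-intent')
const b0 = runB.step('classify-intent')
const a1 = runA.step() // unnamed, so it is auto-named
```

Steps a0 and a1 share a run_id and are ordered by step_index, while b0 belongs to a different run even though it uses the same step name.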
Streaming
messages.stream() is fully supported on both the wrapped client and TracedRun. Tokens and latency are captured after the stream ends — zero impact on streaming latency.
const stream = run.messages.stream({
  model: 'claude-haiku-4-5-20251001',
  max_tokens: 512,
  messages: [{ role: 'user', content: 'Tell me a story.' }],
  _trace: { stepName: 'story' },
})
for await (const event of stream) {
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    process.stdout.write(event.delta.text)
  }
}
// The trace is ingested automatically once the stream completes
The trace is collected from finalMessage() as a fire-and-forget side effect, so your streaming latency is unaffected.
Naming steps
Add _trace: { stepName: '...' } to any messages.create() call to give the step a human-readable name. Without it, steps are auto-named step_1, step_2, etc.
// Named steps appear in the dashboard and AI analysis reports
await run.messages.create({
  model: 'claude-haiku-4-5-20251001',
  max_tokens: 64,
  messages: [...],
  _trace: { stepName: 'extract-entities' }, // ← name it
} as TracedMessageParams)
Manual ingest
For steps outside of the Anthropic client (external API calls, custom model endpoints, pre-computed results), use tracer.ingest() directly.
await tracer.ingest({
  run_id: 'my-run-id', // group with other steps
  step_name: 'fetch-context',
  step_index: 1,
  model: 'custom-model',
  prompt: 'What is the user asking?',
  input_tokens: 120,
  output_tokens: 48,
  total_tokens: 168,
  latency_ms: 340,
  cost: 0.0014,
  status_success: true,
  output_code: 'The user wants a refund.',
})

| Field | Description |
|---|---|
| run_id | Groups steps into a single run. Use run.runId from a TracedRun, or any UUID. |
| step_name | Human-readable name for this step. Shown in the dashboard and analysis. |
| step_index | Order within the run. Steps are sorted by this in the run graph. |
| model | Model identifier string, e.g. "claude-haiku-4-5-20251001". |
| prompt | The prompt sent to the model. For chat, use JSON.stringify({ system, messages }). |
| input_tokens | Input token count as reported by the model. |
| output_tokens | Output token count as reported by the model. |
| total_tokens | Should equal input + output. Mismatch triggers anomaly code 1007. |
| latency_ms | Wall-clock time from request start to response received. |
| cost | USD cost for this call. Use tracer cost helpers or compute manually. |
| status_success | true if the call completed normally, false if it errored. |
| output_code | The model's response text. Used by the anomaly engine for shape analysis. |
| error | Error message string. Required when status_success is false. |
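
Because total_tokens must equal input_tokens + output_tokens and error is required on failure, it can help to build payloads through a single function. The following is a minimal sketch, assuming the field names from the table above; IngestPayload, StepInput, and buildIngestPayload are hypothetical helpers, not SDK exports.

```typescript
// Field names follow the table above; the type itself is an assumption.
interface IngestPayload {
  run_id: string
  step_name: string
  step_index: number
  model: string
  prompt: string
  input_tokens: number
  output_tokens: number
  total_tokens: number
  latency_ms: number
  cost: number
  status_success: boolean
  output_code: string
  error?: string
}

// The caller supplies everything except the two derived fields.
type StepInput = Omit<IngestPayload, 'total_tokens' | 'status_success'>

// Hypothetical helper: derives total_tokens and status_success so the
// payload cannot carry a token mismatch or a failure without an error.
function buildIngestPayload(step: StepInput): IngestPayload {
  return {
    ...step,
    total_tokens: step.input_tokens + step.output_tokens,
    status_success: step.error === undefined,
  }
}

// For chat-style calls, the prompt is the serialized system + messages.
const payload = buildIngestPayload({
  run_id: 'my-run-id',
  step_name: 'fetch-context',
  step_index: 1,
  model: 'custom-model',
  prompt: JSON.stringify({
    system: 'You are a support assistant.',
    messages: [{ role: 'user', content: 'What is the user asking?' }],
  }),
  input_tokens: 120,
  output_tokens: 48,
  latency_ms: 340,
  cost: 0.0014,
  output_code: 'The user wants a refund.',
})
```

The resulting object has the same shape as the one passed to tracer.ingest() in the example above.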
Anomaly detection
Every ingested call is automatically scored by a four-layer engine. No configuration is required. Scoring runs on every call in the background and adds no overhead to your application.
- L1: status_success=false, error present, token accounting mismatch (total ≠ input+output), zero output with non-empty error.
- L2: Prompt asked for JSON but output isn't valid JSON. Prompt asked for yes/no but output is prose. Enum step returned a non-enumerated value.
- L3: Output shape doesn't match what the prompt asked for. Unbalanced brackets. Named JSON keys missing from the output. Word count violations.
- L4: Latency spikes, cost outliers, token ratio drift, stall patterns. Thresholds adapt to your project's baseline using the p95 of recent calls, so a project with consistently fast calls gets a tighter limit than one with variable latency.
Scores accumulate across layers. A single L1 hit (100 pts) is immediately critical. L4 conditions score 10–25 pts each and require several to fire before crossing the threshold. L4 limits are dynamic — once a project has 30+ calls, trace.ai computes the p95 of recent latency, token usage, and cost and uses that as the threshold instead of static defaults. You can also override them manually in Settings → L4 anomaly thresholds.
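
The adaptive threshold and score accumulation can be sketched as follows. This is an illustration under stated assumptions, not trace.ai's actual engine: the nearest-rank p95 definition, the static default of 5000 ms, and the critical threshold of 100 points are all assumed. Only the 30-call minimum and the 100-point L1 hit come from the text above.

```typescript
const MIN_CALLS_FOR_BASELINE = 30 // from the text above
const CRITICAL_SCORE = 100 // assumed: equal to a single L1 hit

// Nearest-rank 95th percentile (one common definition, assumed here).
function p95(values: number[]): number {
  if (values.length === 0) throw new Error('p95 of an empty list is undefined')
  const sorted = [...values].sort((a, b) => a - b)
  // Integer arithmetic avoids floating-point surprises in the rank.
  const rank = Math.ceil((95 * sorted.length) / 100)
  return sorted[rank - 1]!
}

// Uses the project's own p95 once enough calls exist, else the static default.
function latencyThreshold(recentLatenciesMs: number[], staticDefaultMs = 5000): number {
  return recentLatenciesMs.length >= MIN_CALLS_FOR_BASELINE
    ? p95(recentLatenciesMs)
    : staticDefaultMs
}

// Scores from all layers add up; the call is critical once the sum
// reaches the (assumed) threshold.
function isCritical(layerScores: number[]): boolean {
  const total = layerScores.reduce((sum, score) => sum + score, 0)
  return total >= CRITICAL_SCORE
}
```

Under these assumptions a single 100-point L1 hit is critical on its own, while L4 conditions at 10 to 25 points each need several hits to reach the same total.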
AI analysis
Open any run in the dashboard and click ✦ Analyze Run. trace.ai sends the full run context — every step, every anomaly score, every condition code — to claude-sonnet-4-6 and returns a structured report:
The pipeline failed at generate-response, but the run completed 2 of 3 steps before crashing. Total anomaly score: 295pts across 3 steps.
parse-request returned malformed JSON (unclosed bracket). This propagated into enrich-context causing a stall, then crashed generate-response with a null-reference error when it attempted to read the entity list.
- Add JSON.parse validation after parse-request before passing output downstream
- Add a retry with exponential backoff on enrich-context when input is null
- Set a latency budget on enrich-context (currently 6.4s with 3 output tokens)
Analysis cost is tracked per project in the USAGE table and will appear in your billing dashboard.
Integrations
trace.ai can push anomaly alerts to your existing tooling. Both integrations are configured per-project in Settings — no code changes needed.
Slack
Paste a Slack Incoming Webhook URL into your project settings. trace.ai will post to that channel when:
// No code needed: configure in Settings → Integrations
// Test your webhook with the "Send test ping" button
Sentry
Add your Sentry project DSN in Settings and trace.ai sends two types of data to Sentry — completely isolated from your own backend's Sentry client:
// No code needed: paste your DSN in Settings → Integrations
// DSN format: https://<key>@<org>.ingest.sentry.io/<project>
Where to find your data in Sentry:
Choose an alert level to control which anomalies reach Sentry Issues (performance transactions always fire when a DSN is set):
Transactions are tagged with gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and gen_ai.system: "anthropic", so they are compatible with Sentry's native AI monitoring features.
Ready to instrument your first pipeline?
Get started free →