Helpers — tool, llm, trace, retrieval
Thin wrappers around withSpan. Declare the span once at the function definition site instead of inlining withSpan(...) at every call site.
Each helper records the wrapped function’s args as span input and its return value as span output, and records thrown exceptions as status='error'. Don’t use llm around frameworks we already auto-instrument (OpenAI, Anthropic, etc.); you’ll get duplicate spans.
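To make the definition-site idea concrete, here is a minimal Python sketch of a decorator that behaves the way the helpers are described: arguments become span input, the return value becomes span output, and a thrown exception marks the span status='error' before being re-raised. The traced decorator, the in-memory RECORDED_SPANS list, and the lookup_weather tool are hypothetical stand-ins, not the SDK's API; the real helpers go through withSpan instead of a local list.

```python
import functools
from typing import Any, Callable, Dict, List, Optional

# Hypothetical in-memory sink standing in for the SDK's span exporter.
RECORDED_SPANS: List[Dict[str, Any]] = []


def traced(kind: str, name: Optional[str] = None) -> Callable:
    """Hypothetical decorator: declare the span once, at the definition site."""

    def decorator(fn: Callable) -> Callable:
        span_name = name or fn.__name__

        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Arguments become the span input.
            span: Dict[str, Any] = {
                "name": span_name,
                "kind": kind,
                "input": {"args": args, "kwargs": kwargs},
                "status": "ok",
            }
            try:
                result = fn(*args, **kwargs)
                span["output"] = result  # return value becomes the span output
                return result
            except Exception as exc:
                # Thrown exceptions mark the span as an error, then propagate.
                span["status"] = "error"
                span["error_type"] = type(exc).__name__
                span["error_message"] = str(exc)
                raise
            finally:
                RECORDED_SPANS.append(span)

        return wrapper

    return decorator


# Every call site now gets a span without inlining anything.
@traced("tool")
def lookup_weather(city: str) -> str:
    if not city:
        raise ValueError("city is required")
    return f"sunny in {city}"
```

Calling lookup_weather("Oslo") records an 'ok' span whose output is the returned string; calling it with an empty string records an 'error' span and still raises ValueError to the caller.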
extractUsage — custom token extraction
llm auto-extracts tokens from three response shapes: OpenAI (usage.prompt_tokens), Anthropic (usage.input_tokens), and Google (usageMetadata.promptTokenCount). For anything else, pass your own extractUsage.
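The shape detection, plus a custom extractor for a shape outside the three, can be sketched as follows. This Python sketch assumes responses arrive as plain dicts (for example, parsed JSON) and that an extractor returns an (input_tokens, output_tokens) pair; default_extract_usage, resolve_usage, and ollama_extract_usage are hypothetical names, the real extractUsage signature may differ, and giving a supplied extractor precedence is an assumption. The custom extractor reads the prompt_eval_count and eval_count fields that Ollama's native API reports.

```python
from collections.abc import Mapping
from typing import Callable, Optional, Tuple

Usage = Tuple[int, int]  # (input_tokens, output_tokens)


def default_extract_usage(response: Mapping) -> Optional[Usage]:
    """Detect the three auto-extracted shapes; return None if none match."""
    usage = response.get("usage")
    if isinstance(usage, Mapping):
        if "prompt_tokens" in usage:  # OpenAI: usage.prompt_tokens
            return usage["prompt_tokens"], usage.get("completion_tokens", 0)
        if "input_tokens" in usage:  # Anthropic: usage.input_tokens
            return usage["input_tokens"], usage.get("output_tokens", 0)
    meta = response.get("usageMetadata")
    if isinstance(meta, Mapping) and "promptTokenCount" in meta:  # Google
        return meta["promptTokenCount"], meta.get("candidatesTokenCount", 0)
    return None


def resolve_usage(
    response: Mapping,
    extract_usage: Optional[Callable[[Mapping], Optional[Usage]]] = None,
) -> Optional[Usage]:
    """A caller-supplied extractor takes precedence over shape detection."""
    if extract_usage is not None:
        return extract_usage(response)
    return default_extract_usage(response)


def ollama_extract_usage(response: Mapping) -> Optional[Usage]:
    """Custom extractor for Ollama's native response fields."""
    if "prompt_eval_count" not in response:
        return None
    return response["prompt_eval_count"], response.get("eval_count", 0)
```

Without a custom extractor, an Ollama-native body matches none of the three shapes and yields no token counts, which is exactly the case the custom hook is for.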
trackLlmCall — imperative LLM span
Use when you’ve already made the LLM call and just want to record it, e.g. a raw fetch to Ollama, vLLM, or a self-hosted endpoint.
trackLlmCall is a no-op outside wrapAgent. Pass cost explicitly to override the internal pricing table (negotiated rates, self-hosted zero-cost, etc.).
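The no-op behavior and the cost override can be sketched with a context variable that is only set while a run is active. In this Python sketch, agent_run stands in for wrapAgent and track_llm_call for trackLlmCall; both are hypothetical, as is the span dict layout. Leaving cost as None models deferring pricing to ingest time, while any explicit value, including 0.0 for a self-hosted model, overrides it.

```python
import contextvars
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Optional

# Holds the span list of the active run, or None outside any run.
_current_run = contextvars.ContextVar("current_run", default=None)


@contextmanager
def agent_run() -> Iterator[List[Dict[str, Any]]]:
    """Hypothetical stand-in for wrapAgent: spans recorded inside land here."""
    spans: List[Dict[str, Any]] = []
    token = _current_run.set(spans)
    try:
        yield spans
    finally:
        _current_run.reset(token)


def track_llm_call(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cost: Optional[float] = None,
) -> Optional[Dict[str, Any]]:
    """Hypothetical stand-in for trackLlmCall: record an LLM call made elsewhere."""
    spans = _current_run.get()
    if spans is None:
        return None  # no active run: nothing is recorded
    span = {
        "kind": "llm",
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        # None defers pricing to ingest; an explicit value overrides it.
        "cost": cost,
    }
    spans.append(span)
    return span
```

A contextvars.ContextVar keeps the "inside a run" check correct across threads and asyncio tasks, which is why the sketch uses it instead of a module-level global.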
Feedback
Record a user’s reaction to a run: thumbs up/down, a rating, or a comment. At least one of satisfaction, rating, or comment must be set.
| Field | Type | Purpose |
|---|---|---|
| satisfaction | 'positive' or 'negative' | Thumbs-up / thumbs-down column + cluster satisfaction rollup |
| rating | number | Any numeric scale (1–5, NPS, eval score) |
| comment | string | Free-text note, searchable in the dashboard |
| metadata | object | Free-form JSON, e.g. { source: 'thumbs_widget' } |
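A minimal Python sketch of the payload and its validation rule, assuming a dataclass representation. The Feedback class and the choice to validate at construction time are hypothetical; the real SDK may validate elsewhere or with different messages.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Feedback:
    """Hypothetical feedback payload mirroring the table above."""

    satisfaction: Optional[str] = None  # 'positive' or 'negative'
    rating: Optional[float] = None  # any numeric scale
    comment: Optional[str] = None  # free text
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # metadata alone is not enough: one of the three signals must be present.
        if self.satisfaction is None and self.rating is None and self.comment is None:
            raise ValueError("set at least one of satisfaction, rating, comment")
        if self.satisfaction not in (None, "positive", "negative"):
            raise ValueError("satisfaction must be 'positive' or 'negative'")
```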
Custom attributes
Attach arbitrary key/value pairs to any span. They’re visible in the span detail drawer and searchable in the dashboard. For run-level metadata, use run.setMetadata({ ... }).
Errors
There are two error levels, and they’re not mutually exclusive.

| Level | Trigger | Sets |
|---|---|---|
| Span error | withSpan / tool / llm callback throws | status='error', error_type, error_message on the span; increments the run’s error_count |
| Run error | wrapAgent callback throws (or run.setErrorSummary(...)) | status='error', error_summary on the run |
An exception thrown inside withSpan that bubbles all the way up out of wrapAgent sets both. wrapAgent always flushes on throw, so you don’t need a try/catch around it just to “rescue” metrics.
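The interaction between the two levels can be sketched with an in-memory run. Run, span, and wrap_agent are hypothetical stand-ins for the SDK's run object, withSpan, and wrapAgent, and the flushed flag only marks where a real flush would happen. A span error caught inside the agent sets only the span fields and error_count; one that escapes also sets the run fields.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List, Optional


class Run:
    """Hypothetical run record with the fields from the table above."""

    def __init__(self) -> None:
        self.status = "ok"
        self.error_summary: Optional[str] = None
        self.error_count = 0
        self.spans: List[Dict[str, Any]] = []
        self.flushed = False


@contextmanager
def span(run: Run, name: str) -> Iterator[Dict[str, Any]]:
    """Stand-in for withSpan: a throw marks the span and bumps error_count."""
    record: Dict[str, Any] = {"name": name, "status": "ok"}
    run.spans.append(record)
    try:
        yield record
    except Exception as exc:
        record.update(status="error", error_type=type(exc).__name__, error_message=str(exc))
        run.error_count += 1
        raise  # keep propagating; the caller decides whether to catch


def wrap_agent(run: Run, agent: Callable[[Run], Any]) -> Any:
    """Stand-in for wrapAgent: a throw marks the run; flushing always happens."""
    try:
        return agent(run)
    except Exception as exc:
        run.status = "error"
        run.error_summary = str(exc)
        raise
    finally:
        run.flushed = True  # where a real SDK would flush buffered spans


def recovering_agent(run: Run) -> str:
    try:
        with span(run, "flaky_tool"):
            raise TimeoutError("tool timed out")
    except TimeoutError:
        return "fallback answer"  # span error only; the run stays 'ok'


def failing_agent(run: Run) -> None:
    with span(run, "llm_call"):
        raise RuntimeError("provider unavailable")  # escapes: span and run errors
```

Because wrap_agent flushes in its finally clause, the failing run is flushed even though the exception still reaches the caller.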
For providers that return HTTP 200 with an error body, either check the body and throw (which marks the span as an error), or call setAttribute('llm_error', ...) and keep status='ok' (the error still shows up in filters without inflating error_count).
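Both options fit in one small helper. This Python sketch assumes the provider reports the failure under a top-level "error" key (check your provider's actual body format) and uses a plain span_attrs dict in place of calling setAttribute; check_llm_body and LlmResponseError are hypothetical.

```python
from typing import Any, Dict


class LlmResponseError(Exception):
    """Hypothetical error raised for a 200 response that carries an error body."""


def check_llm_body(body: Dict[str, Any], span_attrs: Dict[str, Any], strict: bool) -> Dict[str, Any]:
    """Apply one of the two strategies to an HTTP 200 body that may hold an error."""
    error = body.get("error")  # assumption: the provider reports failures here
    if error is None:
        return body
    if strict:
        # Option 1: throw. Inside a span this makes the span status='error'.
        raise LlmResponseError(str(error))
    # Option 2: record an attribute and keep status='ok'.
    span_attrs["llm_error"] = error
    return body
```

Use strict=True when the error should count toward error_count, and strict=False for soft failures you only want to filter on.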
Tokens and cost
Tokens come from the provider’s response; cost is computed at ingest time from the internal pricing table.

| Framework | Token source | Auto-extracted? |
|---|---|---|
| OpenAI / Azure OpenAI | response.usage.{prompt_tokens, completion_tokens} | yes |
| Anthropic | response.usage.{input_tokens, output_tokens} | yes |
| Google Gemini / Vertex | response.usageMetadata.{promptTokenCount, candidatesTokenCount} | yes |
| Bedrock Converse | response.usage.{inputTokens, outputTokens} | yes |
| Cohere | response.meta.tokens.{input_tokens, output_tokens} | yes |
| Mistral | response.usage.{prompt_tokens, completion_tokens} | yes |
| Vercel AI SDK | Accumulated across calls (requires experimental_telemetry) | yes |
| LangChain / LlamaIndex | From the underlying LLM wrapper | yes if wrapper supports it |
| Raw HTTP | — | use trackLlmCall |
Cost is computed as (input_tokens × input_rate) + (output_tokens × output_rate), using rates from the pricing table. Pass cost explicitly on setLlm or trackLlmCall to override it, e.g. for enterprise rates, cached tokens, or self-hosted models at zero cost.
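A worked version of the formula with an explicit override. The model name and per-token rates below are hypothetical placeholders, not values from the internal pricing table, and returning None for an unpriced model is an assumption of this sketch.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (input_rate, output_rate) pairs in USD per token.
PRICING: Dict[str, Tuple[float, float]] = {
    "example-model": (0.000003, 0.000015),
}


def span_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cost: Optional[float] = None,
) -> Optional[float]:
    """(input_tokens × input_rate) + (output_tokens × output_rate), unless overridden."""
    if cost is not None:
        return cost  # explicit cost wins: negotiated rates, self-hosted zero-cost
    rates = PRICING.get(model)
    if rates is None:
        return None  # assumption: unknown models stay unpriced
    input_rate, output_rate = rates
    return input_tokens * input_rate + output_tokens * output_rate
```

With these rates, 1,000 input tokens and 500 output tokens come to 1000 × 0.000003 + 500 × 0.000015 = 0.003 + 0.0075 = 0.0105; passing cost=0.0 returns 0.0 regardless of token counts.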
Rollups
| Run field | Source |
|---|---|
| total_tokens_in | SUM(span.input_tokens) across kind='llm' spans |
| total_tokens_out | SUM(span.output_tokens) across kind='llm' spans |
| total_cost | SUM(span.cost) across kind='llm' spans |
| span_count | COUNT(span) |
| tool_count | COUNT(span WHERE kind='tool') |
| error_count | COUNT(span WHERE status='error') |
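The table translates directly into a fold over a run's spans. This Python sketch assumes each span is a dict with kind, status, input_tokens, output_tokens, and cost keys; the rollup function is hypothetical and only mirrors the aggregation rules above. Missing values count as 0, which matches SQL SUM except when every value is NULL (SQL then returns NULL rather than 0).

```python
from typing import Any, Dict, List


def rollup(spans: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Compute run-level fields from a run's spans, per the rollup table."""
    llm = [s for s in spans if s.get("kind") == "llm"]
    return {
        # Token and cost totals only consider kind='llm' spans.
        "total_tokens_in": sum(s.get("input_tokens") or 0 for s in llm),
        "total_tokens_out": sum(s.get("output_tokens") or 0 for s in llm),
        "total_cost": sum(s.get("cost") or 0.0 for s in llm),
        # Counts consider every span in the run.
        "span_count": len(spans),
        "tool_count": sum(1 for s in spans if s.get("kind") == "tool"),
        "error_count": sum(1 for s in spans if s.get("status") == "error"),
    }
```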
Linking runs with parentRunId does not roll child-run tokens up into the parent run. Use joinRun for unified rollups.