

What gets auto-instrumented

Install opentelemetry-instrumentation-llamaindex alongside llama-index. Once it is installed, every retriever, query, and chat call made inside a wrap_agent block emits spans automatically:
| Construct | Span kind | Auto-extracted |
| --- | --- | --- |
| VectorIndexRetriever.retrieve | retrieval | query, doc count, doc ids |
| QueryEngine.query | agent | wraps retrieval + synthesis |
| ChatEngine.chat | agent | wraps retrieval + LLM + memory |
| ResponseSynthesizer | llm | model, tokens |
| Node post-processors | generic | name per post-processor |
| Embeddings | llm | model, tokens |

Install

pip install llama-index opentelemetry-instrumentation-llamaindex

Minimal example: RAG query

import os

import trodo
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Initialise the SDK once at startup, before any traced calls.
trodo.init(site_id=os.environ["TRODO_SITE_ID"])

# Standard LlamaIndex RAG setup: load documents, index them, expose a query engine.
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

with trodo.wrap_agent("llamaindex-rag") as run:
    run.set_input({"q": "What is HNSW?"})
    response = query_engine.query("What is HNSW?")
    run.set_output(str(response))
Resulting tree:
run (wrap_agent)
 └─ agent   QueryEngine.query
      ├─ retrieval   VectorIndexRetriever.retrieve
      ├─ generic     post-processor: SimilarityPostprocessor
      └─ llm         ResponseSynthesizer (gpt-4o-mini)

Chat engine with memory

chat_engine = index.as_chat_engine(chat_mode="context")

for msg in incoming_messages:
    with trodo.wrap_agent("support-bot", conversation_id=session_id) as run:
        run.set_input({"message": msg})
        response = chat_engine.chat(msg)
        run.set_output(response.response)
Each turn is its own wrap_agent run; the shared conversation_id stitches the runs into the session view.

Retrieval span details

The default retriever populates:
  • span.input → the raw query string
  • span.output → list of { id, score, text_preview } for each retrieved node
  • span.attributes.top_k → configured similarity_top_k
  • span.attributes.doc_count → how many nodes actually came back
If you post-process (rerank, filter), each post-processor adds a child span so you can see how the doc set narrows.
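
As an illustration, here is a minimal sketch of a query engine whose trace shows that narrowing, using stock LlamaIndex components and the index built in the example above; the top_k and cutoff values here are arbitrary:

from llama_index.core.postprocessor import SimilarityPostprocessor

# Retrieve 8 candidates, then drop nodes scoring below 0.75.
query_engine = index.as_query_engine(
    similarity_top_k=8,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.75)],
)

with trodo.wrap_agent("llamaindex-rag") as run:
    # The retrieval span reports top_k=8 and doc_count for the nodes returned;
    # a child span named after SimilarityPostprocessor shows the filtered set.
    response = query_engine.query("What is HNSW?")
    run.set_output(str(response))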

Auto vs. manual cheat sheet

| Operation | Auto? | Notes / manual fallback |
| --- | --- | --- |
| VectorIndexRetriever | yes | |
| Custom BaseRetriever subclass | partial | must call super()._retrieve() for callbacks to fire; otherwise wrap in trodo.retrieval() (first sketch below) |
| QueryEngine.query | yes | |
| ChatEngine.chat | yes | |
| SubQuestionQueryEngine | yes | each sub-question becomes a nested query span |
| ReActAgent / FunctionCallingAgent | yes | steps appear as alternating llm + tool spans |
| Custom node post-processors | yes | name taken from the class |
| Ingestion pipeline | no | usually offline; wrap it in a separate run with wrap_agent("ingest-docs") (second sketch below) |
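
For the custom BaseRetriever row, here is a minimal sketch of the manual fallback. It assumes trodo.retrieval() is a context manager exposing set_input/set_output like the run handle above (check the SDK reference for the exact signature); the retriever and its _keyword_lookup helper are hypothetical:

from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle

class KeywordRetriever(BaseRetriever):
    # Hypothetical retriever that bypasses LlamaIndex callbacks, so it
    # wraps its own work in an explicit retrieval span.
    def _retrieve(self, query_bundle: QueryBundle) -> list[NodeWithScore]:
        with trodo.retrieval("keyword-retriever") as span:
            span.set_input(query_bundle.query_str)
            nodes = self._keyword_lookup(query_bundle.query_str)  # your own lookup logic
            span.set_output([{"id": n.node.node_id, "score": n.score} for n in nodes])
            return nodes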
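
And for the ingestion row, a short sketch of giving an offline ingestion job its own run; the SentenceSplitter transformation is just an example:

from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

pipeline = IngestionPipeline(transformations=[SentenceSplitter()])

with trodo.wrap_agent("ingest-docs") as run:
    documents = SimpleDirectoryReader("./docs").load_data()
    nodes = pipeline.run(documents=documents)
    run.set_output({"node_count": len(nodes)})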

Gotchas

  • Only the Python SDK auto-instruments LlamaIndex. The JS port (llamaindex npm package) has no first-party OTel bindings — wrap retrievers with trodo.retrieval() and synthesiser calls with trodo.llm().
  • Token counts come from the underlying LLM (OpenAI, Anthropic). If you’re using a local LLM via CustomLLM, set extract_usage on a trodo.llm wrapper or call trodo.track_llm_call after each completion (see the first sketch after this list).
  • Streaming query engines (as_query_engine(streaming=True)) record the span when the stream finishes. Partial results appear only if you call set_output yourself mid-stream (see the second sketch after this list).
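
For the token-count gotcha, a sketch of manual reporting after a CustomLLM completion. Only trodo.track_llm_call itself appears on this page; the argument names below are assumptions, so check the SDK reference:

# local_llm is your CustomLLM instance; the keyword arguments are hypothetical.
completion = local_llm.complete(prompt)
trodo.track_llm_call(
    model="my-local-model",   # placeholder: whatever your CustomLLM serves
    input=prompt,
    output=completion.text,
    usage={"input_tokens": tokens_in, "output_tokens": tokens_out},  # counts you compute yourself
)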
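
For the streaming gotcha, a sketch that surfaces partial output mid-stream; it assumes set_output can be called repeatedly on the same run, which is worth verifying against the SDK reference:

streaming_engine = index.as_query_engine(streaming=True)

with trodo.wrap_agent("llamaindex-rag-stream") as run:
    run.set_input({"q": "What is HNSW?"})
    response = streaming_engine.query("What is HNSW?")
    chunks = []
    for token in response.response_gen:   # yields text deltas as they arrive
        chunks.append(token)
        run.set_output("".join(chunks))   # partial result visible before the stream ends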