
Add Open Telemetry instrumentation #526

Merged
hadley merged 47 commits into main from otel
May 1, 2026

Conversation

@atheriel
Collaborator

This commit wraps all LLM model calls in an Open Telemetry span that abides by the (still nascent) semantic conventions for Generative AI clients.

It's very similar in approach to what was done for httr2, and in fact the two of them complement one another nicely:
r-lib/httr2#729.

For example:

library(otelsdk)

Sys.setenv(OTEL_TRACES_EXPORTER = "stderr")

chat <- ellmer::chat_databricks(model = "databricks-claude-3-7-sonnet")
chat$chat("Tell me a joke in the form of an SQL query.")

@atheriel atheriel requested review from gaborcsardi and hadley May 22, 2025 19:37
@atheriel
Collaborator Author

Traces that mix LLM model call spans with httr2 spans:

[Screenshot: live trace view in Pydantic Logfire]

@jcheng5
Collaborator

jcheng5 commented May 23, 2025

cc @cpsievert @schloerke @icarusz

@hadley
Member

hadley commented May 28, 2025

Do we want to (optionally?) also include user and assistant messages, a la https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-events/ ?

@atheriel
Collaborator Author

@hadley I do. But there's a ton of disagreement in the OTel LLM community about how to do that, and none of the existing instrumentation libraries work in the same way 😞. Plus the whole "structured body" mechanism the current spec proposes (1) isn't supported by the span API; and (2) is formally deprecated.

So I kind of think we need to noodle on what to do there, and I suggest pushing it into follow-up work. I'm planning on writing up an issue describing what options we have.

I also think we should have first-class support for tool call spans, because that's something that ellmer focuses on specifically. This PR is really the "basic" bit that the title implies.

@hadley
Member

hadley commented May 28, 2025

@atheriel ok, that makes sense. I'm sure there will be a lot of learning as we figure out exactly what is most useful to instrument across packages.

@atheriel atheriel marked this pull request as draft June 6, 2025 17:59
@atheriel
Collaborator Author

atheriel commented Jun 6, 2025

Moving this back to draft because it has known issues (i.e. the concurrency does not work correctly).

This commit instruments various operations with Open Telemetry spans that
abide by the (still nascent) semantic conventions for Generative AI
clients [0].

These conventions classify `ellmer` chatbots as "agents" due to their
ability to run tool calls, so in fact there are three types of span: (1)
a top-level `invoke_agent` span for each chat interaction; (2) `chat`
spans that wrap model API calls; and (3) `execute_tool` spans that wrap
tool calls on our end.

There's currently no community consensus on how to attach turns to
spans, so I've left that out for now.

Example code:

    library(otelsdk)

    Sys.setenv(OTEL_TRACES_EXPORTER = "stderr")

    chat <- ellmer::chat_databricks(model = "databricks-claude-3-7-sonnet")
    chat$chat("Tell me a joke in the form of an SQL query.")

Unit tests are included.

[0]: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/

Signed-off-by: Aaron Jacobs <[email protected]>
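The three span types described in the commit message above can be sketched as follows. This is a minimal Python illustration of the hierarchy (the GenAI semantic conventions are language-agnostic): the span names and `gen_ai.*` attribute keys follow the spec, but the `Span` class and `chat_interaction()` helper are hypothetical stand-ins for what ellmer does internally, not its actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    attributes: dict
    children: list = field(default_factory=list)

def chat_interaction(model, prompt, tools):
    # (1) Top-level "invoke_agent" span for the whole chat interaction.
    root = Span("invoke_agent", {"gen_ai.operation.name": "invoke_agent"})
    # (2) A "chat" span wrapping each model API call.
    root.children.append(Span(f"chat {model}", {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": model,
    }))
    # (3) An "execute_tool" span wrapping each tool call run client-side.
    for tool in tools:
        root.children.append(Span(f"execute_tool {tool}", {
            "gen_ai.operation.name": "execute_tool",
            "gen_ai.tool.name": tool,
        }))
    return root

trace = chat_interaction("claude-3-7-sonnet", "What's the weather?", ["get_weather"])
```

A real implementation would open and close these spans around the actual API calls via the OTel tracer; this sketch only shows the parent/child shape and the attribute names the conventions prescribe.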
@atheriel atheriel changed the title Add basic Open Telemetry instrumentation for model calls Add Open Telemetry instrumentation Jun 20, 2025
@atheriel
Collaborator Author

atheriel commented Jun 20, 2025

This has been updated to support async operations and for changes in otel and otelsdk. It now also includes pretty extensive unit tests and support for agent and tool call spans.

schloerke and others added 10 commits September 8, 2025 09:50
* main: (95 commits)
  fix(chat): Call `check_echo()` in `chat()` for consistent echo behavior (#742)
  Increment version number to 0.3.2.9000
  Increment version number to 0.3.2
  Don't run `content_image()` on CRAN (#739)
  feat(chat_): Add `params` and `model` to all `chat_` functions (#699)
  Fix spelling in `tool_prompt.md` (#730)
  Fix typos in source comments and regenerate documentation (#736)
  Fix news bullet
  Increment version number to 0.3.1.9000
  Increment version number to 0.3.1
  Update cran comments
  Check revdpes
  Re-build readme
  Typo fixes (#686)
  Polish news
  Update to latest Air settings and use `format-suggest.yaml` (#683)
  Use newer REST API base url (#726)
  Fix auth scope and API endpoints for Google Vertex (#704)
  Run `Rscript data-raw/prices.R` to update pricing info (#727)
  Improve error message for `batch_chat()` (#716)
  ...
Comment thread R/otel.R Outdated
@hadley
Member

hadley commented Oct 14, 2025

One other question: do we want to log something about auth here (i.e. in particular, which auth was automatically picked?)

Comment thread R/otel.R Outdated
Comment thread R/otel.R Outdated
Comment thread R/otel.R Outdated
Comment thread DESCRIPTION Outdated
Comment thread DESCRIPTION Outdated
Updated tests to use expect_gte instead of strict length checks for tool and chat spans. This accounts for model variations where tools may be called more than expected or results are cached, improving test robustness across different model behaviors.
@schloerke
Collaborator

The error was introduced in eeb0b84 (#526) 🥸

But! It's fixed now. Thank goodness for unit tests.

The restrictions on the other, earlier-failing test have been relaxed, and it now passes.

@schloerke schloerke marked this pull request as ready for review November 10, 2025 17:14
@schloerke schloerke requested review from hadley and shikokuchuo and removed request for hadley November 10, 2025 17:14
Member

@shikokuchuo shikokuchuo left a comment


Thanks @schloerke. I've made a couple more tidy-ups; the only consequential one is moving otel to 'Suggests', consistent with the other packages we're instrumenting.

Apart from the unit tests themselves, I've also been testing with your shinychat demo (Shiny, ellmer, httr2, and mirai spans), and the traces all look good.

@shikokuchuo
Member

@hadley this PR is now ready for review. Thanks!

@schloerke
Collaborator

Friendly poke to get this PR merged.

# Conflicts:
#	R/chat.R
#	R/httr2.R
Comment thread R/chat-tools.R Outdated
@hadley
Member

hadley commented Apr 29, 2026

Could we get a longish news bullet that describes what gets logged? And can someone check that https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/ hasn't changed in the almost one year (!!) that this PR has been open?

@shikokuchuo shikokuchuo requested a review from hadley April 30, 2026 11:03
Emits `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` per
the GenAI semantic conventions. Cleans up `value_tokens(ProviderAnthropic)`
to return raw integer counts (the 1.25x cache-write pricing weight now
lives in `value_turn` as a 0.25x surcharge, so cost is unchanged but
`token_usage()` reports honest token counts), with a regression test
locking the cost calculation. Aligns the other providers' `value_tokens()`
methods with the same `%||% 0` defensive pattern.

Also fixes a latent bug where the non-streaming sync/async paths passed
an unparsed httr2_response to the recorder, silently dropping
`gen_ai.response.model` and `gen_ai.response.id`. While here, factors
`resp_body_json()` parsing to a single call site so `acc$add_turn` now
takes parsed JSON, mirroring `acc$complete_turn`.
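The cost invariance claimed in the commit message above can be checked with simple arithmetic. This is a sketch in Python (the price is made up for illustration; ellmer's actual pricing data lives in `data-raw/prices.R`): pre-multiplying cache-write tokens by 1.25x and charging the base rate is equivalent to reporting raw tokens and applying a 0.25x surcharge at pricing time.

```python
# Before: cache-write tokens were reported pre-multiplied by 1.25x,
# then priced at the base rate. After: raw tokens are reported, and
# the pricing step adds a 0.25x surcharge, so cost is unchanged.
price_per_token = 3.75e-06      # hypothetical cache-write price
cache_write_tokens = 1000       # raw count from the API

cost_before = (cache_write_tokens * 1.25) * price_per_token
cost_after = (cache_write_tokens * price_per_token
              + cache_write_tokens * 0.25 * price_per_token)
assert abs(cost_before - cost_after) < 1e-12

# But the reported token count is now honest:
reported_before = cache_write_tokens * 1.25  # inflated: 1250
reported_after = cache_write_tokens          # actual:   1000
```

This is why `token_usage()` can report honest counts (and the OTel `gen_ai.usage.*` attributes stay truthful) without changing anyone's bill.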
@shikokuchuo
Member

d618297 adds input and output tokens as per the semantic conventions.

I had to move Anthropic's 1.25x multiplier out so that it's used only for value calculations and doesn't distort the actual token counts. Added a regression test for this.

[Screenshot: 2026-05-01 10:38]

Opt-in via OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT, emits
gen_ai.input.messages, gen_ai.output.messages, and
gen_ai.system_instructions per the GenAI semantic conventions.
@shikokuchuo
Member

c9e4e0e gives us: gen_ai.input.messages, gen_ai.output.messages, and gen_ai.system_instructions.

The semantic conventions require this to be opt-in (via the OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT env var) since these payloads may contain user data.

These unlock a special gen ai UI in Logfire so we get something nice like this:

[Screenshot: 2026-05-01 11:57]

I've noticed a few rough edges where occasionally the UI will fall back to raw json output, but I don't think it's significant enough to stop us shipping an initial version of this functionality.
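The opt-in gating described above could be sketched like this. A Python illustration: the `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` environment variable and the `gen_ai.*.messages` attribute names come from the semantic conventions, but the `maybe_capture_messages()` helper is hypothetical, not ellmer's actual internals.

```python
import os

def maybe_capture_messages(attributes, input_messages, output_messages):
    # Message content may contain user data, so the GenAI semantic
    # conventions require capture to be explicitly opted into.
    opt_in = os.environ.get("OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT", "")
    if opt_in.lower() != "true":
        return attributes  # leave the span attributes untouched
    attributes["gen_ai.input.messages"] = input_messages
    attributes["gen_ai.output.messages"] = output_messages
    return attributes

os.environ["OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"] = "true"
attrs = maybe_capture_messages({}, '[{"role":"user"}]', '[{"role":"assistant"}]')
```

With the variable unset, the same call returns the attributes dict unchanged, which is the privacy-preserving default the spec mandates.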

@hadley hadley merged commit 8590a74 into main May 1, 2026
11 checks passed
@hadley hadley deleted the otel branch May 1, 2026 12:25