Conversation
|
Do we want to (optionally?) also include user and assistant messages, a la https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-events/ ? |
|
@hadley I do. But there's a ton of disagreement in the OTel LLM community about how to do that, and none of the existing instrumentation libraries work in the same way 😞. Plus the whole "structured body" mechanism the current spec proposes (1) isn't supported by the span API; and (2) is formally deprecated. So I kind of think we need to noodle on what to do there, and I suggest pushing it into follow-up work. I'm planning on writing up an issue describing what options we have. I also think we should have first-class support for tool call spans, because that's something that |
|
@atheriel ok, that makes sense. I'm sure there will be a lot of learning as we figure out exactly what is most useful to instrument across packages. |
|
Moving this back to draft because it has known issues (i.e. the concurrency does not work correctly). |
This commit instruments various operations with Open Telemetry spans that
abide by the (still nascent) semantic conventions for Generative AI
clients [0].
These conventions classify `ellmer` chatbots as "agents" due to their
ability to run tool calls, so in fact there are three types of span: (1)
a top-level `invoke_agent` span for each chat interaction; (2) `chat`
spans that wrap model API calls; and (3) `execute_tool` spans that wrap
tool calls on our end.
There's currently no community consensus for how to attach turns to
spans, so I've left that out for now.
Example code:
```r
library(otelsdk)
Sys.setenv(OTEL_TRACES_EXPORTER = "stderr")
chat <- ellmer::chat_databricks(model = "databricks-claude-3-7-sonnet")
chat$chat("Tell me a joke in the form of an SQL query.")
```
Unit tests are included.
[0]: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/
Signed-off-by: Aaron Jacobs <[email protected]>
|
This has been updated to support async operations and for changes in |
* main: (95 commits)
  * fix(chat): Call `check_echo()` in `chat()` for consistent echo behavior (#742)
  * Increment version number to 0.3.2.9000
  * Increment version number to 0.3.2
  * Don't run `content_image()` on CRAN (#739)
  * feat(chat_): Add `params` and `model` to all `chat_` functions (#699)
  * Fix spelling in `tool_prompt.md` (#730)
  * Fix typos in source comments and regenerate documentation (#736)
  * Fix news bullet
  * Increment version number to 0.3.1.9000
  * Increment version number to 0.3.1
  * Update cran comments
  * Check revdeps
  * Re-build readme
  * Typo fixes (#686)
  * Polish news
  * Update to latest Air settings and use `format-suggest.yaml` (#683)
  * Use newer REST API base url (#726)
  * Fix auth scope and API endpoints for Google Vertex (#704)
  * Run `Rscript data-raw/prices.R` to update pricing info (#727)
  * Improve error message for `batch_chat()` (#716)
  * ...
Co-Authored-By: Aaron Jacobs <[email protected]>
…n activation for chat
|
One other question: do we want to log something about auth here (in particular, which auth method was automatically picked)? |
Co-authored-by: Charlie Gao <[email protected]>
Inspiration from shiny / promises code reviews
Updated tests to use expect_gte instead of strict length checks for tool and chat spans. This accounts for model variations where tools may be called more than expected or results are cached, improving test robustness across different model behaviors.
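A minimal sketch of the relaxed assertion style, assuming a hypothetical `collect_spans()` fixture that captures spans exported during a test (the real helper in the test suite may differ):

```r
library(testthat)

test_that("chat emits at least the expected tool spans", {
  # `collect_spans()` is a hypothetical fixture standing in for whatever
  # the package actually uses to capture exported spans during a test.
  spans <- collect_spans(function() {
    chat <- ellmer::chat_openai()
    chat$chat("What's the weather in Toronto?")
  })
  tool_spans <- Filter(function(s) s$name == "execute_tool", spans)
  # A lower bound rather than an exact count: models may call a tool
  # more than once, and cached results can change the span count.
  expect_gte(length(tool_spans), 1)
})
```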
|
The error was introduced in an earlier commit. But!.. It's fixed now; thank goodness for unit tests. Restrictions on the other, earlier failing test have been relaxed, and it now passes. |
shikokuchuo
left a comment
Thanks @schloerke. I've made a couple more tidy-ups, the only consequential one being to move otel to 'Suggests', consistent with the other packages we're instrumenting.
Apart from the unit tests themselves, I've also been testing using your shinychat demo (Shiny, ellmer, httr2 and mirai spans), and the traces all look good.
|
@hadley this PR is now ready for review. Thanks! |
|
Friendly poke to get this PR merged. |
# Conflicts:
#	R/chat.R
#	R/httr2.R
|
Could we get a longish news bullet that describes what gets logged? And can someone check that https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/ hasn't changed in the almost one year (!!) that this PR has been open? |
Emits `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` per the GenAI semantic conventions.
- Cleans up `value_tokens(ProviderAnthropic)` to return raw integer counts: the 1.25x cache-write pricing weight now lives in `value_turn` as a 0.25x surcharge, so cost is unchanged but `token_usage()` reports honest token counts. A regression test locks the cost calculation.
- Aligns the other providers' `value_tokens()` methods with the same `%||% 0` defensive pattern.
- Fixes a latent bug where the non-streaming sync/async paths passed an unparsed `httr2_response` to the recorder, silently dropping `gen_ai.response.model` and `gen_ai.response.id`.
- While here, factors `resp_body_json()` parsing to a single call site, so `acc$add_turn` now takes parsed JSON, mirroring `acc$complete_turn`.
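The `%||% 0` defensive pattern mentioned above can be sketched as follows (the function body here is illustrative, not ellmer's actual `value_tokens()` method):

```r
`%||%` <- function(x, y) if (is.null(x)) y else x

# Illustrative only: token fields missing from a provider response
# default to 0L instead of propagating NULL into the span attributes.
value_tokens_sketch <- function(json) {
  usage <- json$usage
  list(
    input = usage$input_tokens %||% 0L,
    output = usage$output_tokens %||% 0L
  )
}

value_tokens_sketch(list(usage = list(input_tokens = 10L)))
# input is 10; the absent output field falls back to 0
```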
|
d618297 adds input and output tokens as per the semantic conventions. I had to move Anthropic's 1.25x multiplier out so it's just used for value calcs, and doesn't distort the actual token counts. Added a regression test for this.
|
Opt-in via `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`; emits `gen_ai.input.messages`, `gen_ai.output.messages`, and `gen_ai.system_instructions` per the GenAI semantic conventions.
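A sketch of opting in from R before creating a chat; the provider and exporter choices are illustrative, while the variable name comes from the semantic conventions:

```r
library(otelsdk)

Sys.setenv(
  OTEL_TRACES_EXPORTER = "stderr",
  # Message content is only attached to spans when this is set; the
  # semantic conventions require capture to be an explicit opt-in.
  OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT = "true"
)

chat <- ellmer::chat_openai()
chat$chat("Hello!")
```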
|
c9e4e0e gives us the attributes above. The semantic conventions require this to be opt-in (via the `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` environment variable). These unlock a special GenAI UI in Logfire, so we get something nice like this:
I've noticed a few rough edges where occasionally the UI will fall back to raw JSON output, but I don't think it's significant enough to stop us shipping an initial version of this functionality. |



This commit wraps all LLM model calls in an Open Telemetry span that abides by the (still nascent) semantic conventions for Generative AI clients.
It's very similar in approach to what was done for httr2, and in fact the two of them complement one another nicely: r-lib/httr2#729.
For example:
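A sketch along the lines of the earlier example in this thread; the provider and model here are placeholders:

```r
library(otelsdk)
Sys.setenv(OTEL_TRACES_EXPORTER = "stderr")

chat <- ellmer::chat_openai(model = "gpt-4o-mini")
chat$chat("Tell me a joke in the form of an SQL query.")
```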