Conversation
|
Do we want to (optionally?) also include user and assistant messages, a la https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-events/ ? |
|
@hadley I do. But there's a ton of disagreement in the OTel LLM community about how to do that, and none of the existing instrumentation libraries work in the same way 😞. Plus the whole "structured body" mechanism the current spec proposes (1) isn't supported by the span API; and (2) is formally deprecated. So I kind of think we need to noodle on what to do there, and I suggest pushing it into follow-up work. I'm planning on writing up an issue describing what options we have. I also think we should have first-class support for tool call spans, because that's something that |
|
@atheriel ok, that makes sense. I'm sure there will be a lot of learning as we figure out exactly what is most useful to instrument across packages. |
|
Moving this back to draft because it has known issues (i.e. the concurrency does not work correctly). |
This commit instruments various operations with Open Telemetry spans that
abide by the (still nascent) semantic conventions for Generative AI
clients [0].
These conventions classify `ellmer` chatbots as "agents" due to their
ability to run tool calls, so in fact there are three types of span: (1)
a top-level `invoke_agent` span for each chat interaction; (2) `chat`
spans that wrap model API calls; and (3) `execute_tool` spans that wrap
tool calls on our end.
There's currently no community consensus for how to attach turns to
spans, so I've left that out for now.
Example code:
```r
library(otelsdk)
Sys.setenv(OTEL_TRACES_EXPORTER = "stderr")
chat <- ellmer::chat_databricks(model = "databricks-claude-3-7-sonnet")
chat$chat("Tell me a joke in the form of an SQL query.")
```
Unit tests are included.
[0]: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/
Signed-off-by: Aaron Jacobs <[email protected]>
|
This has been updated to support async operations and for changes in |
* main: (95 commits)
  * fix(chat): Call `check_echo()` in `chat()` for consistent echo behavior (#742)
  * Increment version number to 0.3.2.9000
  * Increment version number to 0.3.2
  * Don't run `content_image()` on CRAN (#739)
  * feat(chat_): Add `params` and `model` to all `chat_` functions (#699)
  * Fix spelling in `tool_prompt.md` (#730)
  * Fix typos in source comments and regenerate documentation (#736)
  * Fix news bullet
  * Increment version number to 0.3.1.9000
  * Increment version number to 0.3.1
  * Update cran comments
  * Check revdeps
  * Re-build readme
  * Typo fixes (#686)
  * Polish news
  * Update to latest Air settings and use `format-suggest.yaml` (#683)
  * Use newer REST API base url (#726)
  * Fix auth scope and API endpoints for Google Vertex (#704)
  * Run `Rscript data-raw/prices.R` to update pricing info (#727)
  * Improve error message for `batch_chat()` (#716)
  * ...
Co-Authored-By: Aaron Jacobs <[email protected]>
…n activation for chat
|
One other question: do we want to log something about auth here (in particular, which auth method was automatically picked)? |
Co-authored-by: Charlie Gao <[email protected]>
Inspiration from shiny / promises code reviews
Updated tests to use expect_gte instead of strict length checks for tool and chat spans. This accounts for model variations where tools may be called more than expected or results are cached, improving test robustness across different model behaviors.
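A minimal sketch of the relaxed assertion style, assuming a hypothetical `collect_spans()` fixture that captures spans exported during a test (the real helper in the test suite may differ):

```r
library(testthat)

test_that("chat emits at least the expected tool spans", {
  # `collect_spans()` is a hypothetical fixture standing in for whatever
  # the package actually uses to capture exported spans during a test.
  spans <- collect_spans(function() {
    chat <- ellmer::chat_openai()
    chat$chat("What's the weather in Toronto?")
  })
  tool_spans <- Filter(function(s) s$name == "execute_tool", spans)
  # A lower bound rather than an exact count: models may call a tool
  # more than once, and cached results can change the span count.
  expect_gte(length(tool_spans), 1)
})
```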
|
The error was introduced in an earlier commit. But!.. It's fixed now; thank goodness for unit tests. Restrictions on the other, earlier failing test have been relaxed, and it now passes. |
shikokuchuo
left a comment
Thanks @schloerke. I've made a couple more tidy-ups, the only consequential one being to move otel to 'Suggests', consistent with the other packages we're instrumenting.
Apart from the unit tests themselves, I've also been testing using your shinychat demo (Shiny, ellmer, httr2 and mirai spans), and the traces all look good.
|
@hadley this PR is now ready for review. Thanks! |
|
Friendly poke to get this PR merged. |
# Conflicts:
#	R/chat.R
#	R/httr2.R
|
Could we get a longish news bullet that describes what gets logged? And can someone check that https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/ hasn't changed in the almost one year (!!) that this PR has been open? |
Emits `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` per the GenAI semantic conventions.
- Cleans up `value_tokens(ProviderAnthropic)` to return raw integer counts: the 1.25x cache-write pricing weight now lives in `value_turn` as a 0.25x surcharge, so cost is unchanged but `token_usage()` reports honest token counts. A regression test locks the cost calculation.
- Aligns the other providers' `value_tokens()` methods with the same `%||% 0` defensive pattern.
- Fixes a latent bug where the non-streaming sync/async paths passed an unparsed `httr2_response` to the recorder, silently dropping `gen_ai.response.model` and `gen_ai.response.id`.
- While here, factors `resp_body_json()` parsing to a single call site, so `acc$add_turn` now takes parsed JSON, mirroring `acc$complete_turn`.
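The `%||% 0` defensive pattern mentioned above can be sketched as follows (the function body here is illustrative, not ellmer's actual `value_tokens()` method):

```r
`%||%` <- function(x, y) if (is.null(x)) y else x

# Illustrative only: token fields missing from a provider response
# default to 0L instead of propagating NULL into the span attributes.
value_tokens_sketch <- function(json) {
  usage <- json$usage
  list(
    input = usage$input_tokens %||% 0L,
    output = usage$output_tokens %||% 0L
  )
}

value_tokens_sketch(list(usage = list(input_tokens = 10L)))
# input is 10; the absent output field falls back to 0
```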
|
d618297 adds input and output tokens as per the semantic conventions. I had to move Anthropic's 1.25x multiplier out so it's just used for value calcs, and doesn't distort the actual token counts. Added a regression test for this.
|
Opt-in via `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`; emits `gen_ai.input.messages`, `gen_ai.output.messages`, and `gen_ai.system_instructions` per the GenAI semantic conventions.
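A sketch of opting in from R before creating a chat; the provider and exporter choices are illustrative, while the variable name comes from the semantic conventions:

```r
library(otelsdk)

Sys.setenv(
  OTEL_TRACES_EXPORTER = "stderr",
  # Message content is only attached to spans when this is set; the
  # semantic conventions require capture to be an explicit opt-in.
  OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT = "true"
)

chat <- ellmer::chat_openai()
chat$chat("Hello!")
```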
|
c9e4e0e gives us the attributes above. The semantic conventions require this to be opt-in (via the `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` environment variable). These unlock a special GenAI UI in Logfire, so we get something nice like this:
I've noticed a few rough edges where occasionally the UI will fall back to raw JSON output, but I don't think it's significant enough to stop us shipping an initial version of this functionality. |



This commit wraps all LLM model calls in an Open Telemetry span that abides by the (still nascent) semantic conventions for Generative AI clients.
It's very similar in approach to what was done for httr2, and in fact the two of them complement one another nicely: r-lib/httr2#729.
For example:
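A sketch along the lines of the earlier example in this thread; the provider and model here are placeholders:

```r
library(otelsdk)
Sys.setenv(OTEL_TRACES_EXPORTER = "stderr")

chat <- ellmer::chat_openai(model = "gpt-4o-mini")
chat$chat("Tell me a joke in the form of an SQL query.")
```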