
feat(datasets): Extend LangfuseTraceDataset to support AutoGen tracing#1288

Merged
SajidAlamQB merged 30 commits into main from feat/add-auto-gen-support-to-langfuse on Feb 10, 2026
Conversation

@SajidAlamQB
Contributor

@SajidAlamQB SajidAlamQB commented Jan 20, 2026

Description

Related to: #1276

To test for QA, use the kedro-academy example: kedro-org/kedro-academy#104

Adds an autogen mode to LangfuseTraceDataset, enabling OpenTelemetry-based tracing for AutoGen agent pipelines via Langfuse's OTLP endpoint.

Development notes

  • Added autogen mode to LangfuseTraceDataset that returns a configured OpenTelemetry Tracer
  • _build_autogen_tracer() sets up an OTLP exporter
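
For reviewers less familiar with OTLP, here is a minimal sketch of the configuration such an exporter would likely need. It is not the code of _build_autogen_tracer(): the function name build_langfuse_otlp_config is hypothetical, and the endpoint path (/api/public/otel/v1/traces) and the Basic auth scheme are assumptions based on Langfuse's OpenTelemetry documentation.

```python
import base64


def build_langfuse_otlp_config(
    public_key: str,
    secret_key: str,
    host: str = "https://cloud.langfuse.com",
) -> dict:
    """Return the endpoint and headers an OTLP/HTTP span exporter would need.

    Hypothetical helper for illustration only; not part of kedro-datasets.
    """
    # Assumption: Langfuse authenticates OTLP requests with HTTP Basic auth
    # built from the project's public and secret keys.
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return {
        # Assumption: the traces endpoint lives under /api/public/otel.
        "endpoint": f"{host.rstrip('/')}/api/public/otel/v1/traces",
        "headers": {"Authorization": f"Basic {token}"},
    }


config = build_langfuse_otlp_config("pk-lf-example", "sk-lf-example")
print(config["endpoint"])
```

These values could then be passed to OTLPSpanExporter(endpoint=..., headers=...) from opentelemetry-exporter-otlp-proto-http and attached to a TracerProvider through a BatchSpanProcessor. Keeping host as a parameter is one way to support self-hosted Langfuse instances.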

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Updated jsonschema/kedro-catalog-X.XX.json if necessary
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes
  • Received approvals from at least half of the TSC (required for adding a new, non-experimental dataset)

Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
@SajidAlamQB SajidAlamQB changed the title Extend LangfuseTraceDataset to support AutoGen tracing feat(datasets): Extend LangfuseTraceDataset to support AutoGen tracing Jan 20, 2026
@SajidAlamQB SajidAlamQB marked this pull request as ready for review January 21, 2026 15:51
Contributor

@ElenaKhaustova ElenaKhaustova left a comment


Left a small comment; other than that, the implementation looks good 👍

Could you please also open a PR in the academy project applying autogen mode to this pipeline https://github.com/kedro-org/kedro-academy/tree/main/kedro-agentic-workflows/src/kedro_agentic_workflows/pipelines/response_generation_autogen so it is easy to test for reviewers?

Also don't forget to update RELEASE.md

@ravi-kumar-pilla
Contributor

Hi @SajidAlamQB, the implementation looks good. It would be nice to have some QA steps or, as Elena mentioned, some way to test this out. Thank you

Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Comment thread kedro-datasets/pyproject.toml
Comment thread kedro-datasets/kedro_datasets_experimental/langfuse/langfuse_trace_dataset.py Outdated
Comment thread kedro-datasets/kedro_datasets_experimental/langfuse/langfuse_trace_dataset.py Outdated
Contributor

@ElenaKhaustova ElenaKhaustova left a comment


Need some help to clarify how to install: kedro-org/kedro-academy#104 (review)

Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Contributor

@ElenaKhaustova ElenaKhaustova left a comment


Thank you, @SajidAlamQB, changes made make sense to me! I left a suggestion regarding the implementation.

I tested it with the academy project, and it works now. I left a question regarding the warning produced: kedro-org/kedro-academy#104 (review)

And another general question is regarding the OTLP approach we chose. Is it because we try to align with the autogen mode implementation for OpikTraceDataset? Otherwise, this approach (https://langfuse.com/integrations/frameworks/autogen) looks much easier and requires only configuration through Langfuse, as we already do for other modes.

I also wonder what the difference is between those two approaches in terms of the end result, and if you had a chance to explore it?

Comment thread kedro-datasets/kedro_datasets_experimental/langfuse/langfuse_trace_dataset.py Outdated
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
@SajidAlamQB
Contributor Author

> And another general question is regarding the OTLP approach we chose. Is it because we try to align with the autogen mode implementation for OpikTraceDataset? Otherwise, this approach (https://langfuse.com/integrations/frameworks/autogen) looks much easier and requires only configuration through Langfuse, as we already do for other modes.
>
> I also wonder what the difference is between those two approaches in terms of the end result, and if you had a chance to explore it?

Yes, the main reason for the OTLP approach was to stay consistent with Opik, which didn't have an equivalent integration, so its autogen mode uses OTLP directly.

I think for the initial implementation it makes sense to keep OTLP for consistency, but we could explore adding an openlit mode or enhancing the autogen mode for Langfuse specifically in a follow-up if those other features are needed.

For the endpoint, that makes sense; I'll make it configurable.

@ElenaKhaustova
Contributor

@SajidAlamQB

> what the difference is between those two approaches in terms of the end result?

I mean, is there any notable difference at all, aside from the configuration?

@SajidAlamQB
Contributor Author

> @SajidAlamQB
>
> what the difference is between those two approaches in terms of the end result?
>
> I mean, is there any notable difference at all, aside from the configuration?

The OpenLit approach just gives more detailed traces out of the box, but otherwise there's not much difference tbh.

Signed-off-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com>
Comment thread kedro-datasets/kedro_datasets_experimental/langfuse/langfuse_trace_dataset.py Outdated
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Contributor

@ElenaKhaustova ElenaKhaustova left a comment


Thank you, @SajidAlamQB!

I've unresolved the comment about the endpoint as it does not seem to be solved. Also added a few suggestions on how it can be done.

Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
@ravi-kumar-pilla
Contributor

Hi @SajidAlamQB ,

The code looks good and it works well with the test project in kedro-academy. To be consistent, we need to change how credentials are handled in the other modes as well (either in this PR or a separate one).

Thank you

Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Contributor

@ankatiyar ankatiyar left a comment


Code looks good overall, I'll let Elena and Ravi do the final approvals :)

SajidAlamQB and others added 7 commits February 5, 2026 15:38
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
@SajidAlamQB
Contributor Author

SajidAlamQB commented Feb 9, 2026

Hey team, this PR went through a few different iterations, so just to make it clear:

We explored two approaches for AutoGen tracing with Langfuse:

Approach 1: OpenLit (attempted, reverted)
Tried using OpenLit as shown in Langfuse's AutoGen tutorial

Trace hierarchy was breaking without manual spans: without wrapping agent calls in tracer.start_as_current_span(), each AutoGen operation became a separate trace at depth 0 instead of being nested under a parent.

Graph visualisation issues: even with a correct trace hierarchy, Langfuse's graph view renders multi-agent workflows incorrectly. This is a known Langfuse limitation (see issues below).

Approach 2: OTLP (current implementation)
Reverted to direct OpenTelemetry OTLP export, which has no additional dependencies beyond opentelemetry-sdk and opentelemetry-exporter-otlp-proto-http

It provides a stable API, aligns with the Opik setup, and produces the correct trace structure

I've added a note in the docstring that Langfuse's graph visualisation is in beta and may not render complex multi-agent workflows correctly. Also opened an issue on their side:

langfuse/langfuse#11941

Other Related issues:

langfuse/langfuse#9427
langfuse/langfuse#10721
langfuse/langfuse#9648

Contributor

@ElenaKhaustova ElenaKhaustova left a comment


Thank you, @SajidAlamQB, implementation looks good to me!

One minor thing that I've noticed is that docs are not rendered properly:

Image

SajidAlamQB and others added 3 commits February 10, 2026 12:48
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
…b.com/kedro-org/kedro-plugins into feat/add-auto-gen-support-to-langfuse

Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
@SajidAlamQB SajidAlamQB merged commit 33364c9 into main Feb 10, 2026
28 checks passed
@SajidAlamQB SajidAlamQB deleted the feat/add-auto-gen-support-to-langfuse branch February 10, 2026 14:05

4 participants