Merged
41 commits
89acfcb
Moved dataset from the academy repo
ElenaKhaustova Mar 16, 2026
0b48501
Adapted dataset to the repo
ElenaKhaustova Mar 16, 2026
784ab23
Updated requirements
ElenaKhaustova Mar 16, 2026
b856455
Added unit tests
ElenaKhaustova Mar 16, 2026
0fafdcb
Extended readme
ElenaKhaustova Mar 17, 2026
463e422
Added docs
ElenaKhaustova Mar 17, 2026
928740f
Updated nav
ElenaKhaustova Mar 17, 2026
ef24144
Updated release notes
ElenaKhaustova Mar 17, 2026
9d06d46
Restored alphabetical order
ElenaKhaustova Mar 17, 2026
bd1d182
Converted method to static
ElenaKhaustova Mar 17, 2026
ee71498
Fixed linter
ElenaKhaustova Mar 17, 2026
816799b
Fixed ruff
ElenaKhaustova Mar 17, 2026
ed4e516
Fixed version normalisation for python3.10
ElenaKhaustova Mar 17, 2026
543ee70
Updated sync modes
ElenaKhaustova Mar 18, 2026
c738257
Updated unit tests
ElenaKhaustova Mar 18, 2026
674e21b
Updated readme
ElenaKhaustova Mar 18, 2026
fb130c4
Fixed test on Windows
ElenaKhaustova Mar 18, 2026
29a01ee
Fixed links in the docs
ElenaKhaustova Mar 18, 2026
631f020
Merge branch 'main' into feat/langfuse-evaluation-dataset
ravi-kumar-pilla Mar 23, 2026
999517b
Merge branch 'main' into feat/langfuse-evaluation-dataset
ravi-kumar-pilla Mar 23, 2026
d5daca2
Moved validation upper
ElenaKhaustova Mar 24, 2026
c34af87
Fixed validation order
ElenaKhaustova Mar 24, 2026
6924ce3
Added unit tests checking the updated logic
ElenaKhaustova Mar 24, 2026
5c9ae45
Merge branch 'feat/langfuse-evaluation-dataset' into feat/langfuse-ev…
ElenaKhaustova Mar 24, 2026
3befc54
Merge branch 'main' into feat/langfuse-evaluation-dataset
ElenaKhaustova Mar 27, 2026
3f042e5
Merge branch 'main' into feat/langfuse-evaluation-dataset
ElenaKhaustova Mar 31, 2026
f884b97
Clarified docstrings
ElenaKhaustova Mar 31, 2026
fb207c5
Added table with datasets to readme
ElenaKhaustova Mar 31, 2026
e92b393
Merge branch 'main' into feat/langfuse-evaluation-dataset
ElenaKhaustova Mar 31, 2026
66b8920
Updated extras names
ElenaKhaustova Mar 31, 2026
9c9e54b
Updated release notes
ElenaKhaustova Mar 31, 2026
361dd46
Merge branch 'main' into feat/rename-genai-extras
ElenaKhaustova Apr 1, 2026
7f43c8e
Merge branch 'main' into feat/rename-genai-extras
ElenaKhaustova Apr 22, 2026
4f443ce
Shorten pyproject.toml extra names for langfuse, opik, and langchain
ElenaKhaustova Apr 22, 2026
0ddaea4
Rename langfuse dataset files and classes to drop redundant prefix
ElenaKhaustova Apr 22, 2026
436f472
Rename opik dataset files and classes to drop redundant prefix
ElenaKhaustova Apr 22, 2026
fc70ca2
Rename langchain dataset file and class to drop redundant prefix
ElenaKhaustova Apr 22, 2026
638c4f8
Update API docs and release notes for renamed experimental datasets
ElenaKhaustova Apr 22, 2026
3b70828
Merge branch 'main' into feat/rename-genai-extras
ElenaKhaustova Apr 23, 2026
a50d75b
Fixed release notes
ElenaKhaustova Apr 23, 2026
2491a93
Improved format of release notes
ElenaKhaustova Apr 23, 2026
27 changes: 21 additions & 6 deletions kedro-datasets/RELEASE.md
Original file line number Diff line number Diff line change
@@ -5,12 +5,27 @@

 | Type | Description | Location |
 | ------------------------------------ | ---------------------------------------------------- | -------------------------------------- |
-| `opik.OpikEvaluationDataset` | A dataset for managing Opik evaluation datasets. | `kedro_datasets_experimental.opik` |
-
-## Bug fixes and other changes
-
-- Refactored shared validation and utility logic from the three Opik experimental datasets (`OpikPromptDataset`, `OpikEvaluationDataset`, `OpikTraceDataset`) into a common `opik._common` module.
-- Refactored shared validation and utility logic from the three Langfuse experimental datasets (`LangfusePromptDataset`, `LangfuseEvaluationDataset`, `LangfuseTraceDataset`) into a common `langfuse._common` module.
+| `opik.EvaluationDataset` | A dataset for managing Opik evaluation datasets. | `kedro_datasets_experimental.opik` |
+
+## Breaking changes to experimental datasets
+- Renamed dataset classes and shortened `pyproject.toml` extra names for `langfuse`, `opik`, and `langchain` experimental datasets. The redundant package-family prefix has been dropped:
+  - Classes:
+    - `langfuse.LangfusePromptDataset` → `langfuse.PromptDataset`
+    - `langfuse.LangfuseTraceDataset` → `langfuse.TraceDataset`
+    - `langfuse.LangfuseEvaluationDataset` → `langfuse.EvaluationDataset`
+    - `opik.OpikPromptDataset` → `opik.PromptDataset`
+    - `opik.OpikTraceDataset` → `opik.TraceDataset`
+    - `langchain.LangChainPromptDataset` → `langchain.PromptDataset`
+  - Extras:
+    - `langfuse-langfusepromptdataset` → `langfuse-promptdataset`
+    - `opik-opiktracedataset` → `opik-tracedataset`
+    - `langchain-langchainpromptdataset` → `langchain-promptdataset`
+    - etc.
+
+## Bug fixes and other changes
+
+- Refactored shared validation and utility logic from the three Opik experimental datasets (`PromptDataset`, `EvaluationDataset`, `TraceDataset`) into a common `opik._common` module.
+- Refactored shared validation and utility logic from the three Langfuse experimental datasets (`PromptDataset`, `EvaluationDataset`, `TraceDataset`) into a common `langfuse._common` module.
 - Added `os.PathLike` support for `plotly` datasets.

 ## Community contributions
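Because the class renames in these release notes are breaking, catalog entries that still use the old fully qualified `type` strings would need updating. A minimal sketch of that update follows, assuming the catalog has already been loaded into a plain dictionary. `RENAMED_TYPES` and `migrate_catalog_types` are hypothetical names introduced here for illustration and are not part of `kedro-datasets`; the mapping covers only the six class renames listed in the release notes.

```python
PACKAGE = "kedro_datasets_experimental"

# Old -> new class paths, copied from the release notes in this PR.
_CLASS_RENAMES = {
    "langfuse.LangfusePromptDataset": "langfuse.PromptDataset",
    "langfuse.LangfuseTraceDataset": "langfuse.TraceDataset",
    "langfuse.LangfuseEvaluationDataset": "langfuse.EvaluationDataset",
    "opik.OpikPromptDataset": "opik.PromptDataset",
    "opik.OpikTraceDataset": "opik.TraceDataset",
    "langchain.LangChainPromptDataset": "langchain.PromptDataset",
}

# Hypothetical helper table: fully qualified old type -> new type.
RENAMED_TYPES = {
    f"{PACKAGE}.{old}": f"{PACKAGE}.{new}" for old, new in _CLASS_RENAMES.items()
}


def migrate_catalog_types(catalog: dict) -> dict:
    """Return a copy of the catalog with renamed dataset types updated."""
    migrated = {}
    for name, entry in catalog.items():
        updated = dict(entry)  # shallow copy so the input is left untouched
        if updated.get("type") in RENAMED_TYPES:
            updated["type"] = RENAMED_TYPES[updated["type"]]
        migrated[name] = updated
    return migrated


catalog = {
    "my_prompt": {
        "type": "kedro_datasets_experimental.langchain.LangChainPromptDataset",
        "filepath": "data/prompts/my_prompt.json",
    },
    "other": {"type": "pandas.CSVDataset"},
}
migrated_catalog = migrate_catalog_types(catalog)
```

Entries whose `type` is not in the mapping are copied through unchanged, so the helper can be run over a whole catalog. The renamed extras in `pyproject.toml` are a separate change and are not handled by this sketch.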
14 changes: 7 additions & 7 deletions kedro-datasets/docs/api/kedro_datasets_experimental/index.md
Original file line number Diff line number Diff line change
@@ -8,18 +8,18 @@ Name | Description
 -----|------------
 [chromadb.ChromaDBDataset](chromadb.ChromaDBDataset.md) | ``ChromaDBDataset`` loads and saves data to ChromaDB vector database collections.
 [databricks.ExternalTableDataset](databricks.ExternalTableDataset.md) | ``ExternalTableDataset`` implementation to access external tables in Databricks.
-[langchain.LangChainPromptDataset](langchain.LangChainPromptDataset.md) | ``LangChainPromptDataset`` loads a `langchain` prompt template.
-[langfuse.LangfuseEvaluationDataset](langfuse.LangfuseEvaluationDataset.md) | ``LangfuseEvaluationDataset`` manages Langfuse evaluation datasets for LLM experiment workflows, supporting local file syncing and remote dataset versioning.
-[langfuse.LangfusePromptDataset](langfuse.LangfusePromptDataset.md) | ``LangfusePromptDataset`` provides a seamless integration between local prompt files (JSON/YAML) and Langfuse prompt management, supporting version control, labeling, and different synchronization policies.
-[langfuse.LangfuseTraceDataset](langfuse.LangfuseTraceDataset.md) | ``LangfuseTraceDataset`` provides Langfuse tracing clients for LLM observability and monitoring.
+[langchain.PromptDataset](langchain.PromptDataset.md) | ``PromptDataset`` loads a `langchain` prompt template.
+[langfuse.EvaluationDataset](langfuse.EvaluationDataset.md) | ``EvaluationDataset`` manages Langfuse evaluation datasets for LLM experiment workflows, supporting local file syncing and remote dataset versioning.
+[langfuse.PromptDataset](langfuse.PromptDataset.md) | ``PromptDataset`` provides a seamless integration between local prompt files (JSON/YAML) and Langfuse prompt management, supporting version control, labeling, and different synchronization policies.
+[langfuse.TraceDataset](langfuse.TraceDataset.md) | ``TraceDataset`` provides Langfuse tracing clients for LLM observability and monitoring.
 [mlrun.MLRunAbstractDataset](mlrun.MLRunAbstractDataset.md) | ``MLRunAbstractDataset`` base class for MLRun datasets, can be used directly for generic artifacts.
 [mlrun.MLRunModel](mlrun.MLRunModel.md) | ``MLRunModel`` saves and loads ML models via MLRun with framework metadata and configurable file format.
 [mlrun.MLRunDataframeDataset](mlrun.MLRunDataframeDataset.md) | ``MLRunDataframeDataset`` saves and loads pandas DataFrames as MLRun artifacts.
 [mlrun.MLRunResult](mlrun.MLRunResult.md) | ``MLRunResult`` logs scalar results and metrics to MLRun with optional nested dict flattening.
 [netcdf.NetCDFDataset](netcdf.NetCDFDataset.md) | ``NetCDFDataset`` loads/saves data from/to a NetCDF file using an underlying filesystem (e.g.: local, S3, GCS). It uses xarray to handle the NetCDF file.
-[opik.OpikEvaluationDataset](opik.OpikEvaluationDataset.md) | ``OpikEvaluationDataset`` manages Opik evaluation datasets for LLM experiment workflows.
-[opik.OpikPromptDataset](opik.OpikPromptDataset.md) | ``OpikPromptDataset`` manages prompts with Opik versioning support, returning either raw SDK objects or LangChain templates.
-[opik.OpikTraceDataset](opik.OpikTraceDataset.md) | ``OpikTraceDataset`` provides Opik tracing clients for observability and monitoring.
+[opik.EvaluationDataset](opik.EvaluationDataset.md) | ``EvaluationDataset`` manages Opik evaluation datasets for LLM experiment workflows.
+[opik.PromptDataset](opik.PromptDataset.md) | ``PromptDataset`` manages prompts with Opik versioning support, returning either raw SDK objects or LangChain templates.
+[opik.TraceDataset](opik.TraceDataset.md) | ``TraceDataset`` provides Opik tracing clients for observability and monitoring.
 [optuna.StudyDataset](optuna.StudyDataset.md) | ``StudyDataset`` loads/saves an Optuna study, enabling distributed hyperparameter tuning.
 [pypdf.PDFDataset](pypdf.PDFDataset.md) | ``PDFDataset`` loads data from PDF files using pypdf to extract text from pages. Read-only dataset.
 [polars.PolarsDatabaseDataset](polars.PolarsDatabaseDataset.md) | ``PolarsDatabaseDataset`` implementation to access databases as Polars DataFrames. It supports reading from a SQL query and writing to a database table.

This file was deleted.

Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
-::: kedro_datasets_experimental.opik.OpikEvaluationDataset
+::: kedro_datasets_experimental.langchain.PromptDataset
     options:
       members: true
       show_source: true
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
+::: kedro_datasets_experimental.langfuse.EvaluationDataset
+    options:
+      members: true
+      show_source: true

This file was deleted.

This file was deleted.

This file was deleted.

Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
-::: kedro_datasets_experimental.opik.OpikPromptDataset
+::: kedro_datasets_experimental.langfuse.PromptDataset
     options:
       members: true
       show_source: true
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
-::: kedro_datasets_experimental.opik.OpikTraceDataset
+::: kedro_datasets_experimental.langfuse.TraceDataset
     options:
       members: true
       show_source: true
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
+::: kedro_datasets_experimental.opik.EvaluationDataset
+    options:
+      members: true
+      show_source: true
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
+::: kedro_datasets_experimental.opik.PromptDataset
+    options:
+      members: true
+      show_source: true
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
+::: kedro_datasets_experimental.opik.TraceDataset
+    options:
+      members: true
+      show_source: true
Original file line number Diff line number Diff line change
@@ -4,16 +4,16 @@
 import lazy_loader as lazy

 try:
-    from .langchain_prompt_dataset import LangChainPromptDataset
+    from .prompt_dataset import PromptDataset

 except (ImportError, RuntimeError):
     # For documentation builds that might fail due to dependency issues
     # https://github.com/pylint-dev/pylint/issues/4300#issuecomment-1043601901
-    LangChainPromptDataset: Any
+    PromptDataset: Any

 __getattr__, __dir__, __all__ = lazy.attach(
     __name__,
     submod_attrs={
-        "langchain_prompt_dataset": ["LangChainPromptDataset"],
+        "prompt_dataset": ["PromptDataset"],
     },
 )
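The `lazy.attach(...)` call in the hunk above maps the renamed class to its renamed submodule so that the submodule is imported only when the class is first accessed. The idea can be sketched with the standard library alone. `attach_sketch` is a hypothetical, simplified stand-in written for illustration and is not the `lazy_loader` implementation; the demo targets the stdlib `json` package instead of the dataset package so that it runs without extra dependencies.

```python
import importlib


def attach_sketch(package_name, submod_attrs):
    """Hypothetical, simplified illustration of the lazy-attach pattern."""
    # Map each exported attribute to the submodule that defines it.
    attr_to_submodule = {
        attr: submodule
        for submodule, attrs in submod_attrs.items()
        for attr in attrs
    }
    exported = sorted(attr_to_submodule)

    def module_getattr(name):
        # As a module-level __getattr__, this runs only when normal lookup fails.
        if name not in attr_to_submodule:
            raise AttributeError(
                f"module {package_name!r} has no attribute {name!r}"
            )
        # The submodule is imported here, on first access, not at package import.
        submodule = importlib.import_module(
            f"{package_name}.{attr_to_submodule[name]}"
        )
        return getattr(submodule, name)

    def module_dir():
        return list(exported)

    return module_getattr, module_dir, exported


# Demo against the stdlib: resolve json.decoder.JSONDecoder on demand.
lazy_getattr, lazy_dir, lazy_all = attach_sketch(
    "json", {"decoder": ["JSONDecoder"]}
)
decoder_cls = lazy_getattr("JSONDecoder")
```

In a package `__init__.py` the three returned objects would be bound to `__getattr__`, `__dir__`, and `__all__`, which is how the diff uses the result of `lazy.attach`. Because the lookup is keyed on both the submodule name and the attribute name, renaming the file and the class requires updating both sides of the `submod_attrs` entry, as this PR does.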
Original file line number Diff line number Diff line change
@@ -18,7 +18,7 @@
 from kedro_datasets._typing import JSONPreview


-class LangChainPromptDataset(AbstractDataset[Union[PromptTemplate, ChatPromptTemplate], Any]): # noqa UP007
+class PromptDataset(AbstractDataset[Union[PromptTemplate, ChatPromptTemplate], Any]): # noqa UP007
     """
     A Kedro dataset for loading LangChain prompt templates from text, JSON, or YAML files.

@@ -29,7 +29,7 @@ class LangChainPromptDataset(AbstractDataset[Union[PromptTemplate, ChatPromptTem
     ### Example usage for the [YAML API](https://docs.kedro.org/en/stable/catalog-data/data_catalog_yaml_examples/):
     ```yaml
     my_prompt:
-      type: kedro_datasets_experimental.langchain.LangChainPromptDataset
+      type: kedro_datasets_experimental.langchain.PromptDataset
       filepath: data/prompts/my_prompt.json
       template: PromptTemplate
       dataset:
@@ -47,9 +47,9 @@ class LangChainPromptDataset(AbstractDataset[Union[PromptTemplate, ChatPromptTem

     ### Example usage for the [Python API](https://docs.kedro.org/en/stable/catalog-data/advanced_data_catalog_usage/):
     ```python
-    from kedro_datasets_experimental.langchain import LangChainPromptDataset
+    from kedro_datasets_experimental.langchain import PromptDataset

-    dataset = LangChainPromptDataset(
+    dataset = PromptDataset(
         filepath="data/prompts/my_prompt.json",
         template="PromptTemplate",
         dataset={"type": "json.JSONDataset"},
@@ -294,7 +294,7 @@ def _create_chat_prompt_template(self, data: dict | list[tuple[str, str]]) -> Ch
         return ChatPromptTemplate.from_messages(messages)

     def save(self, data: Any) -> None:
-        raise DatasetError("Saving is not supported for LangChainPromptDataset")
+        raise DatasetError("Saving is not supported for PromptDataset")

     def _describe(self) -> dict[str, Any]:
         clean_config = {