Skip to content

fix(datasets): Explicitly convert paths to str in ibis.FileDataset#1380

Open
lrcouto wants to merge 3 commits intomainfrom
ibis-dataset-fails-parquet
Open

fix(datasets): Explicitly convert paths to str in ibis.FileDataset#1380
lrcouto wants to merge 3 commits intomainfrom
ibis-dataset-fails-parquet

Conversation

@lrcouto
Copy link
Copy Markdown
Contributor

@lrcouto lrcouto commented Apr 17, 2026

Description

Fix for #1378.

ibis.FileDataset passes PurePosixPath objects directly to ibis backend reader/writer methods. The Polars backend's read_parquet internally calls len() on the path argument (to distinguish a single path from a list of paths), which fails because PurePosixPath has no such method:

kedro.io.core.DatasetError: Failed while loading data from dataset kedro_datasets.ibis.file_dataset.FileDataset(filepath=PurePosixPath('/tmp/tmpwkw44st1/output.parquet'), file_format='parquet', table_name='output', backend='polars', load_args={}, save_args={}).
object of type 'PurePosixPath' has no len()

Explicitly converted load and save paths to str before passing them to the backend. This is the lowest-common-denominator type accepted by all ibis backends and aligns with the workaround suggested by the community member who reported the issue.

Development notes

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Updated jsonschema/kedro-catalog-X.XX.json if necessary
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes
  • Received approvals from at least half of the TSC (required for adding a new, non-experimental dataset)

Signed-off-by: Laura Couto <laurarccouto@gmail.com>
@deepyaman deepyaman self-requested a review April 18, 2026 15:22
Copy link
Copy Markdown
Member

@deepyaman deepyaman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right fix. I'll comment on the original issue.

Update: It's too harsh to say this isn't the right fix; it's probably a valid fix given the current issues with ibis.FileDataset. I still think converting to Path instead of str should also work and be more aligned with the existing code, but really anything is a stopgap until resolve #1298.

Copy link
Copy Markdown
Member

@deepyaman deepyaman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated suggestions, with my updated understanding.

P.S. Can we also add a test that covers this?


def load(self) -> ir.Table:
load_path = self._get_load_path()
load_path = str(self._get_load_path())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
load_path = str(self._get_load_path())
load_path = Path(self._get_load_path())

Path(save_path).parent.mkdir(parents=True, exist_ok=True)
writer = getattr(self.connection, f"to_{self._file_format}")
writer(data, save_path, **self._save_args)
writer(data, str(save_path), **self._save_args)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
writer(data, str(save_path), **self._save_args)
writer(data, save_path, **self._save_args)

Comment on lines 162 to 163
save_path = self._get_save_path()
Path(save_path).parent.mkdir(parents=True, exist_ok=True)
Copy link
Copy Markdown
Member

@deepyaman deepyaman Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
save_path = self._get_save_path()
Path(save_path).parent.mkdir(parents=True, exist_ok=True)
save_path = Path(self._get_save_path())
save_path.parent.mkdir(parents=True, exist_ok=True)

For a temporary workaround, maybe the set of proposed suggestions is also fine?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants