[Data] ray.data.from_tf() fails on TensorFlow datasets with ragged tensors #62704

Open
weimingdiit wants to merge 1 commit into ray-project:master from weimingdiit:data/from-tf-ragged-tensor-support

Conversation

@weimingdiit
Contributor

Description

ray.data.from_tf() currently relies on dataset.as_numpy_iterator(), which raises when the TensorFlow dataset contains tf.RaggedTensor values.

This PR fixes that by iterating over the TensorFlow dataset directly and recursively converting TensorFlow values into Ray-compatible Python / NumPy values.

The change includes:

  • support for tf.RaggedTensor via Ray's existing ragged tensor utilities
  • existing handling for tf.SparseTensor and dense tf.Tensor
  • a regression test for ray.data.from_tf() with ragged tensor inputs
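The recursive conversion described above could look roughly like the following. This is a minimal sketch, not the code in this PR: `to_ray_value` is a hypothetical name, the object-array packing stands in for Ray's ragged tensor utilities (which are not shown here), and it assumes ragged tensors with a single ragged dimension.

```python
import numpy as np
import tensorflow as tf


def to_ray_value(value):
    """Recursively convert one tf.data element into NumPy / Python values.

    Hypothetical helper for illustration; not the function added by this PR.
    """
    if isinstance(value, dict):
        return {key: to_ray_value(item) for key, item in value.items()}
    if isinstance(value, (tuple, list)):
        return tuple(to_ray_value(item) for item in value)
    if isinstance(value, tf.RaggedTensor):
        # Stand-in for Ray's ragged utilities: one NumPy array per row,
        # stored in a 1-D object array so rows may differ in length.
        # Assumes ragged_rank == 1, so each row is a flat list.
        rows = value.to_list()
        out = np.empty(len(rows), dtype=object)
        for i, row in enumerate(rows):
            out[i] = np.asarray(row, dtype=value.dtype.as_numpy_dtype)
        return out
    if isinstance(value, tf.SparseTensor):
        # Densifies the sparse value; this can be costly for large shapes.
        return tf.sparse.to_dense(value).numpy()
    if isinstance(value, tf.Tensor):
        return value.numpy()
    return value


example = {
    "tokens": tf.ragged.constant([[1, 2, 3], [4]]),
    "label": tf.constant([0, 1]),
}
converted = to_ray_value(example)
```

Here `converted["tokens"]` is an object array holding one variable-length array per row, and `converted["label"]` is a regular dense array.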

Related issues

Closes #62703

Additional information

This change does not modify the public API of ray.data.from_tf(). It only fixes the conversion path so TensorFlow datasets with ragged tensors can be ingested correctly.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates ray.data.from_tf to handle tf.RaggedTensor by replacing the as_numpy_iterator call with a manual conversion loop. While this enables support for ragged tensors, the feedback identifies a performance regression for standard datasets and a potential memory issue caused by converting tf.SparseTensor to dense arrays. It is suggested to maintain the optimized iterator path for non-ragged datasets and preserve sparse representations where possible.
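One way to read the suggestion above is to inspect the dataset's `element_spec` and only leave the `as_numpy_iterator()` path when a ragged component is present. The sketch below illustrates that idea under stated assumptions: `has_ragged_component`, `_leaf_to_python`, and `iter_elements` are hypothetical names, ragged values are returned as nested Python lists for simplicity, and sparse values on the fallback path are simply passed through unchanged rather than densified.

```python
import tensorflow as tf


def has_ragged_component(dataset):
    """Return True if any component of the element spec is ragged."""
    specs = tf.nest.flatten(dataset.element_spec)
    return any(isinstance(spec, tf.RaggedTensorSpec) for spec in specs)


def _leaf_to_python(leaf):
    # Hypothetical leaf conversion used only on the fallback path.
    if isinstance(leaf, tf.RaggedTensor):
        return leaf.to_list()
    if isinstance(leaf, tf.Tensor):
        return leaf.numpy()
    return leaf  # e.g. tf.SparseTensor, left in sparse form


def iter_elements(dataset):
    """Yield converted elements, preferring the built-in NumPy iterator."""
    if not has_ragged_component(dataset):
        # Non-ragged datasets keep the existing iterator path.
        yield from dataset.as_numpy_iterator()
        return
    # Ragged datasets fall back to direct iteration and per-leaf conversion.
    for element in dataset:
        yield tf.nest.map_structure(_leaf_to_python, element)


dense_ds = tf.data.Dataset.from_tensor_slices({"x": [1, 2, 3]})
ragged_ds = tf.data.Dataset.from_tensors(tf.ragged.constant([[1, 2], [3]]))

dense_rows = list(iter_elements(dense_ds))
ragged_rows = list(iter_elements(ragged_ds))
```

With this split, datasets without ragged components see no change in behavior, and the extra Python-level conversion cost is paid only where it is needed.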

Comment thread python/ray/data/read_api.py Outdated
Comment thread python/ray/data/read_api.py
@weimingdiit weimingdiit force-pushed the data/from-tf-ragged-tensor-support branch from 98972b9 to 81c6ab6 Compare April 17, 2026 06:15
…nsors

Replace the as_numpy_iterator-based conversion path in from_tf() with
direct TensorFlow dataset iteration so ragged tensor inputs are handled
correctly. Add a regression test covering tf.RaggedTensor datasets.

Signed-off-by: weimingdiit <[email protected]>
@weimingdiit weimingdiit force-pushed the data/from-tf-ragged-tensor-support branch from 81c6ab6 to 82e6c28 Compare April 17, 2026 08:31
@weimingdiit weimingdiit marked this pull request as ready for review April 19, 2026 12:28
@weimingdiit weimingdiit requested a review from a team as a code owner April 19, 2026 12:28
@ray-gardener ray-gardener Bot added the data (Ray Data-related issues) and community-contribution (Contributed by the community) labels Apr 19, 2026
@github-actions

github-actions Bot commented May 4, 2026

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public Slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions Bot added the stale label (The issue is stale. It will be closed within 7 days unless there are further conversation) May 4, 2026
@richardliaw
Contributor

hey @weimingdiit - sorry for the late reply. Taking a quick look now.

@richardliaw richardliaw added the go label (add ONLY when ready to merge, run all tests) and removed the stale label May 8, 2026
Labels

community-contribution (Contributed by the community), data (Ray Data-related issues), go (add ONLY when ready to merge, run all tests)

Development

Successfully merging this pull request may close these issues.

[Data] ray.data.from_tf() fails on TensorFlow datasets with ragged tensors

2 participants