Add support for rosidl::Buffer-aware per-endpoint pub/sub by nvcyc · Pull Request #930 · ros2/rmw_zenoh

nvcyc · 2026-03-17T04:12:52Z

Description

This pull request adds full rosidl::Buffer support to rmw_zenoh_cpp, enabling per-endpoint Zenoh publishers and subscribers for zero-copy buffer transport between compatible backends. When a publisher and subscriber share compatible non-CPU buffer backends, data can be transferred via a lightweight descriptor; when backends are incompatible, the system falls back to standard CPU-based buffer serialization.
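The descriptor-versus-fallback decision for a single matched subscriber can be sketched as follows. This is a minimal illustration, not the rmw_zenoh_cpp implementation. Only the name create_descriptor_with_endpoint() and its "nullptr means CPU fallback" convention come from this PR. The Descriptor, EndpointInfo, BufferBackend, Transport, and choose_transport names, and the signature shown, are hypothetical.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical descriptor: a lightweight handle that lets the receiver locate
// the buffer without copying its payload.
struct Descriptor {
  std::string backend;
  std::string handle;
};

// Hypothetical endpoint info learned from a subscriber's liveliness token.
struct EndpointInfo {
  std::vector<std::string> backends;  // e.g. {"cuda", "cpu"}
};

// Hypothetical backend base class. Each backend decides compatibility itself
// and returns nullptr when it cannot serve the given endpoint.
class BufferBackend {
public:
  explicit BufferBackend(std::string name)
  : name_(std::move(name)) {}
  virtual ~BufferBackend() = default;

  const std::string & name() const {return name_;}

  virtual std::unique_ptr<Descriptor> create_descriptor_with_endpoint(
    const EndpointInfo & endpoint) const
  {
    for (const auto & backend : endpoint.backends) {
      if (backend == name_) {
        return std::make_unique<Descriptor>(Descriptor{name_, "opaque-handle"});
      }
    }
    return nullptr;  // incompatible endpoint
  }

private:
  std::string name_;
};

enum class Transport { kDescriptor, kCpuSerialization };

// Chooses how a buffer owned by `backend` is sent to one matched subscriber.
Transport choose_transport(const BufferBackend & backend, const EndpointInfo & subscriber)
{
  // Only non-CPU backends use the descriptor path; a nullptr descriptor
  // means the backends are incompatible, so fall back to CPU serialization.
  if (backend.name() != "cpu" &&
    backend.create_descriptor_with_endpoint(subscriber) != nullptr)
  {
    return Transport::kDescriptor;
  }
  return Transport::kCpuSerialization;
}
```

Keeping the compatibility check inside each backend, rather than in the RMW layer, means rmw_zenoh_cpp does not need to know how any particular accelerator shares memory; it only needs to react to a nullptr result.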

This pull request consists of the following key changes:

Backend lifecycle: Calls rosidl_buffer_backend_registry::initialize_buffer_backends() / shutdown_buffer_backends() during RMW init/shutdown to load and tear down buffer backend plugins.
Liveliness key-expression extension: Extended key-expressions to advertise each endpoint's supported backends, enabling dynamic discovery.
Graph cache discovery callbacks: Buffer-aware publishers and subscribers register discovery callbacks to detect each other and dynamically create per-endpoint Zenoh publishers/subscribers.
Buffer-aware publishers: Create per-subscriber Zenoh endpoints. Endpoint info is passed to the typesupport serialization layer, which delegates the compatibility check to each backend's create_descriptor_with_endpoint() (a nullptr return means CPU fallback). Publisher creation explicitly adds "cpu" to backend_aux_info.
Fallback publish: publish() first sends endpoint-aware messages via publish_buffer_aware(), then falls through to the standard base-key publish path only when the total matched subscription count exceeds the number of discovered buffer-aware subscribers. This avoids unnecessary CPU conversion when all subscribers are buffer-aware.
Buffer-aware subscribers: Create per-publisher Zenoh subscriptions; the Message struct owns endpoint info via std::optional<EndpointInfoStorage>. Endpoint info is passed into deserialization so that the correct backend can reconstruct the buffer.
acceptable_buffer_backends: Parses the subscription option. NULL, empty, or "cpu" selects CPU-only (advertising "cpu" in the liveliness token); "any" selects all installed backends; specific names select a filtered set. In on_publisher_discovered(), CPU is always added to the publisher's backend list.
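The subscription-option rules and the publish() fall-through condition listed above can be sketched as below. Assumptions, not taken from the PR: the "specific names" form is a comma-separated string, names are filtered against the installed backends, and parse_acceptable_backends() and needs_base_key_publish() are hypothetical helpers rather than functions in rmw_zenoh_cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parser for the acceptable_buffer_backends subscription option.
// Assumes a comma-separated list for the "specific names" case.
std::set<std::string> parse_acceptable_backends(
  const char * option, const std::vector<std::string> & installed)
{
  const std::string value = option != nullptr ? option : "";
  if (value.empty() || value == "cpu") {
    return {"cpu"};  // CPU-only; "cpu" is what the liveliness token advertises
  }
  if (value == "any") {
    return std::set<std::string>(installed.begin(), installed.end());
  }
  std::set<std::string> accepted;
  std::stringstream stream(value);
  std::string name;
  while (std::getline(stream, name, ',')) {
    // Assumption: names that are not installed are silently dropped.
    if (std::find(installed.begin(), installed.end(), name) != installed.end()) {
      accepted.insert(name);
    }
  }
  return accepted;
}

// publish(): after publish_buffer_aware() has served every discovered
// buffer-aware subscriber, the base-key path is needed only if some matched
// subscriber is not buffer-aware.
bool needs_base_key_publish(
  std::size_t matched_subscriptions, std::size_t buffer_aware_subscribers)
{
  return matched_subscriptions > buffer_aware_subscribers;
}
```

The count comparison is what lets an all-buffer-aware topic skip CPU serialization entirely: when every matched subscription has been discovered as buffer-aware, the two counts are equal and the base-key publish is skipped.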

Is this user-facing behavior change?

This pull request does not change existing rmw_zenoh_cpp behavior for standard (non-Buffer) messages. For messages with uint8[] fields, the per-endpoint transport is transparent -- publishers and subscribers share backend info automatically, and CPU fallback ensures correctness when backends are incompatible.

Did you use Generative AI?

Yes. Claude (claude-4.6-opus) via Cursor was used to assist with creating an initial prototype version of the changes contained in this PR.

Additional Information

This PR is part of the broader ROS 2 native buffer feature introduced in this post.

hidmic

First pass. Partial pass. I need more brain to parse all of this.

ahcorde · 2026-04-01T15:56:23Z

@YuanYuYuan or @JEnoch, do you mind taking a look? The RMW freeze is next Monday (6th April).

wjwwood

Partial review

JEnoch · 2026-04-05T20:54:13Z

@YuanYuYuan or @JEnoch, do you mind taking a look? The RMW freeze is next Monday (6th April).

Apologies for not having been able to attend the working group sessions or catch up on the discussions over the past few weeks. Unfortunately I still don't have bandwidth to do a thorough review of this PR at this point. I suspect the same is true for my colleague @YuanYuYuan.

I just have one suggestion: it would be great if docs/design.md could be updated as part of this PR to document the new buffer-aware pub/sub architecture and the changes to all key expression formats (liveliness tokens and topics).

If the other reviewers are satisfied, I'm happy to defer to their judgment and approve the merge.

YuanYuYuan · 2026-04-06T11:30:32Z

@YuanYuYuan or @JEnoch, do you mind taking a look? The RMW freeze is next Monday (6th April).

Just returned from a long holiday 😄. I went through the deadlock issue and verified it last week. A patch has been proposed in #955.

asymingt · 2026-04-16T23:40:38Z

Just a quick note that we are in an RMW freeze right now for the ROS 2 Lyrical release. Please reach out to @sloretz before you merge this one.

nvcyc · 2026-04-17T00:41:02Z

Thanks for the reminder.
Yes, I'm aware of the current RMW freeze state for the ROS 2 Lyrical release.

Based on the discussion in the ROS 2 Lyrical release working group meeting on 2026/4/6, we'll target this PR for the Lyrical patch 1 release, to match the level of rosidl::Buffer support that rmw_fastrtps_cpp currently has. I'm continuing development here mainly toward that goal.

I'll confirm in the working group with @sloretz before merging this PR when it's ready.

Signed-off-by: CY Chen <cyc@nvidia.com>

…backs

update_topic_map_for_put() collected discovery callbacks under discovery_mutex_ but still invoked them while graph_mutex_ was held (via the lock_guard in parse_put). Any callback that re-enters graph_cache, e.g. creating a per-endpoint subscription which calls register_publisher_discovery_callback(), would attempt to re-acquire graph_mutex_ on the same thread, deadlocking immediately.

Fix: change update_topic_map_for_put() and update_topic_maps_for_put() to return the collected callbacks instead of invoking them. parse_put() switches from lock_guard to unique_lock so it can call lock.unlock() before iterating over the returned callbacks.

This is a defensive complement to the lock-order fix in e91c15a. While no current callback directly re-acquires graph_mutex_, invoking external callbacks under an internal mutex is an API contract violation that creates fragility for future changes.

Signed-off-by: YuanYu Yuan <yuanyu.yuan@zettascale.tech>
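The collect-then-unlock pattern described in this commit can be sketched with a hypothetical, heavily simplified GraphCache class. Only the names parse_put(), update_topic_map_for_put(), and graph_mutex_ come from the commit message. The class layout, the single callback list, and callback_count() are assumptions for illustration, and the real code's separate discovery_mutex_ is omitted.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified graph cache: callbacks are collected while
// graph_mutex_ is held, the lock is released, and only then are they invoked.
class GraphCache {
public:
  using DiscoveryCallback = std::function<void(const std::string & topic)>;

  void register_discovery_callback(DiscoveryCallback cb)
  {
    std::lock_guard<std::mutex> lock(graph_mutex_);
    callbacks_.push_back(std::move(cb));
  }

  void parse_put(const std::string & topic)
  {
    // unique_lock (not lock_guard) so the lock can be released early.
    std::unique_lock<std::mutex> lock(graph_mutex_);
    topics_.push_back(topic);
    std::vector<DiscoveryCallback> to_invoke = update_topic_map_for_put();
    lock.unlock();  // release before running external code
    for (const auto & cb : to_invoke) {
      cb(topic);  // may safely re-enter the cache
    }
  }

  std::size_t callback_count()
  {
    std::lock_guard<std::mutex> lock(graph_mutex_);
    return callbacks_.size();
  }

private:
  // Must be called with graph_mutex_ held. Returns a copy of the callbacks
  // to run later instead of invoking them under the lock.
  std::vector<DiscoveryCallback> update_topic_map_for_put()
  {
    return callbacks_;
  }

  std::mutex graph_mutex_;
  std::vector<std::string> topics_;
  std::vector<DiscoveryCallback> callbacks_;
};
```

Because the callbacks run after lock.unlock(), a callback that calls back into the cache (for example, registering another discovery callback) acquires graph_mutex_ normally instead of trying to re-lock a non-recursive std::mutex on the same thread.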

Signed-off-by: YuanYu Yuan <yuanyu.yuan@zettascale.tech>

Signed-off-by: CY Chen <cyc@nvidia.com>

nvcyc · 2026-05-12T18:28:25Z

Pulls: #930
Gist: https://gist.githubusercontent.com/nvcyc/4facd927bf9cb229ad82de132920f12e/raw/360295ca2557d4167aea0499f2665a75e408a3c8/ros2.repos
BUILD args: --continue-on-error --packages-above-and-dependencies rmw_zenoh_cpp
TEST args: --packages-above rmw_zenoh_cpp
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/19244

Linux
Linux-aarch64
Linux-rhel
Windows

Signed-off-by: CY Chen <cyc@nvidia.com>

nvcyc · 2026-05-13T03:46:48Z

Pulls: #930
Gist: https://gist.githubusercontent.com/nvcyc/58f6bd3266a2b466d2e3c9a025572c89/raw/360295ca2557d4167aea0499f2665a75e408a3c8/ros2.repos
BUILD args: --continue-on-error --packages-above-and-dependencies rmw_zenoh_cpp
TEST args: --packages-above rmw_zenoh_cpp
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/19254

Linux
Linux-aarch64
Linux-rhel
Windows

nvcyc · 2026-05-13T18:04:40Z

Pulls: #930
Gist: https://gist.githubusercontent.com/nvcyc/779e2927eb344675e062b7588954a13f/raw/360295ca2557d4167aea0499f2665a75e408a3c8/ros2.repos
BUILD args: --continue-on-error --packages-above-and-dependencies rmw_zenoh_cpp
TEST args: --packages-above rmw_zenoh_cpp
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/19258

Linux
Linux-aarch64
Linux-rhel
Windows

Signed-off-by: CY Chen <cyc@nvidia.com>

nvcyc · 2026-05-14T04:59:39Z

Pulls: #930
Gist: https://gist.githubusercontent.com/nvcyc/03c2ecfd49160673f25e2f95486d7e80/raw/360295ca2557d4167aea0499f2665a75e408a3c8/ros2.repos
BUILD args: --continue-on-error --packages-above-and-dependencies rmw_zenoh_cpp
TEST args: --packages-above rmw_zenoh_cpp
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/19262

Linux
Linux-aarch64
Linux-rhel
Windows

nvcyc requested review from MiguelCompany, ahcorde, hidmic, karsten-nvidia, mjcarroll, skyegalaxy and tfoote March 17, 2026 04:14

nvcyc marked this pull request as draft March 17, 2026 04:15

nvcyc marked this pull request as ready for review March 17, 2026 04:16

mjcarroll reviewed Mar 18, 2026


Comment thread rmw_zenoh_cpp/src/detail/graph_cache.cpp Outdated

nvcyc mentioned this pull request Mar 19, 2026

Add rosidl_buffer_backend_registry ros2/rosidl#944

Merged

hidmic reviewed Mar 30, 2026


nvcyc marked this pull request as draft March 31, 2026 17:33

ahcorde requested review from JEnoch and YuanYuYuan April 1, 2026 15:55

ahcorde mentioned this pull request Apr 1, 2026

Native support for tensor datatypes ros2/ros2#1736

Open

nvcyc force-pushed the native_buffer branch from de7cfab to c23c1d2 April 1, 2026 23:18

nvcyc marked this pull request as ready for review April 1, 2026 23:52

wjwwood reviewed Apr 2, 2026


Comment thread rmw_zenoh_cpp/src/detail/buffer_backend_loader.cpp Outdated

Comment thread rmw_zenoh_cpp/src/detail/buffer_backend_loader.cpp Outdated

asymingt assigned YuanYuYuan Apr 16, 2026

nvcyc force-pushed the native_buffer branch from 940e030 to 8707772 April 17, 2026 05:23

YuanYuYuan mentioned this pull request Apr 23, 2026

fix(liveliness): escape '/' in backend names embedded in key expressions #970

Merged
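A minimal sketch of why '/' needs escaping here: Zenoh uses '/' as the chunk separator in key expressions, so an unescaped backend name containing '/' would be parsed as several chunks of the liveliness token. The replacement character '%' and the helper names below are assumptions for illustration; #970 may use a different encoding.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helpers. '/' separates chunks in a Zenoh key expression, so a
// raw '/' inside a backend name would split one field into several chunks.
// '%' is assumed as the substitute character for this sketch.
std::string escape_backend_name(std::string name)
{
  std::replace(name.begin(), name.end(), '/', '%');
  return name;
}

// Inverse mapping applied when parsing a discovered liveliness token.
std::string unescape_backend_name(std::string chunk)
{
  std::replace(chunk.begin(), chunk.end(), '%', '/');
  return chunk;
}
```

A one-character substitution like this is only reversible if backend names never contain the substitute character itself, so a real scheme has to guarantee or enforce that.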

nvcyc added 3 commits May 12, 2026 16:51

Add support for rosidl::Buffer-aware per-endpoint pub/sub

3936d1c

Signed-off-by: CY Chen <cyc@nvidia.com>

Add buffer backend init/shutdown functions

df56572

Signed-off-by: CY Chen <cyc@nvidia.com>

Fix lint error

dd160a2

Signed-off-by: CY Chen <cyc@nvidia.com>

nvcyc and others added 10 commits May 12, 2026 16:51

Rename backend aux info to backend metadata

35419c1

Signed-off-by: CY Chen <cyc@nvidia.com>

Update to use per-context buffer backend registry support

a52bf54

Signed-off-by: CY Chen <cyc@nvidia.com>

Add support for legacy subscribers in the rosidl::buffer path

c814dc6

Signed-off-by: CY Chen <cyc@nvidia.com>

Add CPU group endpoints

fd28bda

Signed-off-by: CY Chen <cyc@nvidia.com>

Use per-topic CPU channels

bdfb053

Signed-off-by: CY Chen <cyc@nvidia.com>

style: fix uncrustify line-length in is_cpu_only_backend_metadata

ed95f80

Signed-off-by: YuanYu Yuan <yuanyu.yuan@zettascale.tech>

Use single shared accelerated channel per buffer-aware subscriber

9eb4c28

Signed-off-by: CY Chen <cyc@nvidia.com>

Address review comments

7aa4390

Signed-off-by: CY Chen <cyc@nvidia.com>

fix(liveliness): escape '/' in backend names embedded in key expressions

94a0aaa

nvcyc force-pushed the native_buffer branch from eebc3ae to 94a0aaa May 12, 2026 17:25

Update buffer size estimation to align with rmw_fastrtps_cpp

3eef50f

Signed-off-by: CY Chen <cyc@nvidia.com>

Change logs for rosidl::Buffer to DEBUG level

c7e0261

Signed-off-by: CY Chen <cyc@nvidia.com>

nvcyc force-pushed the native_buffer branch from cfa5006 to c7e0261 May 14, 2026 04:59


8 participants