Add support for rosidl::Buffer-aware per-endpoint pub/sub#930
Conversation
hidmic
left a comment
There was a problem hiding this comment.
First pass. Partial pass. I need more brain to parse all of this.
|
@YuanYuYuan or @JEnoch do you mind to take a look ? mw freeze it's next monday (6th April) |
Apologies for not having been able to attend the working group sessions or catch up on the discussions over the past few weeks. Unfortunately I still don't have bandwidth to do a thorough review of this PR at this point. I suspect the same is true for my colleague @YuanYuYuan. I just have one suggestion: it would be great if If the other reviewers are satisfied, I'm happy to defer to their judgment and approve the merge. |
Just returned from a long holiday 😄 I went through the deadlock issue and verified it last week. A patch has been proposed at #955 |
|
Just a quick note that we are in a RMW freeze right now for the ROS 2 Lyrical release. Please reach out to @sloretz before you merge this one. |
|
Thanks for the reminder. Based on the discussion in the ROS 2 Lyrical release working group meeting on 2026/4/6, we'll target this PR for Lyrical patch 1 release to match the same level of I'll confirm in the working group with @sloretz before merging this PR when it's ready. |
Signed-off-by: CY Chen <cyc@nvidia.com>
Signed-off-by: CY Chen <cyc@nvidia.com>
Signed-off-by: CY Chen <cyc@nvidia.com>
Signed-off-by: CY Chen <cyc@nvidia.com>
Signed-off-by: CY Chen <cyc@nvidia.com>
Signed-off-by: CY Chen <cyc@nvidia.com>
Signed-off-by: CY Chen <cyc@nvidia.com>
Signed-off-by: CY Chen <cyc@nvidia.com>
…backs update_topic_map_for_put() collected discovery callbacks under discovery_mutex_ but still invoked them while graph_mutex_ was held (via the lock_guard in parse_put). Any callback that re-enters graph_cache — e.g. creating a per-endpoint subscription which calls register_publisher_discovery_callback() — would attempt to re-acquire graph_mutex_ on the same thread, deadlocking immediately. Fix: change update_topic_map_for_put() and update_topic_maps_for_put() to return the collected callbacks instead of invoking them. parse_put() switches from lock_guard to unique_lock so it can call lock.unlock() before iterating over the returned callbacks. This is a defensive complement to the lock-order fix in e91c15a. While no current callback directly re-acquires graph_mutex_, invoking external callbacks under an internal mutex is an API contract violation that creates fragility for future changes. Signed-off-by: YuanYu Yuan <yuanyu.yuan@zettascale.tech>
Signed-off-by: YuanYu Yuan <yuanyu.yuan@zettascale.tech>
Signed-off-by: CY Chen <cyc@nvidia.com>
Signed-off-by: CY Chen <cyc@nvidia.com>
|
Pulls: #930 |
Signed-off-by: CY Chen <cyc@nvidia.com>
|
Pulls: #930 |
|
Pulls: #930 |
Signed-off-by: CY Chen <cyc@nvidia.com>
|
Pulls: #930 |
Description
This pull request adds full
rosidl::Buffersupport tormw_zenoh_cpp, enabling per-endpoint Zenoh publishers and subscribers for zero-copy buffer transport between compatible backends. When a publisher and subscriber share compatible non-CPU buffer backends, data can be transferred via a lightweight descriptor; when backends are incompatible, the system falls back to standard CPU-based buffer serialization.This pull request consists of the following key changes:
rosidl_buffer_backend_registry::initialize_buffer_backends()/shutdown_buffer_backends()during RMW init/shutdown to load and tear down buffer backend plugins.create_descriptor_with_endpoint()(nullptr: CPU fallback). Publisher creation explicitly adds"cpu"tobackend_aux_info.publish()first sends endpoint-aware messages viapublish_buffer_aware(), then conditionally falls through to the standard base-key publish path only when the total matched subscription count exceeds discovered buffer-aware subscribers -- avoiding unnecessary CPU conversion when all subscribers are buffer-aware.Messagestruct owns endpoint info viastd::optional<EndpointInfoStorage>Endpoint info is passed into deserialization for correct backend reconstruction.acceptable_buffer_backends: Parses the subscription option --NULL/empty/"cpu": CPU-only (advertises"cpu"in liveliness token);"any": all installed; specific names: filtered. Inon_publisher_discovered(), CPU is always added to the publisher's backend list.Is this user-facing behavior change?
This pull request does not change existing
rmw_zenoh_cppbehavior for standard (non-Buffer) messages. For messages withuint8[]fields, the per-endpoint transport is transparent -- publishers and subscribers share backend info automatically, and CPU fallback ensures correctness when backends are incompatible.Did you use Generative AI?
Yes. Claude (claude-4.6-opus) via Cursor was used to assist with creating an initial prototype version of the changes contained in this PR.
Additional Information
This PR is part of the broader ROS 2 native buffer feature introduced in this post.