Feature: async node #1620
Conversation
This looks quite interesting! I haven't acquainted myself with async patterns too much yet, so I won't be of much value as a reviewer of the whole PR. One thing has, however, caught my eye: always subscribing to clock. This is something I wouldn't recommend. I sometimes deliberately don't set sim time on a node if I know it doesn't need it (e.g. a plain relay that just gets a message, computes something, and outputs another message with the input timestamp). The reason I do it is performance: if you have a 1000 Hz sim clock and 50 nodes... You should use ROS 1 :D So I definitely don't want to lose the ability to not subscribe to clock. Another thing: is it possible to specify service call timeouts with the async client?
The existing SingleThreadedExecutor would probably fry your CPU if you published at 1000 Hz haha.
Yeah, you could:

```python
import asyncio

from rclpy.duration import Duration
from rclpy.qos import QoSProfile
from std_srvs.srv import SetBool

# AsyncNode comes from this PR's experimental namespace.
async with AsyncNode("node") as node:
    qos_profile = QoSProfile(lifespan=Duration(seconds=2))
    client = node.create_client(SetBool, "/set_bool", qos_profile=qos_profile)
    async with asyncio.timeout(2):  # asyncio.timeout requires `async with`
        response = await client.call(SetBool.Request(data=True))
```

That's a limitation of the current client rcl/rmw API.
As we have discussed before, this is a very high-quality contribution. It brings our interactions in the Python client library in line with what many Python developers would expect. I agree with the refactoring into Base classes; we wouldn't want to inherit from what is already there because it could cause confusion. This is a good move. Otherwise the API ergonomics look great, and putting it in the experimental namespace should give us a little bit of latitude on getting improvements in over time. With that, I would suggest we do this in ~6 PRs. You mentioned 12 on Zulip, but I feel like that would be a lot to manage. I think (just guessing here):
I think those are logical chunks that should each be pretty easy to review.
nadavelkabets left a comment:
Reminder - test AsyncNode with python 3.14
Signed-off-by: Nadav Elkabets <elnadav12@gmail.com>
Pulls: #1620
CI for #1620 with: (build-status badges)
I retriggered the RHEL job with RHEL 10 instead of RHEL 9.
@nadavelkabets once you're done, let me know and I can trigger another full CI run. I'll keep an eye on it and hopefully merge it once it's green (or green enough).
CI for #1620 for: (build-status badges)
Looks like some issues:
Yes, I forgot to add the ros2/ci branch.
Windows passed with a single test failure before.
And ci_linux passed before the last minor changes.
And it now failed with the same single test failure. Nadav says this might be a Windows clock-accuracy thing. I think this should be fixable in a follow-up PR.
CI: just Linux and RHEL, and just testing rclpy, since those tests are the only ones that failed, and they appear to have failed for a CI issue that should be fixed by ros2/ci#871. I assume the above Windows results will still be valid and don't need a re-run.
@sloretz are you able to keep an eye on CI and merge if it looks good (enough)? It's 1:30 AM here 😴 😅
Do we need to restart Linux and RHEL?
I updated the comment.
aarch64 looks good, and now RHEL also looks good (no complaints from pytest). ci_linux (amd64) is stuck in the queue, but I think aarch64 should be fairly representative of it, so I'm calling it good and merging.
Summary
AsyncNode brings native asyncio support for rclpy, enabling async/await throughout subscription callbacks, service handlers, client calls, timers, and clock sleeps.
Usage examples
- run() — simple reactive node
- async with — composable, user-controlled lifetime
- client.call() — async service call, no futures or spinning
- clock.sleep() — sim-time-aware, cancels on node shutdown
- aiohttp — compose with any asyncio library in a callback
- serial — bridge ROS topics to a serial port
Design
Core mechanism
DDS pushes work onto the asyncio event loop instead of an executor polling a wait set:
One reader task per entity waits on an asyncio Event, takes data, and runs the callback. The DDS callback only sets the event. This gives natural backpressure — the entity won't take another message until the current callback yields.
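The reader-task mechanism can be sketched in plain asyncio (no rclpy; every name here is illustrative, not the PR's actual API): the middleware-side callback only appends data and sets an event, while a reader task drains the queue and awaits the user callback.

```python
import asyncio

class EntityReader:
    """Illustrative stand-in for one async entity's DDS bridge."""

    def __init__(self, callback):
        self._event = asyncio.Event()
        self._queue = []            # stands in for the rmw take() buffer
        self._callback = callback

    def on_data(self, msg):
        # The "DDS callback": cheap and non-blocking, it only signals.
        self._queue.append(msg)
        self._event.set()

    async def run(self):
        # One reader task per entity: wait, take, dispatch.
        while True:
            await self._event.wait()
            self._event.clear()
            while self._queue:
                msg = self._queue.pop(0)
                # Backpressure: no further take until this callback yields.
                await self._callback(msg)

async def main():
    received = []

    async def user_callback(msg):
        received.append(msg)

    reader = EntityReader(user_callback)
    task = asyncio.create_task(reader.run())
    reader.on_data("hello")         # simulate the middleware pushing data
    await asyncio.sleep(0)          # give the reader task a turn
    task.cancel()
    return received

print(asyncio.run(main()))  # → ['hello']
```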
Structured concurrency
The design follows structured concurrency (Trio, asyncio TaskGroup). Every task has a clear owner, lifetimes are bounded by async with scopes, and no task outlives its parent.
The node's outer TaskGroup owns entity reader tasks. Subscriptions and services can run callbacks concurrently via a nested inner TaskGroup. When an entity is destroyed, the inner group cancels all in-flight callbacks before the entity handle is cleaned up — no orphaned callbacks. This is required for service correctness (a callback needs the live service handle to send its response), and subscriptions use the same pattern for consistency. Clients and timers don't need inner groups — clients only route responses to futures, and timers dispatch sequentially.
Resource cleanup is deterministic: when the async with block exits, all reader tasks have finished, all DDS callbacks have been cleared, and no orphaned coroutines remain.
Two entry points
Both entry points share the same lifecycle:
async with is for composable use cases where the user controls the lifetime (bridges, tests, multi-protocol applications). run() is for simple reactive nodes.
Class hierarchy
Entity-owned architecture
Each async entity class owns its full DDS bridge: event creation, DDS callback registration, the take loop, callback dispatch, and cleanup. The node is a thin coordinator — it creates C handles via the base class, wraps them in async entity classes, and hands them a reference to its TaskGroup so they can spawn their own reader tasks.
BaseNode holds shared logic (parameters, clock, logger, name resolution, graph discovery) and calls factory methods polymorphically during init, so AsyncNode and Node each produce their own entity types without the base class knowing which subclass it's in.
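A hypothetical sketch of that factory-method polymorphism; the class and method names are illustrative, not the PR's actual API.

```python
class BaseNode:
    def create_subscription(self, topic):
        handle = ("rcl_handle", topic)       # stands in for the C handle
        # Polymorphic: the subclass decides the concrete entity type.
        return self._make_subscription(handle)

    def _make_subscription(self, handle):
        raise NotImplementedError

class Node(BaseNode):
    def _make_subscription(self, handle):
        return ("SyncSubscription", handle)

class AsyncNode(BaseNode):
    def _make_subscription(self, handle):
        return ("AsyncSubscription", handle)

print(Node().create_subscription("/chatter")[0])       # → SyncSubscription
print(AsyncNode().create_subscription("/chatter")[0])  # → AsyncSubscription
```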
Entities self-remove from the node's tracking set when destroyed, via a callback passed at construction. This keeps the node's bookkeeping consistent regardless of whether destruction is triggered by the node, by user code, or by the entity's own cleanup.
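A sketch of self-removing entities (illustrative names, not the PR's API): the node hands each entity a removal callback at construction, and destroy() is idempotent, so any destruction path leaves the tracking set consistent.

```python
class Entity:
    def __init__(self, on_destroy):
        self._on_destroy = on_destroy
        self._destroyed = False

    def destroy(self):
        if self._destroyed:          # idempotent: safe to call twice
            return
        self._destroyed = True
        self._on_destroy(self)       # self-remove from the node

class TrackingNode:
    def __init__(self):
        self._entities = set()

    def create_entity(self):
        entity = Entity(self._entities.discard)
        self._entities.add(entity)
        return entity

node = TrackingNode()
entity = node.create_entity()
entity.destroy()                     # user-triggered destruction
entity.destroy()                     # no-op the second time
print(len(node._entities))           # → 0
```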
Shutdown
Shutdown is synchronous and uses task cancellation as the only mechanism. Destroying the node destroys all entities (cancelling their tasks and DDS handles), cancels pending clock sleeps, then marks the node handle for deferred destruction. Each entity's destroy is idempotent.
Clock sleep and timers
Clock sleep is the async replacement for blocking sleep. For wall time it schedules a delayed callback on the event loop; for sim time it registers a jump callback that resolves when simulated time advances past the target. If the time source changes during a sleep (ROS time activated or deactivated), the sleep raises an error, since the target is no longer meaningful. All pending sleeps are cancelled on node shutdown.
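A simplified, self-contained sketch of the dual-mode sleep (all names illustrative): wall time arms a delayed callback via loop.call_later; sim time parks a future that a clock-jump callback resolves once simulated time passes the target.

```python
import asyncio

class SimClock:
    """Toy clock: advance() plays the role of a /clock update."""

    def __init__(self):
        self.now = 0.0
        self._jump_callbacks = []

    def advance(self, dt):
        self.now += dt
        for cb in list(self._jump_callbacks):
            cb(self.now)

    async def sleep(self, duration, use_sim_time):
        loop = asyncio.get_running_loop()
        done = loop.create_future()
        if not use_sim_time:
            # Wall time: a delayed callback on the event loop.
            loop.call_later(duration, done.set_result, None)
        else:
            # Sim time: resolve once simulated time passes the target.
            target = self.now + duration

            def on_jump(now):
                if now >= target and not done.done():
                    self._jump_callbacks.remove(on_jump)
                    done.set_result(None)

            self._jump_callbacks.append(on_jump)
        await done

async def main():
    clock = SimClock()
    sleeper = asyncio.create_task(clock.sleep(5.0, use_sim_time=True))
    await asyncio.sleep(0)        # let the sleep register its callback
    clock.advance(2.0)            # before the target: still sleeping
    assert not sleeper.done()
    clock.advance(3.5)            # past the target: future resolves
    await sleeper
    return clock.now

print(asyncio.run(main()))  # → 5.5
```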
Timers use the same dual-mode wait pattern. They support cancel (parks the loop until reset) and reset (wakes the parked loop). Timer callbacks are always dispatched sequentially — if a callback runs longer than the period, the next tick is delayed.
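The park/wake behaviour for timers can be sketched like this (illustrative, wall-time only): cancel() clears an event the timer loop waits on, reset() sets it, and dispatch is sequential, so a slow callback simply delays the next tick.

```python
import asyncio

class Timer:
    def __init__(self, period, callback):
        self._period = period
        self._callback = callback
        self._running = asyncio.Event()
        self._running.set()

    def cancel(self):
        self._running.clear()     # parks the loop until reset()

    def reset(self):
        self._running.set()       # wakes the parked loop

    async def run(self, ticks):
        for _ in range(ticks):
            await self._running.wait()   # parked here while cancelled
            await asyncio.sleep(self._period)
            await self._callback()       # sequential dispatch

async def main():
    fired = []

    async def on_tick():
        fired.append("tick")

    timer = Timer(0.001, on_tick)
    await timer.run(3)
    return fired

print(asyncio.run(main()))  # → ['tick', 'tick', 'tick']
```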
Known limitations
Actions and waitables are not yet supported. Waitable support requires a set_on_ready_callback API on the waitable interface, matching the approach used by rclcpp's EventsExecutor, which is not yet available in rclpy.
Performance
CPU usage
I ran the test_rclpy_performance.py benchmark from the EventsExecutor PR, adapted for the AsyncNode.
(CPU usage benchmark chart)
AsyncNode nearly matches EventsExecutor performance while significantly outperforming SingleThreadedExecutor.
Running the test with uvloop.run instead of asyncio.run achieved an even lower CPU usage of 10%.
Timer latency and jitter
At 50 Hz (20 ms period), AsyncNode's mean jitter is slightly higher than the existing executors, at ~0.16 ms above SingleThreadedExecutor and ~0.38 ms above EventsExecutor.
(timer latency and jitter chart)
I didn't invest much in trying to optimize this; honestly, the difference is really small.
Related work