
To master #300

Merged: sgerbino merged 99 commits into master from develop on Jun 2, 2026
@sgerbino (Collaborator) commented Jun 2, 2026

No description provided.

sgerbino and others added 30 commits February 23, 2026 15:52
MSVC stores the coroutine_handle<> return value from await_suspend on
the coroutine frame via hidden __$ReturnUdt$. After await_suspend
publishes the coroutine handle to another thread (e.g. via IOCP), that
thread can resume/destroy the frame before __resume reads the handle
back for the symmetric transfer tail-call, causing a use-after-free.

On MSVC, call the returned handle's resume() on the machine stack
instead of returning it for symmetric transfer. For IOCP awaitables
that return noop_coroutine(), this is a no-op.
Use generator expressions for include directories so paths don't
leak to consumers. Gate benchmarks and tests behind options that
default to OFF for subdirectory builds. Add EXPORT_NAME so the
imported target is Boost::capy across all consumption patterns.
Provide boost_install() for superproject builds and a standalone
install path with config/version files for find_package() support.
Remove CMakePresets.json and ASCII section dividers.
boost_install's __boost_install_update_include_directory expects
either a raw path or $<BUILD_INTERFACE:path> to replace with the
correct $<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>. When
INSTALL_INTERFACE was already set, the pattern match failed and
the versioned include path (include/boost-X_Y) was never applied.
Make boost_capy_test_suite and boost_capy_test_suite_main available
to downstream consumers via find_package(boost_capy). Add generator
expressions for include dirs, EXPORT_NAME properties, Boost:: aliases,
and install rules for targets, headers, and CMake modules.
Join worker threads before destroying execution_context services
so in-flight strand operations (e.g. try_unlock after dispatching
the last coroutine) finish before strand_impls are freed.
Replace pair<size_t, deduplicated_variant> return type with a plain
std::variant that preserves one alternative per input task. This makes
variant::index() the single mechanism for identifying the winner and
eliminates the type-deduplication machinery (unique_variant_t and
friends). Internally, variant construction uses in_place_index<I>
instead of in_place_type<T> to support same-type alternatives.

The homogeneous range overloads (vector) are unchanged.
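
As a rough illustration of why index-based construction matters here, the sketch below uses a plain std::variant with two alternatives of the same type. The names race_result and make_winner are hypothetical and are not part of capy; the only point is that index() stays meaningful when alternatives repeat.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for a "first task to finish wins" result.
// Both alternatives have the same type, so only the index tells them apart.
using race_result = std::variant<std::string, std::string>;

// Build the result for the task at position I.
// in_place_index<I> works even when alternatives repeat, whereas
// in_place_type<std::string> would not compile for this variant.
template <std::size_t I>
race_result make_winner(std::string value)
{
    return race_result(std::in_place_index<I>, std::move(value));
}
```

A caller would then switch on index() and read the value with std::get<I>.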
Clang-CL defines both __clang__ and _MSC_VER, but uses the
MSVC-style _CPPRTTI macro (not __GXX_RTTI) to signal RTTI.
The previous check matched __clang__ first, incorrectly set
BOOST_CAPY_NO_RTTI=1, and caused the address-based type_id
fallback to be used. With shared linking, DLL and EXE get
separate copies of the static tag, so find_service() fails
across module boundaries ("service not installed").

Fix: check _MSC_VER before __clang__ so Clang-CL uses _CPPRTTI.
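
The ordering described above might look roughly like the following. DEMO_NO_RTTI and demo_has_rtti are hypothetical names chosen for this sketch; the real macro is BOOST_CAPY_NO_RTTI and its exact definition is not reproduced here.

```cpp
// Sketch only: DEMO_NO_RTTI is a hypothetical macro, not the library's.
#if defined(_MSC_VER)
// MSVC and Clang-CL both signal RTTI through _CPPRTTI,
// so this branch must come before any __clang__ check.
# if defined(_CPPRTTI)
#  define DEMO_NO_RTTI 0
# else
#  define DEMO_NO_RTTI 1
# endif
#elif defined(__clang__) || defined(__GNUC__)
// GCC and non-CL Clang signal RTTI through __GXX_RTTI.
# if defined(__GXX_RTTI)
#  define DEMO_NO_RTTI 0
# else
#  define DEMO_NO_RTTI 1
# endif
#else
# define DEMO_NO_RTTI 0 // unknown compiler: assume RTTI is available
#endif

constexpr bool demo_has_rtti() { return DEMO_NO_RTTI == 0; }
```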
Add four new examples (async-mutex, custom-executor, parallel-tasks,
strand-serialization) with CMake build files and Antora documentation
pages. Rewrite echo-server-corosio to use the real Corosio API.

Fix echo_session to take tcp_socket by value to prevent a dangling
reference from the detached coroutine. Add catch(...) to on_error
handlers to ensure latch signaling for non-std exceptions.
thread_pool::impl::stop() set the stop_ flag without holding the
mutex, creating a window where notify_all() could fire before a
worker thread entered cv_.wait(), causing a permanent hang. This
was the root cause of async_event test timeouts on slower FreeBSD
CI runners.

Fix by acquiring the mutex before setting stop_, per the C++
standard requirement that shared variables used in condition_variable
predicates must be modified under the mutex.

Also demote stop_ from std::atomic<bool> to plain bool since all
accesses are now mutex-protected.
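
A minimal sketch of the corrected pattern, assuming a single worker and using only the standard library. The class stoppable_worker and the function stop_demo are hypothetical and much simpler than thread_pool::impl; they only show the flag being written under the mutex that guards the wait predicate.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical miniature of a pool worker; not capy's thread_pool.
class stoppable_worker
{
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_ = false;   // plain bool: every access holds mutex_
    bool exited_ = false;

public:
    // Blocks until stop() has been called.
    void run()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stop_; });
        exited_ = true;
    }

    void stop()
    {
        {
            // Writing the flag under the mutex closes the window in which
            // the worker has tested the predicate but not yet started waiting.
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
    }

    bool exited()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return exited_;
    }
};

// Starts a worker, stops it, and reports whether it exited cleanly.
inline bool stop_demo()
{
    stoppable_worker w;
    std::thread t([&w] { w.run(); });
    w.stop();
    t.join();
    return w.exited();
}
```

Whether stop() runs before or after the worker reaches wait(), the predicate is observed under the mutex, so the worker cannot miss the request.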
Add detail::symmetric_transfer() to centralize the MSVC workaround
where await_suspend returning coroutine_handle<> stores the value on
the coroutine frame via __$ReturnUdt$. Apply it to when_all_runner,
when_any_runner, and dispatch_trampoline final_suspend sites that
were vulnerable but not previously covered.
Cover code paths the existing testMembers/testGrind harness cannot
reach: two-segment buffer splits when data wraps around the ring,
clamping behavior for oversized commit/consume, the in_pos_ reset
optimization in consume(), capacity-1 boundary, and a 2000-iteration
fuzz test against a reference model.
Type traits used in when_any/when_all return types were in detail::,
forcing users to reach into the detail namespace to name them.
Move these traits to the public namespace with descriptive names
so signatures are self-documenting:

- awaitable_result_t<A>: the return type of co_await on an awaitable
- void_to_monostate_t<T>: maps void to std::monostate for variant storage
- non_void_tuple_t<Ts...>: a tuple with void types filtered out

when_any now returns std::variant<void_to_monostate_t<...>...> directly
in its signature instead of hiding behind an alias. when_all uses
non_void_tuple_t in its signature. Both are immediately readable
without chasing type aliases.

Also simplify when_any_state to use a flat parameter pack (Ts...)
instead of requiring the first type separately (T0, Ts...).
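
For readers unfamiliar with this kind of trait, here is one possible way to write two of them in C++17. The names match the commit message, but the definitions are an assumption made for illustration and live in a hypothetical namespace sketch; the library's own definitions may differ.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>
#include <variant>

namespace sketch {

// void -> std::monostate, every other type unchanged.
template <class T>
using void_to_monostate_t =
    std::conditional_t<std::is_void_v<T>, std::monostate, T>;

// Helper: a tuple holding T, or an empty tuple when T is void.
template <class T>
using tuple_or_empty_t =
    std::conditional_t<std::is_void_v<T>, std::tuple<>, std::tuple<T>>;

// Concatenate the per-type tuples, so void contributes nothing.
template <class... Ts>
using non_void_tuple_t =
    decltype(std::tuple_cat(std::declval<tuple_or_empty_t<Ts>>()...));

} // namespace sketch

static_assert(std::is_same_v<
    std::variant<sketch::void_to_monostate_t<int>,
                 sketch::void_to_monostate_t<void>>,
    std::variant<int, std::monostate>>);
static_assert(std::is_same_v<
    sketch::non_void_tuple_t<int, void, char>,
    std::tuple<int, char>>);
```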
Static asserts verify all buffer sequence types satisfy the expected
std::ranges hierarchy (range through contiguous_range where applicable).
View adaptors (take, drop, reverse) are verified to produce valid
ConstBufferSequence/MutableBufferSequence types, with a note that
filter_view is not const-iterable and thus incompatible with the
current buffer APIs. Runtime tests exercise buffer_size and buffer_copy
through composed view pipelines.
Both types hold intrusive lists of suspended waiters whose nodes
contain raw pointers back into the list. Moving would invalidate
those pointers. The move operations were already implicitly deleted
via the Rule of Five; this makes the intent explicit and adds
javadoc for the deleted special members and Thread Safety sections.
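
The intent can be sketched as follows. waiter_list_owner is a hypothetical type with a simplified waiter node, not one of the two capy types the commit refers to; it only shows the special members spelled out as deleted.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical waiter-holding primitive; not capy's real type.
class waiter_list_owner
{
    struct waiter
    {
        waiter* next = nullptr;
        waiter_list_owner* owner = nullptr; // raw pointer back to the owner
    };
    waiter* head_ = nullptr;

public:
    waiter_list_owner() = default;

    // Copying or moving would leave suspended waiters pointing at the
    // old object, so both are deleted explicitly.
    waiter_list_owner(waiter_list_owner const&) = delete;
    waiter_list_owner& operator=(waiter_list_owner const&) = delete;
    waiter_list_owner(waiter_list_owner&&) = delete;
    waiter_list_owner& operator=(waiter_list_owner&&) = delete;

    bool empty() const noexcept { return head_ == nullptr; }
};

static_assert(!std::is_copy_constructible_v<waiter_list_owner>);
static_assert(!std::is_move_constructible_v<waiter_list_owner>);
static_assert(!std::is_move_assignable_v<waiter_list_owner>);
```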
Replace `using namespace boost::capy;` with `namespace capy = boost::capy;`
and qualify all capy names with `capy::` across all 17 example source
files and 14 corresponding documentation pages. Doc code listings are
verified to match compiled source files exactly.
The pseudocode mirrors the C++ standard's own description. Add a note
explaining this and link to the awaiter protocol section.
Lambda captures live in the closure object, not the coroutine frame,
causing dangling references when the lambda temporary is destroyed.
Document the IIFE parameter pattern and named function alternatives.

The run/run_async API already supports stop token injection, but this
was undertested and underdocumented, making it appear missing.

Add tests verifying that run(ex) inherits the caller's stop token,
run(ex, st) overrides it, and run_async propagates tokens to deferred
tasks. Restructure the launching docs into a unified stop token
propagation section and fix the download_manager example to use
io_env propagation.
sgerbino and others added 26 commits April 14, 2026 18:58
Captures the design space and trade-offs for buffer representation,
customization, dynamic buffers, and caller/callee ownership patterns
in capy's I/O subsystem.
Pins the required semantics of capy::run at a cross-executor
boundary: the forward trip must enqueue the target task, and
the return trip must enqueue the caller's continuation.

Covers five scenarios:
- run(pe)(inner) from a handler on pe must not let inner cut
  the queue ahead of other pending work.
- When the target runs synchronously, the return trip must
  still tick the caller's executor, so higher-priority work
  there runs before the caller resumes.
- run(inner)(work) from inside a strand must release the
  strand while work runs.
- A handler that does co_await run(strand)(task) must be
  outside the strand after the await returns.
- An io loop that does co_await run(compute_pool)(task) must
  resume on the io thread, not on a compute worker.

All five fail against the current dispatch-based run() and
will pass once run() posts on both trips. Adds a test-only
priority_executor support header used by the first three.
run_awaitable_ex::await_suspend and the return trampoline's
final_suspend both called dispatch(). On executors with a
thread-check fast path (strand, blocking_executor) dispatch can
fire an inline symmetric transfer, which does not enqueue the
target and does not give the caller's executor a fresh tick on
the return. run(ex)(task) then fails to actually run task on ex
and leaves the caller resuming on the wrong frame.

Switch both trips to post + std::noop_coroutine(). Also rename
dispatch_trampoline to boundary_trampoline; the type's purpose
is bridging the executor boundary, and the old name named a
mechanism that no longer applies.

The five previously-failing tests in boost.capy.run.priority,
boost.capy.ex.run, and boost.capy.strand now pass; full suite
green at 76743 assertions.
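
The difference between the two calls can be shown with a toy single-threaded executor. toy_executor and run_order are hypothetical and have nothing to do with capy's executor types or coroutines; the sketch only contrasts "run inline when already inside" with "always enqueue".

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical single-threaded executor used only to contrast the two calls.
class toy_executor
{
    std::deque<std::function<void()>> queue_;
    bool running_ = false; // true while drain() is executing handlers

public:
    // post: always enqueue; the work runs on a later tick.
    void post(std::function<void()> f) { queue_.push_back(std::move(f)); }

    // dispatch: run inline when already inside the executor, else enqueue.
    void dispatch(std::function<void()> f)
    {
        if (running_) f();
        else post(std::move(f));
    }

    void drain()
    {
        running_ = true;
        while (!queue_.empty())
        {
            auto f = std::move(queue_.front());
            queue_.pop_front();
            f();
        }
        running_ = false;
    }
};

// Returns the order in which the three work items ran.
inline std::vector<int> run_order(bool use_post)
{
    toy_executor ex;
    std::vector<int> order;
    ex.post([&] {
        auto inner = [&] { order.push_back(2); };
        if (use_post) ex.post(inner);
        else ex.dispatch(inner);
        order.push_back(1);
    });
    ex.post([&] { order.push_back(3); });
    ex.drain();
    return order;
}
```

With dispatch, item 2 runs inline and jumps ahead of item 3, which was already pending. With post, it waits its turn behind item 3.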
enqueue was calling impl_->cv.notify_one() outside the lock
scope. A foreign thread could still be inside notify_one() when
the main thread drained the queue, completed the task, saw
signal_done, and destroyed the context. TSan flagged the read
against cv during cond_signal after the waiter had released it.

Move notify_one inside the lock_guard scope, matching signal_done.

The race was latent before run() was switched to post on the
return trip: no foreign thread ever called blocking_executor::post,
so enqueue only ran on the pumping thread. testHopsBackToIoThread
exercises a compute-pool worker posting back to blocking_executor
via the return trampoline, which is what revealed the race under
TSan.
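
A minimal sketch of notifying inside the lock scope, using a hypothetical locked_queue rather than blocking_executor itself:

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical work queue; names are illustrative, not capy's.
class locked_queue
{
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<int> items_;

public:
    void enqueue(int v)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(v);
        // Notify while the lock is held: a woken consumer cannot get past
        // wait() and go on to destroy this object until notify_one() has
        // returned and the lock has been released.
        cv_.notify_one();
    }

    int dequeue()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !items_.empty(); });
        int v = items_.front();
        items_.pop_front();
        return v;
    }
};
```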
Codecov's default behavior is to fail CI on any decrease from the
base commit's coverage. Small refactors can move the denominator
enough to trip this (recently a 0.12% drop) without the change
meaningfully reducing test quality.

Mark project and patch status as informational so coverage is
reported on PRs but never gates merge. Matches corosio's
configuration.
dispatch() previously always posted, forcing a queue round-trip even
when the caller was already running on one of the pool's workers. It
now returns c.h when the calling thread is a worker of the pool, per
the Executor concept's inline-resume contract. A thread_local_ptr
marker set via RAII in run() identifies pool workers; cross-pool and
non-worker callers still post.
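
One way to implement such a marker is sketched below with a plain thread_local pointer. mini_pool, worker_scope, and run_as_worker are hypothetical names; capy's thread_local_ptr and its run() are not reproduced here.

```cpp
#include <cassert>

// Hypothetical pool with a thread-local "current pool" marker.
class mini_pool
{
    // Points at the pool whose worker loop the current thread is running.
    static inline thread_local mini_pool const* current_ = nullptr;

    // RAII guard: sets the marker on entry and restores it on exit.
    struct worker_scope
    {
        mini_pool const* saved_;
        explicit worker_scope(mini_pool const* p) noexcept
            : saved_(current_) { current_ = p; }
        ~worker_scope() { current_ = saved_; }
        worker_scope(worker_scope const&) = delete;
        worker_scope& operator=(worker_scope const&) = delete;
    };

public:
    // True when the caller is a worker of *this* pool.
    bool running_in_this_thread() const noexcept { return current_ == this; }

    // Stand-in for the worker loop: runs f with the marker set.
    template <class F>
    void run_as_worker(F&& f) const
    {
        worker_scope scope(this);
        f();
    }
};
```

A dispatch built on this check would resume inline only when running_in_this_thread() is true and would post otherwise, which also covers the cross-pool case.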
Replace the shared-impl strand pool with a per-strand implementation
backed by a shared pool of mutexes. Removes the bucket-collision
class where independent strands sharing a slot serialized against
each other and compared equal.

- strand_impl is per-strand, allocated by the service
- Service holds a 193-mutex pool, hashed by impl address mixed with a
  per-service salt; collisions share a mutex, never pending work
- Coroutine invoker keeps the impl alive via its frame parameter;
  invoker frames recycle through a single-slot per-service cache,
  closed under a kCacheClosed sentinel during shutdown
- Service tracks live impls via intrusive_list for shutdown traversal
- Service back-pointer in strand_impl is atomic so the destructor's
  load pairs with shutdown's store

Adds doc/strand-spec.md (design contract) and doc/strand-rationale.md
(why the redesign was needed).

Tests: equality non-collision regression, cross-strand independence,
transient strand lifetime via weak_ptr expiry, many-strands stress,
deterministic mutex-pool collision isolation.
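
The mutex-pool idea can be sketched as below. The pool size of 193 comes from the text; the class name mutex_pool and the mixing step are assumptions made for illustration and are not the service's actual hash.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical sketch; the mixing function is an assumption, not capy's.
class mutex_pool
{
    static constexpr std::size_t size = 193; // pool size named in the text
    std::array<std::mutex, size> mutexes_;
    std::uintptr_t salt_;

public:
    explicit mutex_pool(std::uintptr_t salt) noexcept : salt_(salt) {}

    // Map an object address to a slot in the pool.
    std::size_t index_for(void const* p) const noexcept
    {
        auto x = reinterpret_cast<std::uintptr_t>(p) ^ salt_;
        x ^= x >> 7; // cheap mix so aligned addresses spread out
        return static_cast<std::size_t>(x % size);
    }

    // Two objects that collide share a mutex but nothing else.
    std::mutex& mutex_for(void const* p) noexcept
    {
        return mutexes_[index_for(p)];
    }
};
```

Because each object keeps its own pending queue, a collision costs some lock contention but does not serialize the two objects' work against each other.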
New top-level section 7 (Testing Facilities) documents every public
header under include/boost/capy/test/. The section mirrors section 6
(Stream Concepts) one-for-one so a reader who learned a concept can
find its mock counterpart by predictable navigation.

Pages (under doc/modules/ROOT/pages/7.testing/):

* 7.intro              - section overview, header-to-page map
* 7a.drivers           - run_blocking, fuse, thread_name
* 7b.mock-streams      - read_stream, write_stream, stream
* 7c.mock-sources-sinks - read_source, write_sink
* 7d.mock-buffer-concepts - buffer_source, buffer_sink
* 7e.buffer-inspection - bufgrind, buffer_to_string

Each page follows section 6's shape: brief lede, one == per type with
code example and === subsections for nuance, an inline API table per
type showing the constructor as a single declaration with defaults,
a closing snippet that combines the page's types, a == Reference table,
and a closing pointer to the next page. The fuse class is a pimpl
backed by std::shared_ptr<state>, so by-value constructor arguments
share fail-point machinery; this is documented in 7a and cross-
referenced from each mock that takes fuse by value.

Existing Examples (was 7) renumbered to 8 and Design (was 8) renumbered
to 9. All cross-references updated. nav.adoc inserts the new section
between Streams (6) and Examples (8).
Eight xrefs in `index.adoc` and `quick-start.adoc` used the old flat
path style (e.g. `coroutines/tasks.adoc`) instead of the numbered prefix
the rest of the site uses (`4.coroutines/4a.tasks.adoc`). Antora was
emitting "target of xref not found" errors on every build. Updated all
eight to point at the actual page paths.

The ninth error was in `9k.Executor.adoc` (formerly `8k.Executor.adoc`),
which linked to `../continuation-rationale.adoc` — that file is a
markdown design note (`doc/continuation-rationale.md`), not an Antora
page, so the xref could never resolve. Removed the broken link; the
surrounding prose still describes what `continuation` is.

Antora build now reports zero errors.
Four pages cited a `cat()` function that was never part of the public
buffer API. The Antora build did not flag these because the references
appeared inside `[source,cpp]` blocks (not xrefs), but readers following
the examples would hit a compilation error.

Fixes:

* `5a.overview.adoc` — removed the `Zero-Allocation Composition`
  section; the conceptual point ("any composition of the above without
  allocation") is preserved in the bullet list above and in
  `5c.sequences.adoc`.
* `5c.sequences.adoc` — removed the `Zero-Allocation Composition`
  section; `Heterogeneous Composition` already covers the same idea.
* `5e.algorithms.adoc` — rewrote the Zero-Copy I/O example to use
  `std::array` directly, mirroring the Scatter/Gather example on the
  same page.
* `8c.buffer-composition.adoc` — removed the exercise that referenced
  `cat()`; the other two exercises remain.
Audit of all 35 code snippets in 7.testing pages against actual headers
revealed structural bugs that were verified by compiling and running
each example. Fixes:

* 7a thread_pool snippet: replace nonexistent <boost/capy/thread_pool.hpp>
  and pool.post(lambda) (which takes only `continuation&`) with the
  idiomatic run_async(pool.get_executor())(coroutine) pattern.

* 7b/7c/7d "Putting It Together" snippets: add missing concept-header
  includes (read_stream, read_source, write_sink, buffer_source,
  buffer_sink) so the templated examples actually compile.

* 7b/7c/7d "Putting It Together" snippets: fix a structural bug where
  mocks were constructed and asserted on outside the f.armed() lambda.
  With armed running the body multiple times for fault injection, the
  outside state was the last run's state, not the success run's. Move
  mock construction and state assertions inside the lambda following
  the canonical capy test pattern, with functions-under-test now
  returning std::error_code so callers can guard with
  `if(ec) co_return;`.

* 7c handle_request: read_source::read returns cond::eof on partial
  fill; treat eof-with-data as success rather than bailing.

* 7d buffer_sink basic example: remove the dead `if(bufs.empty())
  co_return;` check. prepare() always returns a 1-element span for a
  non-empty input array.

* All "Returns ...eof" prose and table descriptions: standardize on
  cond::eof spelling per error.hpp:32 ("Compare with cond::eof"), since
  cond::eof is the user-side comparison value while error::eof is the
  implementation's returned code. Both compare equal.

* Mock constructor signatures in tables: add `explicit` qualifier to
  match the actual headers (read_stream, write_stream, read_source,
  write_sink, buffer_source, buffer_sink, fuse(error_code)).

* Document write_some rollback semantics: write_stream::write_some and
  write_sink::write_some roll back on expected-data mismatch and return
  (test_failure, 0); call out the asymmetry with write_sink::write,
  which leaves the partial write in place.

* Expand 7a's "armed() vs. inert()" subsection with a smoke-test-first
  pattern showing the same test body under both modes, plus prose
  explaining when each fits.

* 7e bufgrind: add prose explaining why snippets use f.inert (bufgrind
  does not do I/O or consult a fuse, so a single pass is sufficient).

All snippets verified by compiling and running them through the test
suite during development.
task::handle(), quitter::handle(), and the matching release() methods
are public, but calling coroutine_handle::destroy() on a suspended
task or quitter that another coroutine is awaiting produces undefined
behavior: the io_awaitable_promise_base destructor cascades through
the parent's continuation while the parent's destruction is already
in progress.

Document the constraint where users will encounter it (Doxygen on
handle() and release(), and a new "Why Not coroutine_handle::destroy()?"
subsection in the cancellation chapter), and clarify the destructor's
contract in io_awaitable_promise_base. No behavior change.
The free-function `read` and `write` overloads previously bound the
buffer sequence by const reference. With lazy coroutines this dangles:
the returned `io_task` can be stored past the full-expression that
created the sequence, by which point the const reference points at a
destroyed temporary. Switch both to by-value so the sequence lives in
the coroutine frame.

Update the ReadStream/WriteStream concept docstrings to list only the
by-value signature as conforming and document the `std::views::all`
caller-side workaround for expensive owning sequences. Append a
Resolution section to doc/buffers-passing-rationale.md recording the
decision and why both `const&` and `&&` fail under lazy coroutines.

Add regression tests that store the returned awaitable past a
temporary buffer sequence; verified to trip ASan stack-use-after-scope
with the previous signatures and to pass cleanly with by-value.

Resolves #263.
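
The lifetime problem is not specific to coroutines. The sketch below swaps in a deferred lambda as a stand-in for a lazy io_task, so it is an analogy rather than the library's code; lazy_sum_by_ref and lazy_sum_by_value are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Binds the input by reference: the returned callable dangles if the
// argument was a temporary. Shown for contrast only; calling the result
// after the argument is gone is undefined behavior.
inline std::function<int()> lazy_sum_by_ref(std::vector<int> const& seq)
{
    return [&seq] { return std::accumulate(seq.begin(), seq.end(), 0); };
}

// Takes the input by value: the sequence is moved into the deferred state
// and lives as long as the returned callable.
inline std::function<int()> lazy_sum_by_value(std::vector<int> seq)
{
    return [seq = std::move(seq)] {
        return std::accumulate(seq.begin(), seq.end(), 0);
    };
}
```

Storing the by-value result past the full-expression that created the temporary is safe, which is the property the by-value `read` and `write` signatures are meant to provide.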
Introduces a public free function `buffer_slice(seq, offset, length)`
that returns an object of unspecified type satisfying the new `Slice`
concept (`data()`, `remove_prefix()`). When `seq` models
`MutableBufferSequence`, the returned object additionally satisfies
`MutableSlice` — the slice-side analog of the
ConstBufferSequence / MutableBufferSequence refinement.

The implementation type `detail::slice_impl<BS>` tracks iteration
state and exposes its current bytes through `data()`, which returns
a buffer sequence view whose mutability follows from the input.

Replaces `consuming_buffers`, which conflated iteration state with
the buffer-sequence role. The new design splits them: the slice is
not itself a buffer sequence, eliminating the dual role and making
the iteration-vs-slicing distinction explicit. The offset/length
parameters generalize the abstraction beyond incremental consumption
to arbitrary byte sub-ranges.

`read()`, `write()`, and `write_now` switched to the new algorithm.
Removes the tag_invoke machinery for buffer-sequence operations and
the custom types whose only role was to participate in it. Buffer
sequences are now plain ranges of `const_buffer` or `mutable_buffer`
(or single buffer values).

Removed:
- `size_tag`, `slice_tag`, `slice_how` tag types and their
  `tag_invoke` friends on `mutable_buffer`, `const_buffer`, and
  `buffer_array`.
- The generic `tag_invoke(size_tag, ConstBufferSequence)` overload.
- `slice_of<>` and the slice CPOs (`keep_prefix`, `remove_prefix`,
  `keep_suffix`, `remove_suffix`, `prefix`, `suffix`, `sans_prefix`,
  `sans_suffix`) from `buffers/slice.hpp`. The file is deleted.
- `const_buffer_pair` / `mutable_buffer_pair` aliases and the free
  `tag_invoke(slice_tag, ...)` on them. The aliases were just
  `std::array<X, 2>`; users now use `std::array` directly.

Demoted:
- `buffer_array<N, IsConst>` moves from `boost::capy::` to
  `boost::capy::detail::`. The class itself stays — it's the
  scatter/gather adapter used internally by `any_*` facilities, and
  is preserved modulo the namespace move and the removed tag_invoke
  friends. Convenience aliases `detail::const_buffer_array<N>` /
  `detail::mutable_buffer_array<N>` are preserved for brevity.

Reimplemented:
- `buffer_size` is now a plain free function template that iterates
  the range and sums sizes. No more CPO dispatch. Wrapped with a
  GCC-only `-Wmaybe-uninitialized` pragma (matching existing
  precedent in `write_now.hpp`) to suppress a GCC 13 false positive
  in iteration over `detail::buffer_array`'s union storage.
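
A plain summing function of this shape might look like the sketch below. demo_buffer and demo_buffer_size are hypothetical stand-ins; the real `buffer_size` works on `const_buffer` and `mutable_buffer` ranges and is not reproduced here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a buffer: a pointer and a byte count.
struct demo_buffer
{
    void const* data = nullptr;
    std::size_t size = 0;
};

// Sum the sizes of every buffer in a range. No tag dispatch, just iteration.
template <class Range>
std::size_t demo_buffer_size(Range const& buffers) noexcept
{
    std::size_t total = 0;
    for (auto const& b : buffers)
        total += b.size;
    return total;
}
```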

`Slice` lifetime contract (refines #261):
- The `Slice` concept documents the lifetime contract explicitly: a
  Slice is associated, on construction, with an underlying buffer
  sequence. The slice — and any buffer sequence returned by its
  `data()` — is valid only while that underlying sequence remains
  valid. The buffer sequence returned by `data()` is independent
  of the slice object: subsequent operations on the slice (mutation,
  copy, move, destruction) do not invalidate an already-obtained
  view.
- `buffer_slice` gains a deleted `BufferSequence const&&` overload
  to reject rvalue arguments at compile time. Passing a temporary
  buffer sequence would produce an immediately dangling slice; the
  deleted overload surfaces this as a compile error instead of
  runtime UB.
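
The deleted-overload technique can be sketched in isolation as follows. demo_view, make_view, and can_make_view are hypothetical; the sketch shows only how a `const&&` overload marked `= delete` rejects temporaries while lvalues still work.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical view type that only refers to the caller's sequence.
template <class Seq>
struct demo_view { Seq const* seq; };

// Lvalues are accepted: the caller keeps the sequence alive.
template <class Seq>
demo_view<Seq> make_view(Seq const& s) noexcept { return {&s}; }

// Rvalues are rejected at compile time: the view would dangle immediately.
template <class Seq>
void make_view(Seq const&&) = delete;

// Detection helper: is make_view(std::declval<T>()) well-formed?
template <class T, class = void>
struct can_make_view : std::false_type {};
template <class T>
struct can_make_view<T, std::void_t<decltype(make_view(std::declval<T>()))>>
    : std::true_type {};

static_assert(can_make_view<std::vector<int>&>::value);
static_assert(!can_make_view<std::vector<int>>::value);
```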

Updated callers:
- `any_*` facilities (`any_read_source`, `any_read_stream`,
  `any_write_stream`, `any_write_sink`) use `detail::buffer_array<N>`.
- `circular_dynamic_buffer`'s `const_buffers_type` /
  `mutable_buffers_type` are `std::array<X, 2>` directly.
- Test infrastructure (`bufgrind`, `test_buffers.hpp`) rewritten on
  top of `buffer_slice` / `Slice`.
- Antora pages (`why-capy.adoc`, `9m.WhyNotCobalt.adoc`,
  `8c.buffer-composition.adoc`) updated to drop references to
  removed types; the buffer-composition example demonstrates
  `std::array<const_buffer, 2>` directly.
The strand pending queue previously wrapped each posted handle in a
heap-allocated `strand_op` coroutine solely for its `next` pointer.
The user-facing `continuation` already carries an intrusive `next`,
so link continuations directly through the queue and delete the
wrapper machinery (`strand_op`, `frame_prefix`, `make_strand_op`,
`free_list_`, the prefix-allocator new/delete).

The detail-layer `strand_service::dispatch`/`post` signatures change
from `coroutine_handle<>` to `continuation&`. Public `strand::post`/
`dispatch` signatures are unchanged.

Tightens the implementation to the documented contract: a continuation
must outlive its time in any executor queue. Tests that posted
stack-local continuations and let them die before dispatch are updated
to hoist storage out to a vector.
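
An intrusive queue of this kind can be sketched as follows. demo_node and intrusive_queue are hypothetical and far simpler than `continuation` and the strand's pending queue; the point is that the link lives in the queued object, so the queue never allocates and the caller owns the storage.

```cpp
#include <cassert>

// Hypothetical node: the link lives inside the object being queued.
struct demo_node
{
    int id = 0;
    demo_node* next = nullptr;
};

// FIFO queue that links caller-owned nodes; it never allocates.
// Each node must outlive its time in the queue.
class intrusive_queue
{
    demo_node* head_ = nullptr;
    demo_node* tail_ = nullptr;

public:
    void push(demo_node& n) noexcept
    {
        n.next = nullptr;
        if (tail_) tail_->next = &n;
        else head_ = &n;
        tail_ = &n;
    }

    // Returns the oldest node, or nullptr when the queue is empty.
    demo_node* pop() noexcept
    {
        demo_node* n = head_;
        if (!n) return nullptr;
        head_ = n->next;
        if (!head_) tail_ = nullptr;
        n->next = nullptr;
        return n;
    }
};
```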
Drop the empty "World" column from all four tables and the empty "std"
column from the streams, any_*, and buffers tables. Replace blank cells
with "(none)" so missing entries are unambiguous instead of looking like
unfinished research.

Fix two verified inaccuracies:
- Slice/MutableSlice are concepts (capitalized); replace the lowercase
  `slice` row and add the missing `MutableSlice` row.
- Remove the std::static_thread_pool claim; P2300R10 (merged into C++26)
  omits any concrete thread pool, leaving only std::execution::run_loop.
* docs: prototype new specification for `write`

This changes the declaration format and JavaDoc annotations for
function capy::write. The goal is to show what an alternative
specification would look like and how it would render in the documentation.

* Use term "contingency"

* Claude review

* namespace prefix
Parallel benchmark under bench/stdexec/ that exercises the same I/O
read-stream workload as bench/beman/ but uses NVIDIA stdexec instead
of beman::execution. Gated by a new CMake option
BOOST_CAPY_BUILD_STDEXEC_EXAMPLES, independent from the existing
BOOST_CAPY_BUILD_P2300_EXAMPLES.

Three tables x four rows x two columns:
  Table 1: sender/receiver pipeline (exec::repeat_until)
  Table 2: capy::task (capy::thread_pool)
  Table 3: exec::task (exec::static_thread_pool)
  Rows:    Native / Abstract / Type-erased / Synchronous

Idiomatic stdexec throughout: exec::static_thread_pool,
exec::task<void>, exec::any_sender<any_receiver<completion_signatures<>>>
(post PR #2040), exec::repeat_until, and write_env(prop(
exec::get_frame_allocator, alloc)) at type-erased pipeline roots so
op-state storage routes through the counting/recycling resource.

Bridges between stdexec and capy executor model:
  - capy::as_sender(IoAwaitable)    -> stdexec sender
  - capy::await_sender(stdexec snd) -> IoAwaitable
  - sender_as_capy_executor + pool_scheduler + static_pool_context
    adapt exec::static_thread_pool to capy's Executor concept

Table 1 cells use exec::static_thread_pool(2) to work around
exec::repeat_until's single-worker deadlock (it synchronously
emplaces iteration N+1 inside iteration N's set_value cascade, so
a single worker can't drain the queue it just posted to). Tables 2
and 3 stay at pool(1) because co_await suspension releases the
worker between iterations. The header comment in main.cpp documents
this trade-off.

Two stdexec gaps documented in the source comments:
  - exec::any_sender value-storage on construction can't route
    through get_frame_allocator (env not visible at construction time)
  - exec::task has no allocator hook for coroutine frames
Demonstrates that capy::await_sender and capy::as_sender compose with
nvexec::stream_scheduler, not just CPU schedulers. Scene 1: a capy
coroutine co_awaits a SAXPY __global__ kernel scheduled on a CUDA
stream, with continues_on(cpu) landing completion on host before the
bridge connects. Scene 2: capy's read_some is exposed as a stdexec
sender, driven by sync_wait, with upon_error catching an injected eof.

Gated behind BOOST_CAPY_BUILD_NVEXEC_EXAMPLES (default OFF) which
hard-errors if BOOST_CAPY_BUILD_STDEXEC_EXAMPLES is off or
CMAKE_CXX_STANDARD < 23, then enables the CUDA language at the top
level. Bridge headers are copied verbatim from bench/stdexec/ so
example tweaks can land without disturbing the bench.

The README documents the working toolchain (clang as both host and
CUDA compiler with CUDA_SEPARABLE_COMPILATION OFF). nvc++ 26.3 does
not enable C++20 coroutines, so the nominally blessed nvexec compiler
cannot compile capy.
Compile-time validation that the code listings from P4251R0 "IoAwaitables
for GPU Data Movement" are type-correct against the real boost::capy API,
split by dependency so the non-GPU parts need no CUDA toolchain.

example/cuda/datamovement (CUDA, compile-only): the hand-rolled awaiter,
cuda_stream (memcpy/synchronize as IoAwaitables, plus optional NCCL
interop), cuda_device_stream as a WriteStream with link-time transport
polymorphism, and a coroutine-driven CUDA Graph replay. Gated by
BOOST_CAPY_BUILD_CUDA_EXAMPLES.

example/cuda/pipeline (CUDA + nvexec): the await_sender / as_sender
bridges composing capy coroutines with nvexec senders, including a
compile-only inference-handler that the paper's verbatim listing cannot
express on nvexec (host call under a device-side then). Gated by
BOOST_CAPY_BUILD_NVEXEC_EXAMPLES.

example/fabrics (no CUDA): the transport-neutral listings, a byte-oriented
compound-result coroutine and the HPC-fabric send signatures
(ibv_post_send, fi_send, ucp_tag_send_nbx), each compiled against the real
library when found.

CMake enables the CUDA language once for either CUDA example set. The
code-to-paper-section mapping is kept out of the tree (local FINDINGS.md).
@cppalliance-bot

An automated preview of the documentation is available at https://300.capy.prtest3.cppalliance.org/index.html

If more commits are pushed to the pull request, the docs will rebuild at the same URL.

2026-06-02 17:57:02 UTC

@cppalliance-bot

GCOVR code coverage report https://300.capy.prtest3.cppalliance.org/gcovr/index.html
LCOV code coverage report https://300.capy.prtest3.cppalliance.org/genhtml/index.html
Coverage Diff Report https://300.capy.prtest3.cppalliance.org/diff-report/index.html

Build time: 2026-06-02 18:07:07 UTC

@codecov

codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.27%. Comparing base (eb1de34) to head (98be9fd).

@@            Coverage Diff             @@
##           master     #300      +/-   ##
==========================================
+ Coverage   91.68%   92.27%   +0.58%     
==========================================
  Files          76      164      +88     
  Lines        4484     8862    +4378     
==========================================
+ Hits         4111     8177    +4066     
- Misses        373      685     +312     
Flag Coverage Δ
linux 92.26% <ø> (?)

Files with missing lines Coverage Δ
include/boost/capy/buffers.hpp 96.92% <ø> (-3.08%) ⬇️
include/boost/capy/buffers/buffer_param.hpp 100.00% <ø> (ø)
include/boost/capy/buffers/buffer_slice.hpp 100.00% <ø> (ø)
...ude/boost/capy/buffers/circular_dynamic_buffer.hpp 100.00% <ø> (ø)
include/boost/capy/buffers/flat_dynamic_buffer.hpp 100.00% <ø> (ø)
include/boost/capy/buffers/make_buffer.hpp 100.00% <ø> (ø)
...clude/boost/capy/buffers/string_dynamic_buffer.hpp 100.00% <ø> (ø)
...clude/boost/capy/buffers/vector_dynamic_buffer.hpp 98.03% <ø> (ø)
include/boost/capy/cond.hpp 100.00% <ø> (ø)
include/boost/capy/delay.hpp 100.00% <ø> (ø)
... and 56 more

... and 86 files with indirect coverage changes


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
@sgerbino sgerbino merged commit 35c25f3 into master Jun 2, 2026
76 checks passed

8 participants