Skip to content

Require explicit memory management from the user and remove sloppy memory management.#436

Merged
shakedregev merged 81 commits into
developfrom
andrew/strict-mem-management
Jun 10, 2026
Merged

Require explicit memory management from the user and remove sloppy memory management.#436
shakedregev merged 81 commits into
developfrom
andrew/strict-mem-management

Conversation

@andrewxu319

Copy link
Copy Markdown
Collaborator

Description

Remove automatic memory allocation and memory syncing in various memory management functions and require the user to explicitly manage their own memory.
Resolves #295.

Proposed changes

Major changes:

  • Assert that data is updated in Vector::getData() and remove automatic synchronization
  • Assert that data is allocated in copyFromExternal() and remove automatic allocation for vectors and matrices
  • Assert that data is not already allocated in allocate() and allocateMatrixData() for vectors and matrices and remove auto deallocation
  • Add isUpdated(), isAllocated(), allocateAll() to Vector
  • Add allocateAll() to Sparse
  • Update all parts of the code that call these functions and are affected by these changes such that they work correctly

Minor changes:

  • Update .gitignore to ignore test outputs
  • Remove HyKKTSolver::readMatrixFiles() which is unnecessary

Checklist

  • All tests pass (make test and make test_install per testing instructions). Code tested on
    • CPU backend
    • CUDA backend
    • HIP backend
  • I have manually run the non-experimental examples and verified that residuals are close to machine precision. (In your build directory run: ./examples/<your_example>.exe -h to get instructions how to run examples). Code tested on:
    • CPU backend
    • CUDA backend
    • HIP backend
  • Code compiles cleanly with flags -Wall -Wpedantic -Wconversion -Wextra.
  • The new code follows Re::Solve style guidelines.
  • There are unit tests for the new code.
  • The new code is documented.
  • The feature branch is rebased with respect to the target branch.
  • I have updated CHANGELOG.md to reflect the changes in this PR. If this is a minor PR that is part of a larger fix already included in the file, state so.

Further comments

I also tried to add status returns to a few functions in the VectorHandler and LinAlgWorkspace series of classes, but that crashes the HIP implementation for seemingly unknown reasons.

andrewxu319 and others added 30 commits June 4, 2026 16:59
…ale tests to fail. Made "createIncrementingVector()" return a pointer to the new vector instead of a copy (which causes the data to be freed prematurely).
…arse"

This reverts commit 00f6992.

Kept changes to "io.cpp", which fixes the an issue with out-of-bounds iterators.
…HyKKT implementation (if matvec works correctly).
…ed right now, but good to prevent stack overflows.
… doesn't work because MatrixHandler is broken (specifically matvec).
* use caller-provided backend-specific handlers in SCCG

* load SCCG data on host before device sync

* refresh CUDA SpMV cache when matrix changes

* fix HyKKT SCCG HIP target setup

* document SCCG backend setup and CUDA SpMV cache handling

* mirror SpMV cache reset for HIP

* Shaked/random error fix (#424)

Fixed stochastic failures due to a faulty random number generator and asynchronous operations.

Co-authored-by: shakedregev <shakedregev@users.noreply.github.com>

---------

Co-authored-by: tamar-dewilde <tamar-dewilde@users.noreply.github.com>
Co-authored-by: Shaked Regev <35384901+shakedregev@users.noreply.github.com>
Co-authored-by: shakedregev <shakedregev@users.noreply.github.com>
Comment on lines 138 to 139
for (real_type val = 0.0; val <= 1.0; val += 1.0)
{ // Use a step to prevent infinite loop

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really part of this PR, but it's not clear to me why we have a loop at all here. It's simply the first thing happening, then the second.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit

@shakedregev shakedregev left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good work! Tests pass on all platforms. See:

  1. Some minor documentation comments.
  2. Some very weird code that was not part of this PR. Should be fixed now or later.

@superwhiskers superwhiskers left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor comments.

i ran the cpu-only tests on my local machine, all cpu tests passed. one warning was not addressed:

ReSolve/examples/sysRefactor.cpp:33:43: warning: unused parameter ‘hw_backend’ [-Wunused-parameter]
   33 | static void syncDevice(const std::string& hw_backend)
      |                        ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~

other than that, it seems fine. this warning may be an artifact of a cpu-only build. the gpu build tests are taking a little longer, so i'll leave this review without them.

Comment thread resolve/matrix/Coo.cpp
Comment thread resolve/vector/Vector.cpp Outdated
Comment thread resolve/workspace/LinAlgWorkspaceCUDA.cpp Outdated
@andrewxu319

Copy link
Copy Markdown
Collaborator Author

some minor comments.

i ran the cpu-only tests on my local machine, all cpu tests passed. one warning was not addressed:

ReSolve/examples/sysRefactor.cpp:33:43: warning: unused parameter ‘hw_backend’ [-Wunused-parameter]
   33 | static void syncDevice(const std::string& hw_backend)
      |                        ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~

other than that, it seems fine. this warning may be an artifact of a cpu-only build. the gpu build tests are taking a little longer, so i'll leave this review without them.

The full function is

static void syncDevice(const std::string& hw_backend)
{
#ifdef RESOLVE_USE_CUDA
  if (hw_backend == "CUDA")
  {
    cudaDeviceSynchronize();
  }
#endif

#ifdef RESOLVE_USE_HIP
  if (hw_backend == "HIP")
  {
    hipDeviceSynchronize();
  }
#endif
}

So the variable is used if either CUDA or HIP is enabled.

@shakedregev Would it be a good idea to add a [[maybe_unused]]? Or just leave it be?

@shakedregev

Copy link
Copy Markdown
Collaborator

some minor comments.
i ran the cpu-only tests on my local machine, all cpu tests passed. one warning was not addressed:

ReSolve/examples/sysRefactor.cpp:33:43: warning: unused parameter ‘hw_backend’ [-Wunused-parameter]
   33 | static void syncDevice(const std::string& hw_backend)
      |                        ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~

other than that, it seems fine. this warning may be an artifact of a cpu-only build. the gpu build tests are taking a little longer, so i'll leave this review without them.

The full function is

static void syncDevice(const std::string& hw_backend)
{
#ifdef RESOLVE_USE_CUDA
  if (hw_backend == "CUDA")
  {
    cudaDeviceSynchronize();
  }
#endif

#ifdef RESOLVE_USE_HIP
  if (hw_backend == "HIP")
  {
    hipDeviceSynchronize();
  }
#endif
}

So the variable is used if either CUDA or HIP is enabled.

@shakedregev Would it be a good idea to add a [[maybe_unused]]? Or just leave it be?

It makes more sense to me to call the function only if you need it. HIP/CUDA hw_backend, call this. CPU don't. Anything else should probably throw an error.

@superwhiskers

superwhiskers commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

managed to complete the rocm-enabled tests on my local machine after instituting a workaround for a rocm bug related to my specific amd gpu. builds and runs fine. no leaks outside those present within rocm, runs fine.

there is one concerning thing, and that is the flakiness of two tests on my machine:

  • sys_refactor_hip_asym_test
  • sys_refactor_hip_test

primarily the former. this most likely is not related to this pull request, though. just noting this for future reference. it's probably a tolerance thing, unlikely to be anything significant.

Comment thread examples/sysRefactor.cpp Outdated
Comment thread examples/sysRefactor.cpp Outdated
Comment thread resolve/workspace/LinAlgWorkspaceCUDA.cpp

@superwhiskers superwhiskers left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still getting this warning on a cpu-only build.

ReSolve/examples/sysRefactor.cpp:33:43: warning: unused parameter ‘hw_backend’ [-Wunused-parameter]
   33 | static void syncDevice(const std::string& hw_backend)
      |                        ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~

everything else looks fine.

@superwhiskers superwhiskers dismissed their stale review June 10, 2026 13:48

superseded

@shakedregev shakedregev left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All tests pass and concerns addressed.

@shakedregev shakedregev merged commit a93fb65 into develop Jun 10, 2026
6 checks passed
@andrewxu319 andrewxu319 deleted the andrew/strict-mem-management branch June 10, 2026 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda enhancement New feature or request hip

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Strictly enforce memory requirements.

5 participants