Skip to content

[BUG] hold read lock in get_vectors and harden HNSW parameter validation#7089

Closed
rescrv wants to merge 1 commit into
mainfrom
rescrv/bug1
Closed

[BUG] hold read lock in get_vectors and harden HNSW parameter validation#7089
rescrv wants to merge 1 commit into
mainfrom
rescrv/bug1

Conversation

@rescrv
Copy link
Copy Markdown
Contributor

@rescrv rescrv commented May 15, 2026

Description of changes

Address race conditions where get_vectors in LocalHnswSegment and
PersistentLocalHnswSegment accessed shared index state (_label_to_id,
_id_to_label, _index) without holding the read lock. Concurrent writes
could mutate these structures mid-read, leading to inconsistent results
or crashes.

Also harden HNSW parameter validation:

  • Bound resize_factor to [1.0, 5.0] and require finiteness, preventing
    pathological memory allocation via inf or huge floats
  • Require integer params (construction_ef, search_ef, M, num_threads)
    to be positive ints, rejecting booleans and non-positive values
  • Restrict pickle deserialization to an allowlist via SafeUnpickler

Test plan

Local of affected tests run. CI for rest.

Migration plan

N/A

Observability plan

N/A

Documentation Changes

N/A

Co-authored-by: AI

Address race conditions where get_vectors in LocalHnswSegment and
PersistentLocalHnswSegment accessed shared index state (_label_to_id,
_id_to_label, _index) without holding the read lock. Concurrent writes
could mutate these structures mid-read, leading to inconsistent results
or crashes.

Also harden HNSW parameter validation:
- Bound resize_factor to [1.0, 5.0] and require finiteness, preventing
  pathological memory allocation via inf or huge floats
- Require integer params (construction_ef, search_ef, M, num_threads)
  to be positive ints, rejecting booleans and non-positive values
- Restrict pickle deserialization to an allowlist via SafeUnpickler

Co-authored-by: AI
@github-actions
Copy link
Copy Markdown

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@rescrv rescrv requested a review from HammadB May 15, 2026 23:07
@rescrv
Copy link
Copy Markdown
Contributor Author

rescrv commented May 20, 2026

In favor of #7098

@rescrv rescrv closed this May 20, 2026
@rescrv rescrv deleted the rescrv/bug1 branch May 22, 2026 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant