[BUG] harden single-node deployment against memory-safety and DoS attacks by rescrv · Pull Request #7098 · chroma-core/chroma

rescrv · 2026-05-20T18:16:00Z

Description of changes

Address five chained vulnerabilities in the single-node (sqlite + local HNSW)
deployment that together enable unauthenticated heap corruption with a
credible path to remote code execution.

Dimension TOCTOU (CWE-367 → CWE-787/CWE-125):

- Add a per-collection mutex around dimension initialization in
  validate_embedding so concurrent /add requests cannot race to set
  conflicting dimensions.
- Validate persisted HNSW header dimensionality on cold-load against the
  sysdb dimension; reject mismatches before constructing the index.
- Stage id_map mutations in a pending copy during apply_log_chunk and
  commit only on success, preventing partial state on error.
- Validate embedding dimensionality per-record in the HNSW writer.
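
The staging and per-record validation items above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Record`, `Writer`, `ApplyError`, and `apply_chunk` are hypothetical names, and the real id_map and writer types will differ.

```rust
use std::collections::HashMap;

/// Hypothetical log record: a user-facing id plus its embedding.
pub struct Record {
    pub id: String,
    pub embedding: Vec<f32>,
}

#[derive(Debug, PartialEq)]
pub enum ApplyError {
    DimensionMismatch { id: String, expected: usize, got: usize },
}

/// Hypothetical stand-in for the writer's state.
pub struct Writer {
    pub dimension: usize,
    pub id_map: HashMap<String, u32>,
    pub next_label: u32,
}

impl Writer {
    /// Apply a chunk so that `id_map` either reflects every record in the
    /// chunk or is left exactly as it was before the call.
    pub fn apply_chunk(&mut self, chunk: &[Record]) -> Result<(), ApplyError> {
        // Stage all mutations in a copy so an error cannot leave partial state.
        let mut pending = self.id_map.clone();
        let mut next_label = self.next_label;
        for rec in chunk {
            // Per-record dimensionality check, before anything is staged.
            if rec.embedding.len() != self.dimension {
                return Err(ApplyError::DimensionMismatch {
                    id: rec.id.clone(),
                    expected: self.dimension,
                    got: rec.embedding.len(),
                });
            }
            if !pending.contains_key(&rec.id) {
                pending.insert(rec.id.clone(), next_label);
                next_label += 1;
            }
        }
        // Commit only after the whole chunk has validated.
        self.id_map = pending;
        self.next_label = next_label;
        Ok(())
    }
}
```

Cloning the map per chunk trades some memory for simplicity; a real implementation might stage only the delta.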

Unbounded HNSW configuration (CWE-20/CWE-1284 → CWE-770):

- Add upper bounds to all HNSW tunables: max_neighbors ≤ 128,
  ef_construction ≤ 4096, ef_search ≤ 4096, resize_factor ≤ 10.0,
  sync_threshold ≤ 4096.
- Enforce bounds at every ingestion point: schema validation,
  InternalCollectionConfiguration, UpdateCollectionConfiguration,
  and the compaction pipeline.
- Skip HNSW backfill for collections with invalid persisted config
  but still advance the metadata watermark so log purging proceeds.
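
As a rough illustration of the bounds check, the sketch below validates a config against the limits quoted above. `HnswConfig`, `OutOfBounds`, and `validate` are hypothetical names chosen for this example; only the numeric limits come from the description, and only upper bounds are checked here.

```rust
/// Upper bounds as quoted in the description above.
pub const MAX_NEIGHBORS: usize = 128;
pub const MAX_EF_CONSTRUCTION: usize = 4096;
pub const MAX_EF_SEARCH: usize = 4096;
pub const MAX_RESIZE_FACTOR: f64 = 10.0;
pub const MAX_SYNC_THRESHOLD: usize = 4096;

/// Hypothetical config struct; field names mirror the tunables listed above.
#[derive(Debug, Clone)]
pub struct HnswConfig {
    pub max_neighbors: usize,
    pub ef_construction: usize,
    pub ef_search: usize,
    pub resize_factor: f64,
    pub sync_threshold: usize,
}

/// Names the first field that exceeded its limit.
#[derive(Debug, PartialEq)]
pub struct OutOfBounds {
    pub field: &'static str,
}

impl HnswConfig {
    /// Reject any tunable above its upper bound. A non-finite
    /// resize_factor (NaN or infinity) is rejected as well.
    pub fn validate(&self) -> Result<(), OutOfBounds> {
        let checks = [
            ("max_neighbors", self.max_neighbors <= MAX_NEIGHBORS),
            ("ef_construction", self.ef_construction <= MAX_EF_CONSTRUCTION),
            ("ef_search", self.ef_search <= MAX_EF_SEARCH),
            (
                "resize_factor",
                self.resize_factor.is_finite() && self.resize_factor <= MAX_RESIZE_FACTOR,
            ),
            ("sync_threshold", self.sync_threshold <= MAX_SYNC_THRESHOLD),
        ];
        for &(field, ok) in checks.iter() {
            if !ok {
                return Err(OutOfBounds { field });
            }
        }
        Ok(())
    }
}
```

Calling one shared `validate` from every ingestion point keeps the limits from drifting apart between code paths.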

Concurrent cache-miss race (CWE-362 → CWE-664):

- Introduce a partitioned async mutex (AysncPartitionedMutex) in
  LocalSegmentManager keyed on IndexUuid. Double-check the cache
  after acquiring the lock to prevent two HnswIndex instances from
  sharing the same on-disk segment directory.
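
The double-checked lookup can be sketched as below. To stay self-contained this uses blocking `std::sync::Mutex` rather than an async mutex, and `Manager`, `Index`, and `get_or_load` are hypothetical stand-ins, not the types touched by this PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a loaded index.
pub struct Index {
    pub id: u64,
}

pub struct Manager {
    cache: Mutex<HashMap<u64, Arc<Index>>>,
    // One lock per partition; ids hashing to the same partition serialize.
    partitions: Vec<Mutex<()>>,
    // Counts how many times an index was actually constructed.
    pub loads: AtomicUsize,
}

impl Manager {
    pub fn new(partitions: usize) -> Self {
        Manager {
            cache: Mutex::new(HashMap::new()),
            partitions: (0..partitions.max(1)).map(|_| Mutex::new(())).collect(),
            loads: AtomicUsize::new(0),
        }
    }

    fn partition(&self, id: u64) -> &Mutex<()> {
        let mut hasher = DefaultHasher::new();
        id.hash(&mut hasher);
        &self.partitions[(hasher.finish() as usize) % self.partitions.len()]
    }

    fn lookup(&self, id: u64) -> Option<Arc<Index>> {
        self.cache.lock().unwrap().get(&id).cloned()
    }

    pub fn get_or_load(&self, id: u64) -> Arc<Index> {
        // Fast path: cache hit without taking the partition lock.
        if let Some(hit) = self.lookup(id) {
            return hit;
        }
        let _guard = self.partition(id).lock().unwrap();
        // Second check: another caller may have loaded the index while
        // this one was waiting for the partition lock.
        if let Some(hit) = self.lookup(id) {
            return hit;
        }
        self.loads.fetch_add(1, Ordering::SeqCst);
        let index = Arc::new(Index { id });
        self.cache.lock().unwrap().insert(id, Arc::clone(&index));
        index
    }
}
```

Without the second check, two callers that both miss the cache would each construct an index for the same id, which is the situation the description aims to rule out.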

delete_collection resource leak (CWE-401/CWE-772 → CWE-400):

- Actively evict and close HNSW indexes on collection deletion.
- Delete on-disk segment directories via tombstone-rename + periodic
  background cleanup of orphaned HNSW index dirs (6-hour age gate).
- Purge sqlite log rows for deleted collections.
- Propagate segment deletion through delete_database as well.
- Serialize database-level create/delete behind a per-database lock.
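
One way to picture the tombstone-rename and age-gated cleanup is the sketch below. The `tombstone-` prefix, the function names, and the rule used to decide that a directory is orphaned (its name is not in a set of live segment ids) are all assumptions made for illustration; only the 6-hour gate comes from the description.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::Duration;

/// Hypothetical prefix marking a directory as logically deleted.
pub const TOMBSTONE_PREFIX: &str = "tombstone-";
/// Age gate from the description: six hours.
pub const AGE_GATE: Duration = Duration::from_secs(6 * 60 * 60);

/// Compute the sibling path a segment directory is renamed to on deletion.
pub fn tombstone_path(dir: &Path) -> Option<PathBuf> {
    let name = dir.file_name()?.to_str()?;
    Some(dir.with_file_name(format!("{}{}", TOMBSTONE_PREFIX, name)))
}

/// Rename the directory so it can no longer be opened under its live name;
/// the actual removal is left to the background cleanup.
pub fn tombstone(dir: &Path) -> io::Result<PathBuf> {
    let target = tombstone_path(dir).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "directory has no UTF-8 name")
    })?;
    fs::rename(dir, &target)?;
    Ok(target)
}

/// Decide whether the cleanup pass may remove a directory: it must not
/// belong to a live segment, and it must be at least AGE_GATE old so that
/// directories still being created are not removed.
pub fn should_reap(dir_name: &str, live_segments: &HashSet<String>, age: Duration) -> bool {
    !live_segments.contains(dir_name) && age >= AGE_GATE
}
```

A rename within the same parent directory is cheap, so deletion can return quickly while the slow recursive removal happens later.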

Log purge skipped on backfill error (CWE-754 → CWE-400):

- Always dispatch PurgeLogsMessage after BackfillMessage regardless
  of backfill outcome; the purge handler independently computes
  watermarks from persisted segment state.
- For collections with invalid HNSW config, fall back to the
  metadata-segment watermark so logs do not accumulate unboundedly.
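
The control flow described above might look roughly like this. `SegmentState`, `purge_watermark`, and `backfill_then_purge` are hypothetical names, and taking the minimum of the two segment watermarks when both are available is an assumption of this sketch rather than something the description states.

```rust
/// Hypothetical view of one collection's persisted segment state.
pub struct SegmentState {
    pub metadata_max_seq_id: u64,
    /// None when the HNSW config is invalid and the vector segment is skipped.
    pub vector_max_seq_id: Option<u64>,
}

/// Highest log offset that is safe to purge. Assumption: when both segments
/// are usable, the slower one bounds the purge; otherwise fall back to the
/// metadata segment alone.
pub fn purge_watermark(state: &SegmentState) -> u64 {
    match state.vector_max_seq_id {
        Some(v) => v.min(state.metadata_max_seq_id),
        None => state.metadata_max_seq_id,
    }
}

/// Run backfill, then always run purge, whatever backfill returned.
/// The purge watermark comes from persisted state, not from the backfill result.
pub fn backfill_then_purge<E>(
    backfill: impl FnOnce() -> Result<(), E>,
    state: &SegmentState,
    purge: impl FnOnce(u64),
) -> Result<(), E> {
    let outcome = backfill();
    purge(purge_watermark(state));
    outcome
}
```

The backfill error is still returned to the caller, so failures stay visible even though they no longer block log purging.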

Where-clause stack overflow:

- Cap $or/$and recursion depth at 64 in both parse_where and
  parse_where_document to prevent attacker-triggered stack overflow
  that could crash the tokio runtime.
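
A depth cap of this kind can be sketched by threading a counter through the recursive parser. The `RawWhere` and `Where` enums below are simplified, hypothetical stand-ins for the real request payload and parsed filter, and the exact off-by-one convention (64 nested operators allowed, 65 rejected) is an assumption of this sketch.

```rust
/// Depth limit from the description above.
pub const MAX_WHERE_DEPTH: usize = 64;

/// Hypothetical raw filter tree standing in for the request payload.
pub enum RawWhere {
    And(Vec<RawWhere>),
    Or(Vec<RawWhere>),
    Eq(String, String),
}

/// Hypothetical parsed filter.
#[derive(Debug, PartialEq)]
pub enum Where {
    And(Vec<Where>),
    Or(Vec<Where>),
    Eq(String, String),
}

#[derive(Debug, PartialEq)]
pub struct TooDeep;

pub fn parse_where(raw: &RawWhere) -> Result<Where, TooDeep> {
    parse_at(raw, 0)
}

fn parse_at(raw: &RawWhere, depth: usize) -> Result<Where, TooDeep> {
    // Refuse to recurse further once the nesting limit is exceeded.
    if depth > MAX_WHERE_DEPTH {
        return Err(TooDeep);
    }
    match raw {
        RawWhere::Eq(k, v) => Ok(Where::Eq(k.clone(), v.clone())),
        RawWhere::And(children) => Ok(Where::And(parse_all(children, depth + 1)?)),
        RawWhere::Or(children) => Ok(Where::Or(parse_all(children, depth + 1)?)),
    }
}

fn parse_all(children: &[RawWhere], depth: usize) -> Result<Vec<Where>, TooDeep> {
    children.iter().map(|c| parse_at(c, depth)).collect()
}

/// Helper for trying the limit: wrap a leaf in `levels` nested Or nodes.
pub fn nested_or(levels: usize) -> RawWhere {
    let mut w = RawWhere::Eq("k".to_string(), "v".to_string());
    for _ in 0..levels {
        w = RawWhere::Or(vec![w]);
    }
    w
}
```

In a real parser the cap has to apply wherever recursion first happens, including deserialization of the payload, since a limit applied only after a recursive decode would come too late.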

Test plan

Local testing of new tests + CI for our full suite.

Migration plan

This code migrates itself.

Observability plan

N/A

Documentation Changes

The docs will need an update covering the new limits and recovery for corrupted HNSW segments.

Co-authored-by: AI


github-actions · 2026-05-20T18:22:21Z

This comment has been minimized.


add a tool to detect integrity violations

57fc636

This was referenced May 20, 2026

[BUG] hold read lock in get_vectors and harden HNSW parameter validation #7089

Closed

[BUG] prevent integer overflow in HNSW index resize #7093

Closed

use enforced limits in proptest

081a56e


rescrv requested a review from HammadB May 22, 2026 14:59

rescrv wants to merge 3 commits into main from rescrv/bugfixes
