[ENH] /init: create source collections + attach foundation function #7134
Open
LLay wants to merge 3 commits into
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving:
- Testing, Bugs, Errors, Logs, Documentation
- System Compatibility
- Quality
davedash reviewed May 26, 2026
    "http_generate".to_string()
}
fn default_function_endpoint_url() -> String {
    "https://chroma-core--foundation-research-generate-api.modal.run".to_string()
Contributor: Is this static across environments, including Tilt?
Contributor (Author): Oh, good question. @HammadB, can you advise?
Collaborator: It will change eventually. IMO this should not be in the code, and it should default to an error.
HammadB approved these changes May 27, 2026
Collaborator: Please address #7134 (comment).
Foundation's /init endpoint now also ensures the source collections exist
(slack and notion by default, configurable via CHROMA_FOUNDATION__SOURCE_COLLECTIONS)
and sets the `chroma:group_chunk_siblings` metadata flag on each. That
flag opts the collection into chunk-sibling grouping in the worker's
PartitionOperator, so a job's chunk records stay in one partition and
the trailing end-of-job marker on `{base}-0` is observed after every
sibling chunk (ADR 0001 §6 in chroma-core/foundation).
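A minimal sketch of why keeping siblings in one partition preserves the marker ordering. It assumes chunk record ids shaped like `{base}-{n}` and a hash-based partitioner that keeps log order within each partition; `sibling_base` and `partition_ids` are hypothetical names, not the worker's actual `PartitionOperator` code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical: the sibling-group key of a chunk id shaped like `{base}-{n}`.
/// Ids without a numeric `-{n}` suffix form their own group.
fn sibling_base(id: &str) -> &str {
    match id.rsplit_once('-') {
        Some((base, n)) if !n.is_empty() && n.bytes().all(|b| b.is_ascii_digit()) => base,
        _ => id,
    }
}

/// Hypothetical partitioner: hash each id into one of `num_partitions` buckets.
/// Ids are pushed in input (log) order, so order is preserved per partition.
/// With `group_siblings`, the hash key is the base, so all `{base}-{n}` chunks
/// of one job share a partition.
fn partition_ids<'a>(
    ids: &[&'a str],
    num_partitions: usize,
    group_siblings: bool,
) -> HashMap<usize, Vec<&'a str>> {
    let mut partitions: HashMap<usize, Vec<&'a str>> = HashMap::new();
    for &id in ids {
        let key = if group_siblings { sibling_base(id) } else { id };
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let idx = (hasher.finish() % num_partitions as u64) as usize;
        partitions.entry(idx).or_default().push(id);
    }
    partitions
}
```

With grouping on, the trailing `job-0` marker in a log like `job-0, job-1, job-2, job-0` lands in the same partition as `job-1` and `job-2` and, since order is kept within a partition, is seen last. Without grouping, siblings hash on their full ids and may be spread across partitions, so the marker carries no ordering guarantee relative to them.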
- Promote the flag key to a shared constant
`chroma_types::CHROMA_GROUP_CHUNK_SIBLINGS_KEY` (next to the existing
CHROMA_* metadata keys) so the reader (worker) and writer
(foundation-api) share one definition. partition_log.rs now re-exports
it.
- foundation-api: new `source_collections` config (default
["slack","notion"]); `ensure_collection` takes optional metadata;
/init creates source collections with the flag and returns their ids.
Wiki collections are the function's *output* and intentionally do NOT
get the flag.
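A rough sketch of how the writer and reader could share the key, and of which collections /init ensures with which metadata. The module layout, the `Metadata` alias (a plain `HashMap<String, bool>`), `init_collections`, `groups_chunk_siblings`, and the collection names `wiki` / `wiki_revisions` are illustrative assumptions; only the key string and the slack/notion default come from the description above.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the two crates named above.
pub mod chroma_types {
    /// Shared key: the writer (/init) sets it, the reader (worker) checks it.
    pub const CHROMA_GROUP_CHUNK_SIBLINGS_KEY: &str = "chroma:group_chunk_siblings";
}

pub mod partition_log {
    // Re-export so the worker keeps reading the key through its old path.
    pub use super::chroma_types::CHROMA_GROUP_CHUNK_SIBLINGS_KEY;
}

/// Simplified metadata: real collection metadata supports more value types.
pub type Metadata = HashMap<String, bool>;

/// Hypothetical subset of the foundation-api config.
pub struct FoundationConfig {
    pub source_collections: Vec<String>,
}

impl Default for FoundationConfig {
    fn default() -> Self {
        Self {
            source_collections: vec!["slack".to_string(), "notion".to_string()],
        }
    }
}

/// Writer side: the (name, metadata) pairs /init would pass to `ensure_collection`.
/// Wiki collections are outputs and get no metadata; sources get the flag.
pub fn init_collections(cfg: &FoundationConfig) -> Vec<(String, Option<Metadata>)> {
    let mut out: Vec<(String, Option<Metadata>)> = vec![
        ("wiki".to_string(), None),
        ("wiki_revisions".to_string(), None),
    ];
    for source in &cfg.source_collections {
        let flag = Metadata::from([(chroma_types::CHROMA_GROUP_CHUNK_SIBLINGS_KEY.to_string(), true)]);
        out.push((source.clone(), Some(flag)));
    }
    out
}

/// Reader side: does a collection opt into chunk-sibling grouping?
pub fn groups_chunk_siblings(metadata: Option<&Metadata>) -> bool {
    metadata
        .and_then(|m| m.get(partition_log::CHROMA_GROUP_CHUNK_SIBLINGS_KEY))
        .copied()
        .unwrap_or(false)
}
```

Keeping a single `const` means a typo in the key cannot silently disable grouping on one side only.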
Stacked on the partition-operator change (chroma #7133), which reads
this flag. DRAFT — see PR body for the get-or-create idempotency
caveat.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
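The get-or-create caveat can be illustrated with a hypothetical in-memory store (`Store` and its `ensure_collection` here are stand-ins, not the foundation-api code): metadata is applied only when the collection is created, so a collection that already exists keeps whatever metadata it had.

```rust
use std::collections::HashMap;

/// Simplified collection metadata (assumption: real metadata has more value types).
type Metadata = HashMap<String, bool>;

/// Hypothetical in-memory stand-in for the collection store.
#[derive(Default)]
struct Store {
    collections: HashMap<String, Option<Metadata>>,
}

impl Store {
    /// Get-or-create: `metadata` is used only if `name` does not exist yet.
    /// An existing collection is returned unchanged, flag or no flag.
    fn ensure_collection(&mut self, name: &str, metadata: Option<Metadata>) -> &Option<Metadata> {
        self.collections.entry(name.to_string()).or_insert(metadata)
    }
}
```

If an upload creates `slack` before /init runs, the later /init call returns the unflagged collection, which is why /init has to run first or a one-off backfill is needed.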
Mirrors the CLI POC (chroma-core/foundation #97): after ensuring each source collection, /init attaches the server-side function via SysDb::create_attached_function, with the wiki collection as output.
- Attachment name `{source}_to_wiki`; operator `http_generate` (configurable); params carry the modal `endpoint_url`, `source_collection`, and `source_kind`.
- New FoundationConfig fields: function_name, function_endpoint_url, min_records_for_invocation (defaults mirror the POC + the chroma frontend's 100-record default). Output dimension is already hardcoded to 1024 in /init (chroma #7127), so no seed_output_collection step is needed.
- Idempotent: AlreadyExists / CollectionAlreadyHasFunction are treated as success so /init stays safe to call repeatedly.
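A sketch of the attachment request and the idempotency rule described above. `AttachRequest`, `AttachError`, `build_attach_request`, and `attach_idempotent` are hypothetical; the real call goes through `SysDb::create_attached_function`, whose signature is not shown here. The `attach` closure stands in for that call so the success-on-conflict handling can be exercised without a SysDb.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the errors an attach call can return.
#[derive(Debug, Clone, PartialEq)]
enum AttachError {
    AlreadyExists,
    CollectionAlreadyHasFunction,
    Other(String),
}

/// Hypothetical shape of one attachment; the real SysDb request type differs.
#[derive(Debug, Clone, PartialEq)]
struct AttachRequest {
    name: String,
    function_name: String,
    input_collection: String,
    output_collection: String,
    min_records_for_invocation: u64,
    params: HashMap<String, String>,
}

/// Build the `{source}_to_wiki` attachment with the params listed above.
fn build_attach_request(
    source: &str,
    wiki: &str,
    function_name: &str,
    endpoint_url: &str,
    min_records_for_invocation: u64,
) -> AttachRequest {
    let params = HashMap::from([
        ("endpoint_url".to_string(), endpoint_url.to_string()),
        // Both carry the source name, per the description.
        ("source_collection".to_string(), source.to_string()),
        ("source_kind".to_string(), source.to_string()),
    ]);
    AttachRequest {
        name: format!("{source}_to_wiki"),
        function_name: function_name.to_string(),
        input_collection: source.to_string(),
        output_collection: wiki.to_string(),
        min_records_for_invocation,
        params,
    }
}

/// Run `attach`, treating "already attached" outcomes as success so /init can be re-run.
fn attach_idempotent<F>(req: &AttachRequest, attach: F) -> Result<(), AttachError>
where
    F: FnOnce(&AttachRequest) -> Result<(), AttachError>,
{
    match attach(req) {
        Ok(()) | Err(AttachError::AlreadyExists) | Err(AttachError::CollectionAlreadyHasFunction) => Ok(()),
        Err(e) => Err(e),
    }
}
```

Any other error still propagates, so a real SysDb failure is not masked by the idempotency rule.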
a540171 to a78a010
Drop default_function_endpoint_url and its hardcoded modal.run default. function_endpoint_url is now Option<String> defaulting to None, and /init errors (MissingFunctionEndpointUrl -> 500) when the attached function needs it but the deploy left it unset, so a misconfigured deployment fails loudly instead of silently pointing at a baked-in POC endpoint.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
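A minimal sketch of the fail-loudly behavior, assuming a simplified config struct and error enum; only the `Option<String>` field type, the `MissingFunctionEndpointUrl` name, and the 500 mapping come from the commit message. `require_endpoint` and `status_code` are hypothetical helpers.

```rust
/// Hypothetical /init error; only the variant named in the commit is modeled.
#[derive(Debug, PartialEq)]
enum InitError {
    MissingFunctionEndpointUrl,
}

impl InitError {
    /// HTTP status the handler maps this error to.
    fn status_code(&self) -> u16 {
        match self {
            InitError::MissingFunctionEndpointUrl => 500,
        }
    }
}

/// Hypothetical config slice: no default URL, so an unset value stays `None`.
#[derive(Default)]
struct FoundationConfig {
    function_endpoint_url: Option<String>,
}

/// Resolve the endpoint the attached function needs, or fail loudly.
fn require_endpoint(cfg: &FoundationConfig) -> Result<&str, InitError> {
    cfg.function_endpoint_url
        .as_deref()
        .ok_or(InitError::MissingFunctionEndpointUrl)
}
```

Because the field defaults to `None`, every environment has to set the endpoint explicitly; there is no baked-in URL to fall back on.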
Summary
Extends the foundation `/init` endpoint to mirror the CLI POC (chroma-core/foundation #97), so `/init` is the single bootstrap for a team's foundation workspace. On top of the existing `wiki` + `wiki_revisions` creation, `/init` now:
- Ensures the source collections exist (`slack`, `notion` by default; configurable via `CHROMA_FOUNDATION__SOURCE_COLLECTIONS`).
- Sets `chroma:group_chunk_siblings = true` on each source collection so the worker's `PartitionOperator` ([ENH] Group chunk siblings into one compaction partition #7133) keeps a job's chunk records in one partition, the ordering the end-of-job marker relies on (ADR 0001 §6).
- Attaches the foundation function to each source collection via `SysDb::create_attached_function`, with the wiki collection as output: the server-side equivalent of the POC's HTTP attach.
Function attach (mirrors POC #97)
- Attachment name `{source}_to_wiki`; operator `http_generate` (configurable).
- `params`: `{ endpoint_url, source_collection, source_kind }`. `endpoint_url` originally defaulted to the modal URL from the POC; after review, `function_endpoint_url` has no default and `/init` fails with `MissingFunctionEndpointUrl` (500) when it is unset. `source_collection` / `source_kind` are the source name.
- `min_records_for_invocation` defaults to 100 (matches the chroma frontend default).
- No `seed_output_collection` step: per @HammadB, the output dimension is already hardcoded to 1024 in `/init`'s collection creation (chroma [CHORE] Make foundation api /init use correct schema, index, dim #7127, already on main).
- Idempotent: `AlreadyExists` / `CollectionAlreadyHasFunction` are treated as success, so `/init` stays safe to call repeatedly.
Shared constant
The chunk-sibling flag key is promoted to `chroma_types::CHROMA_GROUP_CHUNK_SIBLINGS_KEY` so the reader (worker `PartitionOperator`) and writer (`/init`) share one definition; `partition_log.rs` re-exports it.
Wiki collections deliberately untouched
Wiki/wiki_revisions are the function's output — no chunk-sibling flag, no attach. The marker mechanism operates on the source/input side.
Caveat: get-or-create idempotency
`/init` uses get-or-create for collections. If a source collection already exists without the flag (e.g. created by an earlier upload), the metadata isn't retroactively updated. `/init` must run before the first upload (it's the bootstrap). The function attach is independently idempotent. Pre-existing source collections would need a one-off metadata backfill, which is out of scope here.
Test plan
- `cargo check -p foundation-api`, `cargo check -p chroma-types` pass.
- `cargo test -p foundation-api --lib routes::init`: 4/4 pass.
- The `partition_log.rs` re-export couldn't be compiled locally (Homebrew rustc 1.94.1 vs pinned 1.92.0; `wal3` fails under 1.94.1 independent of this change).
- `/init` → source collections carry the flag + have `http_generate` attached → uploads chunk into them → attached function runs and observes the end-of-job marker last.
🤖 Generated with Claude Code