-
Notifications
You must be signed in to change notification settings - Fork 29
feat(modelfile): support TF SavedModel, ONNX external_data, and online-learning trees #530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
4f303da
660ae9b
3aaa7e9
affc7c1
876d157
4c24aa3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -346,13 +346,45 @@ func (mf *modelfile) generateByWorkspace(config *configmodelfile.GenerateConfig) | |||||||||||
| return err | ||||||||||||
| } | ||||||||||||
|
|
||||||||||||
| // ONNX external_data post-processing: any tensor file referenced by an .onnx | ||||||||||||
| // file via external_data.location is unconditionally a model weight file, | ||||||||||||
| // regardless of its name or size. Walker may have classified small external | ||||||||||||
| // tensor files as code/config/doc by extension/size heuristic; reclassify them. | ||||||||||||
| mf.reclassifyONNXExternalData() | ||||||||||||
|
|
||||||||||||
| if mf.model.Size() == 0 && mf.code.Size() == 0 && mf.dataset.Size() == 0 { | ||||||||||||
| return fmt.Errorf("no model/code/dataset found - you have to create the Modelfile by yourself") | ||||||||||||
| } | ||||||||||||
|
|
||||||||||||
| return nil | ||||||||||||
| } | ||||||||||||
|
|
||||||||||||
| // reclassifyONNXExternalData scans every .onnx file already in mf.model, | ||||||||||||
| // extracts external_data.location paths, and moves those paths into mf.model | ||||||||||||
| // from whichever bucket the walker placed them in. | ||||||||||||
| func (mf *modelfile) reclassifyONNXExternalData() { | ||||||||||||
| for _, raw := range mf.model.Values() { | ||||||||||||
| modelRel, ok := raw.(string) | ||||||||||||
| if !ok || !strings.HasSuffix(strings.ToLower(modelRel), ".onnx") { | ||||||||||||
| continue | ||||||||||||
| } | ||||||||||||
| onnxAbs := filepath.Join(mf.workspace, modelRel) | ||||||||||||
| extPaths, err := ExtractONNXExternalDataPaths(onnxAbs) | ||||||||||||
| if err != nil { | ||||||||||||
| fmt.Fprintf(os.Stderr, "modelfile: parse ONNX external_data from %s failed: %v\n", modelRel, err) | ||||||||||||
| continue | ||||||||||||
| } | ||||||||||||
| onnxDir := filepath.Dir(modelRel) | ||||||||||||
| for _, ext := range extPaths { | ||||||||||||
| relExt := filepath.Join(onnxDir, ext) | ||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The
Suggested change
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Addressed in 660ae9b, but with a stronger invariant than the prefix check. We now require the path to already be present in one of the walker's hashsets ( |
||||||||||||
| mf.code.Remove(relExt) | ||||||||||||
| mf.config.Remove(relExt) | ||||||||||||
| mf.doc.Remove(relExt) | ||||||||||||
| mf.model.Add(relExt) | ||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The ONNX post-pass unconditionally does Useful? React with 👍 / 👎.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Addressed in 660ae9b. The walker-membership check is exactly the proposed fix: a path can only be reclassified if it survived the walker's filter pass — which already enforces |
||||||||||||
| } | ||||||||||||
| } | ||||||||||||
| } | ||||||||||||
|
|
||||||||||||
| // generateByModelConfig generates the modelfile by the model config, such as config.json and generation_config.json. | ||||||||||||
| func (mf *modelfile) generateByModelConfig() error { | ||||||||||||
| // Get config map from json files. Collect all the keys and values from the config files | ||||||||||||
|
|
||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
external_data.locationis trusted verbatim and joined intorelExtwithout any containment check, so a crafted ONNX can inject../...(or absolute) paths into the generated MODEL entries. In this repo, build processing later accepts these paths as real files (seeprocessor.base.Process) and will read/upload them, which can pull data from outside the workspace. Please reject non-local locations (absolute or traversal) and only promote files that resolve under the workspace root.Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 660ae9b.
reclassifyONNXExternalDatanow only moves paths already collected by the workspace walker (walkerCollectedmembership check). Paths that resolve outside the workspace, get cleaned to absolute paths, or otherwise never entermf.{config,model,code,doc}are silently dropped — no../escapeor absolute path can leak into MODEL.