feat(system): add data auto-sync API for data/ directory#301

Merged
boy-hack merged 3 commits into main from feat/data-auto-sync-20260410
Apr 14, 2026

@boy-hack
Collaborator

Summary

Adds a new system API group that lets operators pull the latest rule files from GitHub without restarting the server or rebuilding the Docker image.

New Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/system/update-data` | Trigger an asynchronous sync from GitHub |
| GET | `/api/v1/system/update-status` | Poll the current or last sync status |

How It Works

  1. Download: fetches the repository zip archive for a branch or tag from codeload.github.com
  2. Extract: strips the single top-level directory GitHub adds to the archive and writes only the requested data/ sub-directories to disk
  3. No restart needed: rule files are read from disk at scan time, so the next scan picks up the new files; scans already running are unaffected

Request Options (POST /api/v1/system/update-data)

```jsonc
{
  "ref": "main",          // branch or tag; default: main
  "is_tag": false,        // set to true when ref is a tag
  "github_token": "...",  // optional; avoids the 60 requests/hour anonymous rate limit
  "dirs": "fingerprints,vuln,vuln_en,mcp,eval,agents"  // optional, comma-separated
}
```
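The comments above are annotations only; the body actually sent must be plain JSON. A minimal decoding sketch follows. The struct name matches the `UpdateDataRequest` type mentioned later in the PR, but its exact fields, the empty-body handling and the `archiveURL` helper are assumptions. `owner` and `repo` are placeholders, since the PR does not name the repository. The URL uses codeload's `zip/refs/heads/<branch>` and `zip/refs/tags/<tag>` forms, which is what `is_tag` switches between.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// UpdateDataRequest mirrors the documented JSON keys (illustrative).
type UpdateDataRequest struct {
	Ref         string `json:"ref"`
	IsTag       bool   `json:"is_tag"`
	GithubToken string `json:"github_token,omitempty"`
	Dirs        string `json:"dirs,omitempty"`
}

// parseUpdateRequest decodes the body and applies the documented default ref.
// Treating an empty body as "all defaults" is an assumption.
func parseUpdateRequest(body []byte) (UpdateDataRequest, error) {
	var req UpdateDataRequest
	if len(body) > 0 {
		if err := json.Unmarshal(body, &req); err != nil {
			return req, fmt.Errorf("invalid request body: %w", err)
		}
	}
	if req.Ref == "" {
		req.Ref = "main"
	}
	return req, nil
}

// archiveURL builds a codeload.github.com zip URL for a branch or a tag.
// owner and repo are hypothetical placeholders.
func archiveURL(owner, repo string, req UpdateDataRequest) string {
	kind := "heads"
	if req.IsTag {
		kind = "tags"
	}
	return fmt.Sprintf("https://codeload.github.com/%s/%s/zip/refs/%s/%s", owner, repo, kind, req.Ref)
}
```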

Files Changed

  • common/websocket/update_api.go — handler + async sync logic (zip download → extract → overwrite)
  • common/websocket/update_api_test.go — unit tests covering selective extraction, full extraction, invalid zip, dir splitting
  • common/websocket/server.go — route registration (/api/v1/system group, identity middleware)
  • docs/api_data_update.md — full API documentation with curl examples, field reference, error table, workflow diagram

Test Results

```text
=== RUN   TestExtractDataDirs_selectiveDirs
--- PASS (0.00s)
=== RUN   TestExtractDataDirs_allDirs
--- PASS (0.00s)
=== RUN   TestExtractDataDirs_invalidZip
--- PASS (0.00s)
=== RUN   TestSplitDirs
--- PASS (0.00s)
```
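`TestSplitDirs` covers parsing the `dirs` option. A sketch of what such a helper might do: the `splitDirs` name comes from the PR, while the trimming rules, the rejection of path-like names such as `../etc`, and the fallback to the six directories listed in the commit message are assumptions.

```go
package main

import "strings"

// defaultDataDirs lists the sub-directories named in the PR; using them as the
// fallback is an assumption.
var defaultDataDirs = []string{"fingerprints", "vuln", "vuln_en", "mcp", "eval", "agents"}

// splitDirs turns " vuln, eval,,mcp " into [vuln eval mcp]. An empty or
// entirely invalid list falls back to all default dirs.
func splitDirs(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		p := strings.TrimSpace(part)
		// Skip blanks and anything that could act as a path, e.g. "../etc".
		if p == "" || p == "." || p == ".." || strings.ContainsAny(p, `/\`) {
			continue
		}
		out = append(out, p)
	}
	if len(out) == 0 {
		return append([]string(nil), defaultDataDirs...)
	}
	return out
}
```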

Notes

  • Sync is idempotent: re-running it overwrites existing files but does not delete files that were removed upstream
  • github_token is never logged or persisted
  • Protected by the existing setupIdentityMiddleware(), so the endpoints are not publicly accessible

zhuque and others added 3 commits April 10, 2026 17:25
Add two new endpoints under /api/v1/system/:

  POST /api/v1/system/update-data
    Downloads the GitHub archive (branch or tag) and overwrites the
    configured data/ sub-directories (fingerprints, vuln, vuln_en,
    mcp, eval, agents). Runs asynchronously; supports optional
    github_token to avoid rate-limiting and per-call dir selection.

  GET /api/v1/system/update-status
    Returns current / last sync status: running flag, success bool,
    started_at / finished_at timestamps, file count, message.

Implementation:
- common/websocket/update_api.go  — handler + sync logic
- common/websocket/update_api_test.go — unit tests (zip extract, dir filter)
- common/websocket/server.go      — route registration under /system group
- docs/api_data_update.md         — full API documentation with examples
boy-hack merged commit 3ab31e6 into main on Apr 14, 2026
4 checks passed
boy-hack deleted the feat/data-auto-sync-20260410 branch on April 14, 2026 at 02:50
boy-hack pushed a commit that referenced this pull request Apr 17, 2026
…oken

- Replace the GitHub archive (zip) download with `git clone --depth 1` to
  avoid the need for a GitHub token. An anonymous git clone works without
  any rate-limit concerns for a single operation.
- Remove the `github_token` field from UpdateDataRequest entirely.
- Add copyDataDirs / copyDir helpers that recursively copy the requested
  data/ sub-directories from the cloned temp dir to the working dir.
- Remove the zip-based extractDataDirs helper (no longer needed).
- Rewrite update_api_test.go to test copyDataDirs instead of extractDataDirs.
  All existing test scenarios are preserved (selective dirs, all dirs,
  missing sub-dir, splitDirs).
- Rewrite docs/api_data_update.md to match the api_zh.md doc style
  (endpoint info table, request/response tables, curl + python examples).

Closes #301
@boy-hack
Collaborator Author

Reworked in PR #327:

  • Replaced zip download with git clone --depth 1 — no GitHub token needed
  • Removed github_token field from the request entirely
  • Rewrote docs/api_data_update.md to match standard doc format (parameter tables, cURL + Python examples)

