feat(agent): detect availability via session/new probe and assistant-first identity by kaizhou-lab · Pull Request #500 · iOfficeAI/AionCore

kaizhou-lab · 2026-06-22T09:27:42Z

Summary

Adds accurate agent availability detection and completes the assistant-first identity migration across the agent, team, conversation, channel, and cron modules.

Agent availability detection

Actively probe session/new to verify an agent can establish a session, instead of inferring reachability.
Distinguish needs_auth / auth-required from offline/unreachable, and clear needs_auth on session success.
Collapse the prior available/unavailable/needs_auth model into clearer online/offline reporting plus an explicit auth-required signal.
Gate aionrs availability on the resolved provider configuration rather than backend identity alone.
Resolve assistant agent_status by agent_type (with backend fallback) so bare/built-in assistants report the correct status.
Reap the probe process tree / kill the probe process group to stop orphaned wrapper grandchild leaks.
Expose a backend logo catalog endpoint and key the logo catalog by agent_type when backend is null.

Assistant-first identity migration

Persist and resolve assistant identity across team, conversation, channel, and cron flows; derive backends, avatars, names, and runtime types from assistants.
Reject legacy backend/custom-agent/preset id fields in public write DTOs; canonicalize legacy ids on read.
Bootstrap TeamRun for assistant-first creation and close the assistant-first guide handoff; reword guide prompts and MCP tools around assistants.
Prefer assistant runtime seeds and persisted ACP session identity when creating conversations.

Fixes

Resolve cron/assistant backends from snapshots and prefer runtime backend over stale extras.
Structure assistant-first errors for i18n.

Test

`just push` is green: 6475 tests passed; lint/clippy and fmt are clean.

Closes #499

…-testing-phase2

The availability scheduler runs `try_connect_custom_agent` every 5 minutes for every agent, spawning a CLI subprocess and tearing it down once the ACP handshake completes (or fails). For wrapper CLIs that fork a long-lived grandchild (`npm exec openclaw --acp` is the production case), cleanup was leaking the grandchild because:

* `kill_on_drop(true)` on the tokio Command only signals the direct child (the npm exec wrapper), not its grandchild.
* The probe relied on `drop(protocol)` for the success path and on no explicit cleanup for the handshake-fail path, so `proc.kill` was never called.
* `CliAgentProcess::kill` itself short-circuited and returned Ok the moment the leader exited within the grace period, so even when callers did invoke it, no group-wide SIGKILL was sent.

Result: dozens of zombie `openclaw-acp` processes accumulated per day under the 5-minute scheduler.

Fix:

1. `CliAgentProcess::kill` now always issues a group-wide SIGKILL after the grace period, even when the leader has already exited. `force_kill` already maps ESRCH to success, so the sweep is idempotent for already-reaped trees.
2. `try_connect_custom_agent` calls `proc.kill` on every outcome (success, ACP failure, handshake timeout) by hoisting the spawn out of the inner future and running cleanup unconditionally after the timeout race resolves.
3. New regression test `probe_kills_grandchild_left_behind_by_wrapper` exercises the exact wrapper-grandchild shape from production and asserts the grandchild is reaped before the probe returns.
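The kill semantics in fix 1 can be sketched without touching real processes. This is a minimal model, not the code from this PR: `GroupSignaler`, `FakeOs`, and `kill_after_grace` are hypothetical names, and only `force_kill` and the ESRCH-to-success mapping come from the commit message.

```rust
/// errno for "no such process" (value 3 on Linux and macOS).
const ESRCH: i32 = 3;

/// Hypothetical seam over the OS so the sweep can be exercised in memory.
trait GroupSignaler {
    /// Send SIGKILL to every process in group `pgid`; Err carries an errno.
    fn sigkill_group(&mut self, pgid: i32) -> Result<(), i32>;
}

/// Treat "group already gone" as success so repeated sweeps are idempotent.
fn force_kill<S: GroupSignaler>(os: &mut S, pgid: i32) -> Result<(), i32> {
    match os.sigkill_group(pgid) {
        Ok(()) | Err(ESRCH) => Ok(()),
        Err(errno) => Err(errno),
    }
}

/// Runs after the grace period. The leader's exit state is deliberately
/// ignored: a wrapper can exit cleanly while its grandchild keeps running.
fn kill_after_grace<S: GroupSignaler>(
    os: &mut S,
    pgid: i32,
    leader_exited: bool,
) -> Result<(), i32> {
    let _ = leader_exited; // no early return here; that was the leak
    force_kill(os, pgid)
}

/// In-memory stand-in for the process table: (pgid, pid) pairs still alive.
struct FakeOs {
    alive: Vec<(i32, i32)>,
}

impl GroupSignaler for FakeOs {
    fn sigkill_group(&mut self, pgid: i32) -> Result<(), i32> {
        let before = self.alive.len();
        self.alive.retain(|&(group, _)| group != pgid);
        if self.alive.len() == before {
            Err(ESRCH) // nothing left in that group
        } else {
            Ok(())
        }
    }
}
```

With a `FakeOs` holding only a grandchild in group 100 (the leader has already exited), `kill_after_grace(&mut os, 100, true)` removes the grandchild, and a second call still returns `Ok(())` because ESRCH is mapped to success.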

list_management_rows now sets env to Vec::new() instead of copying meta.env, which contained merged override secrets. The management row already exposes has_command_override + env_override_key_count; the UI does not need plaintext values. The e2e test now asserts the management row's env is empty AND that the secret value "sk-x" does not appear anywhere in the management response body.
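A minimal sketch of that projection, assuming simplified shapes: `AgentMeta`, `ManagementRow`, `to_management_row`, and the `env_override_keys` field are hypothetical; `env`, `has_command_override`, and `env_override_key_count` are the field names from the commit message.

```rust
/// Hypothetical internal metadata; `env` holds merged override values,
/// which may include secrets.
#[derive(Debug, Clone)]
struct AgentMeta {
    command_override: Option<String>,
    env: Vec<(String, String)>,
    env_override_keys: Vec<String>,
}

/// Hypothetical row returned to the management UI.
#[derive(Debug)]
struct ManagementRow {
    has_command_override: bool,
    env_override_key_count: usize,
    env: Vec<(String, String)>,
}

/// Project metadata into a management row without copying plaintext values.
fn to_management_row(meta: &AgentMeta) -> ManagementRow {
    ManagementRow {
        has_command_override: meta.command_override.is_some(),
        env_override_key_count: meta.env_override_keys.len(),
        // Always empty: the UI only needs the flag and the count above.
        env: Vec::new(),
    }
}
```

A check in the spirit of the e2e test would assert both that `row.env` is empty and that the serialized row contains no occurrence of the secret value.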

Added record_session_success to AgentAvailabilityFeedbackPort trait and AgentAvailabilityService. It persists a session-kind available snapshot, which clears needs_auth and updates last_success_at. Turn orchestrator now calls record_agent_session_success when send_message succeeds, mirroring the existing session failure path. Test: record_session_success_clears_needs_auth verifies that a needs_auth state set via session failure is cleared by session success.
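The success/failure symmetry can be sketched as follows. The trait and service names come from the commit message, but the method signatures, the `Snapshot` shape, and the in-memory `HashMap` store are assumptions made for illustration; the real service persists snapshots.

```rust
use std::collections::HashMap;

/// Assumed, simplified snapshot; the real one carries more fields.
#[derive(Debug, Clone, Default, PartialEq)]
struct Snapshot {
    needs_auth: bool,
    last_success_at: Option<u64>,
}

/// Signatures are assumptions; only the trait and method names are from the PR.
trait AgentAvailabilityFeedbackPort {
    fn record_session_failure(&mut self, agent_id: &str, auth_required: bool);
    fn record_session_success(&mut self, agent_id: &str, now_unix: u64);
}

/// In-memory stand-in for the persisted snapshot store.
#[derive(Default)]
struct AgentAvailabilityService {
    snapshots: HashMap<String, Snapshot>,
}

impl AgentAvailabilityFeedbackPort for AgentAvailabilityService {
    fn record_session_failure(&mut self, agent_id: &str, auth_required: bool) {
        let snap = self.snapshots.entry(agent_id.to_string()).or_default();
        snap.needs_auth = auth_required;
    }

    fn record_session_success(&mut self, agent_id: &str, now_unix: u64) {
        let snap = self.snapshots.entry(agent_id.to_string()).or_default();
        // A working session proves the agent is authorized right now.
        snap.needs_auth = false;
        snap.last_success_at = Some(now_unix);
    }
}
```

The flow covered by `record_session_success_clears_needs_auth` then reads: record a failure with `auth_required = true`, record a success, and observe that `needs_auth` is false and `last_success_at` is set.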

…ing-phase2

Bare assistants for single-engine agents (e.g. Aion CLI) were generated with an empty agent_backend because their engine identity lives in agent_type, not the ACP-vendor backend column. An empty preset_agent_type made the frontend route aionrs assistants as ACP, dropping the top-level model and persisting a NULL model that later failed warmup with "Provider '' not found". Reconcile now falls back to agent_type when backend is empty.

Also fix pre-existing test breakage left from the assistant-first migration:

- add missing agent_status/team_selectable/deletable to the AssistantResponse camelCase rejection fixture
- update channel default-settings e2e to expect the generated aionrs bare assistant binding instead of a null assistant
- drop the obsolete agent.select persist integration test (direct agent selection is no longer supported; covered by the unknown-action unit test)
- add missing last_check_error_details to the bare assistant test row

…/offline

Align the backend agent status model with the simplified frontend semantics: a probe only verifies an ACP handshake (reachability), not authorization, so the misleading "available"/"needs_auth" verdicts are dropped in favor of plain online/offline.

- agent_discovery: rename AgentManagementStatus and AgentSnapshotCheckStatus variants Available/Unavailable/NeedsAuth -> Online/Offline (serde snake_case).
- availability: simplify snapshot persistence: record_session_failure always yields offline; drop the success-recording path and auth detection.
- registry: parse "online"/"offline"; derive management status as Offline -> Offline else Online.
- turn_orchestrator: always report session_send_failed on send error; no success recording.
- assistant/service + tests: migrate status values to online/offline.

…iders

Deepen the agent health probe from `initialize` to `session/new` so it reflects real usability, not just protocol reachability. `initialize` returns authMethods even for authorized agents, so it cannot distinguish a signed-in agent from one that is reachable but not signed in.

- custom_agent_probe: after `initialize`, open a throwaway `session/new` (no prompt); classify the outcome as Ok / Auth (ACP auth_required, JSON-RPC -32000) / Fail. Applies to both the custom and builtin-managed probe paths.
- api-types: add `TryConnectCustomAgentResponse::FailAuth` (tag `fail_auth`).
- availability: map FailAuth → offline + `auth_required` code; gate aionrs (built-in agent, no external CLI) availability on having at least one enabled model provider, mirroring AssistantService::resolve_default_agent_type; offline + `no_provider` otherwise.
- custom: accept test-on-save when the agent is reachable but auth_required (a valid agent the user just hasn't logged into yet).
- registry: add guidance for auth_required and no_provider error codes.
- The background scheduler shares run_probe, so periodic checks reflect the same session/new-based status.
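The classification and mapping described above can be sketched as pure functions. This is an illustration under assumptions, not the PR's code: `ProbeOutcome`, `Availability`, and all function names are hypothetical, the probe result is reduced to `Result<(), i64>` carrying a JSON-RPC error code, and leaving the error code empty for a plain failure is a guess. The -32000 code and the `auth_required` / `no_provider` strings are from the commit message.

```rust
/// Outcome of the throwaway `session/new` call made after `initialize`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProbeOutcome {
    Ok,
    Auth,
    Fail,
}

/// JSON-RPC error code the commit message associates with ACP auth_required.
const AUTH_REQUIRED_CODE: i64 = -32000;

/// `Err` carries the JSON-RPC error code returned by `session/new`.
fn classify_session_new(result: Result<(), i64>) -> ProbeOutcome {
    match result {
        Ok(()) => ProbeOutcome::Ok,
        Err(AUTH_REQUIRED_CODE) => ProbeOutcome::Auth,
        Err(_) => ProbeOutcome::Fail,
    }
}

/// Hypothetical flattened view of what the availability layer reports.
#[derive(Debug, PartialEq)]
struct Availability {
    online: bool,
    error_code: Option<&'static str>,
}

fn availability_from_probe(outcome: ProbeOutcome) -> Availability {
    match outcome {
        ProbeOutcome::Ok => Availability { online: true, error_code: None },
        ProbeOutcome::Auth => Availability { online: false, error_code: Some("auth_required") },
        // Assumption: a plain failure carries no dedicated code in this sketch.
        ProbeOutcome::Fail => Availability { online: false, error_code: None },
    }
}

/// aionrs has no external CLI to probe, so it is gated on provider config.
fn aionrs_availability(enabled_provider_count: usize) -> Availability {
    if enabled_provider_count > 0 {
        Availability { online: true, error_code: None }
    } else {
        Availability { online: false, error_code: Some("no_provider") }
    }
}

/// Test-on-save accepts a reachable agent even when it still needs login.
fn accept_on_save(outcome: ProbeOutcome) -> bool {
    outcome != ProbeOutcome::Fail
}
```

Keeping the classification separate from the availability mapping lets test-on-save and the status report disagree on purpose: an `Auth` outcome is reported as offline with `auth_required`, yet the agent can still be saved.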

…-testing-phase2

…ficeAI/AionCore into feat/agent-connection-testing-phase2 * 'feat/agent-connection-testing-phase2' of github.com:iOfficeAI/AionCore:

…ackend

An assistant's agent_status was matched to its agent row by `backend` only. aionrs (the built-in Rust agent) has a NULL `backend` and is keyed by `agent_type` ("aionrs"), so every aionrs-backed assistant failed to resolve a row and was mislabelled Missing/unavailable.

Match the agent row on `backend == effective_backend` OR `agent_type.serde_name() == effective_backend`, so aionrs assistants resolve to the real aionrs row and reflect its actual status.

Add a regression test covering an aionrs assistant (row with NULL backend, agent_type Aionrs, Online) resolving to Online instead of Missing.
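The matching rule can be sketched like this. The types are simplified stand-ins: `AgentRow`, `Status`, `resolve_status`, the `Acp` variant, and its "acp" name are hypothetical; the predicate itself (`backend == effective_backend` OR `agent_type.serde_name() == effective_backend`) and the "aionrs" name are from the commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AgentType {
    Aionrs,
    Acp, // hypothetical second variant for contrast
}

impl AgentType {
    /// Name the type serializes to; "acp" is an assumption.
    fn serde_name(self) -> &'static str {
        match self {
            AgentType::Aionrs => "aionrs",
            AgentType::Acp => "acp",
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Online,
    Offline,
    Missing,
}

/// Simplified agent row; `backend` is NULL (None) for aionrs.
struct AgentRow {
    backend: Option<String>,
    agent_type: AgentType,
    status: Status,
}

/// Resolve an assistant's status by backend, falling back to agent_type.
fn resolve_status(rows: &[AgentRow], effective_backend: &str) -> Status {
    rows.iter()
        .find(|row| {
            row.backend.as_deref() == Some(effective_backend)
                || row.agent_type.serde_name() == effective_backend
        })
        .map(|row| row.status)
        .unwrap_or(Status::Missing)
}
```

With a single row of `backend: None`, `agent_type: Aionrs`, `status: Online`, resolving `"aionrs"` yields `Online`; matching on `backend` alone would have fallen through to `Missing`.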

…-testing-phase2

…-testing-phase2 # Conflicts: # crates/aionui-conversation/src/service.rs

zk added 30 commits June 15, 2026 17:57

feat(agent): add connection testing and bare assistant projection

66b487a

Merge remote-tracking branch 'origin/main' into feat/agent-connection…

3c742a9

…-testing-phase2

chore(assistant): remove unused preset id whitelist asset

9ced726

Merge remote-tracking branch 'origin/main' into feat/agent-connection…

645eff8

…-testing-phase2

feat(assistant): prioritize bare assistants on first bootstrap

5f10d94

merge: bring origin/main into feat/agent-connection-testing-phase2

8431999

test(assistant): cover bare assistant projection

6523326

feat(channel): add backend-owned channel settings API

6a040d9

chore: apply auto-fixes (fmt + clippy)

ab35b57

test(api): refresh assistant response fixture

24fb0ce

feat(channel): resolve bindings from assistants

6226104

feat(team): persist assistant identity across team flows

d821d4d

refactor(cron): persist assistant identity in cron config

c176bb8

feat(agent): probe managed builtin acp health

940e287

refactor(agent): drop legacy backend health check route

f14083f

refactor(cron): create conversations with assistant identity

4bdab9b

refactor(channel): normalize assistant bindings on write

7263515

feat(agent): surface management diagnostics guidance

987f8f1

fix(assistant): forbid editing generated assistants

742ba2b

refactor(team): persist assistant identity for new agents

f2e7975

refactor(conversation): inject assistant runtime seeds

86033c2

refactor(cron): strip legacy agent ids on assistant writes

907ace6

refactor(team): prefer assistant ids in mcp tooling

d6c8161

fix(team): resolve spawn backend from assistant ids

afb071e

refactor(team): derive team backends from assistants

6709f85

refactor(conversation): drop redundant preset extra writes

712d5fc

refactor(team): prefer assistant ids in leader prompts

dfd6d18

refactor(channel): create conversations through assistant identities

f71aa96

refactor(team): seed lead prompts from assistants

f5ebc0a

zk added 30 commits June 18, 2026 16:55

refactor(team): sync solo guide prompt with assistant-first copy

b2edf58

fix(agent): implement update_agent_overrides in test stubs

e11141b

fix(team): add override fields to AgentMetadataRow test constructors

48eecf0

refactor(channel): default to bare assistant bindings

1c22001

fix(team): close assistant-first guide handoff

0260b83

Merge branch 'feat/agent-self-repair' into feat/agent-connection-test…

ea7893b

…ing-phase2

fix(team): structure assistant-first errors for i18n

c4072a9

fix(cron): resolve assistant backend from snapshots

86e8d70

chore(merge): merge origin main into agent connection branch

a4f437e

Merge remote-tracking branch 'origin/main' into feat/agent-connection…

9f4d84a

…-testing-phase2

Merge remote-tracking branch 'origin/main' into feat/agent-connection…

8f170cc

…-testing-phase2

Merge branch 'feat/agent-connection-testing-phase2' of github.com:iOf…

5456a25

…ficeAI/AionCore into feat/agent-connection-testing-phase2 * 'feat/agent-connection-testing-phase2' of github.com:iOfficeAI/AionCore:

fix(team): bootstrap TeamRun for assistant-first creation

82f848e

chore: apply auto-fixes (fmt + clippy)

65159ae

fix(ci): stabilize agent availability checks

0eefd92

Merge remote-tracking branch 'origin/main' into feat/agent-connection…

5380220

…-testing-phase2

fix(assistant): unify assistant agent id storage

cee9c09

perf(agent): remove background availability probes

c74e32b

refactor(assistant): normalize assistant and cron identity

237eb92

Merge remote-tracking branch 'origin/main' into feat/agent-connection…

3cb4334

…-testing-phase2 # Conflicts: # crates/aionui-conversation/src/service.rs

chore: apply auto-fixes (fmt + clippy)

1b737a2

test(cron): align workspace e2e fixtures with assistant config

607a409

fix(assistant): include agent id in response projections

643be6b

chore: apply auto-fixes (fmt + clippy)

f50eb38

kaizhou-lab wants to merge 139 commits into main from feat/agent-connection-testing-phase2
