
Refa: migrate agent webhook routes to REST APIs #14330

Merged
wangq8 merged 12 commits into infiniflow:main from buua436:r007
Apr 24, 2026

Conversation

@buua436
Contributor

@buua436 buua436 commented Apr 23, 2026

What problem does this PR solve?

Migrate agent webhook routes to REST APIs.

Type of change

  • Refactoring

@coderabbitai
Contributor

coderabbitai Bot commented Apr 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Agent-scoped webhook endpoints move into api/apps/restful_apis/agent_api.py, which implements request parsing, configurable security (IP allowlist, rate limiting, auth, body-size limits), canvas execution modes, and Redis-backed test/log tracing; the frontend and tests are updated for the new agent-scoped routes.

Changes

Cohort / File(s) Summary
REST webhook handlers
api/apps/restful_apis/agent_api.py
Adds /agents/{id}/webhook, /agents/{id}/webhook/test, and /agents/{id}/webhook/logs routes with method allowlists, body-size limits, IP allowlist, Redis token-bucket rate limiting, auth modes (none/token/basic/jwt with claim checks), request parsing (query/headers/body/files), schema validation, immediate vs streaming execution flows, and optional Redis-backed trace persistence.
Removed SDK webhook handlers
api/apps/sdk/agents.py
Removes legacy webhook and trace route handlers previously implemented in the SDK module.
Frontend URL updates
web/src/utils/api.ts, web/src/pages/agent/hooks/use-build-webhook-url.ts, web/src/pages/agent/webhook-sheet/index.tsx
Repoints webhook test and trace helpers and displayed webhook URLs to agent-scoped endpoints: /api/v1/agents/{id}/webhook/test and /api/v1/agents/{id}/webhook/logs.
Tests & scaffolding
test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py, test/.../api.apps stubs
Updates tests to load REST module, switch to agent-scoped paths, add current_user and login_required test stubs, and ensure trace tests set tenant identity to exercise identity-dependent behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Client as HTTP Client
    participant API as Agent Webhook API
    participant Auth as Auth Validator
    participant Redis as Redis (rate-limit/logs)
    participant Canvas as Canvas Engine

    Client->>API: POST /api/v1/agents/{id}/webhook\n(payload & headers)
    API->>API: Load agent canvas & find begin component
    API->>API: Validate HTTP method & body size
    API->>Auth: Authenticate (none/token/basic/JWT)
    Auth-->>API: Auth result
    API->>Redis: Check token-bucket rate limit
    Redis-->>API: Rate-limit verdict
    API->>API: Validate IP allowlist
    API->>API: Parse request (query, headers, body, files)
    API->>Canvas: canvas.run(webhook_payload)
    Canvas-->>API: Execution events / result
    alt Immediate Mode
        API->>Redis: Log execution events
        API-->>Client: Return configured status/body (async)
    else Streaming Mode
        API-->>Client: Aggregate & return JSON result (sync)
    end

    Client->>API: GET /api/v1/agents/{id}/webhook/logs?since_ts=...&webhook_id=...
    API->>Redis: Retrieve incremental trace events
    Redis-->>API: Events
    API-->>Client: Incremental trace payload

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

size:XL

Suggested reviewers

  • wangq8
  • dcc123456

Poem

🐰 I hopped through routes both old and new,
Agent webhooks dressed in cleaner hue,
Guards and Redis hum in tidy rows,
Canvas wakes and off the workflow goes,
I nibble bytes and bounce — a tiny chew!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Docstring Coverage — ⚠️ Warning: docstring coverage is 35.29%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
Description check — ❓ Inconclusive: the description is minimal and vague, offering only a rephrased title and a refactoring label without substantive context about what problem the migration solves or why it was necessary. Resolution: expand the description to explain the rationale for moving webhooks to REST APIs, how this benefits the codebase, and any implementation notes reviewers should know.
✅ Passed checks (3 passed)
Title check — ✅ Passed: the title accurately reflects the main change, migrating agent webhook routes from the SDK to REST APIs, as evidenced by the deletion of webhooks from sdk/agents.py and their introduction in restful_apis/agent_api.py.
Linked Issues check — ✅ Passed: skipped because no linked issues were found for this pull request.
Out of Scope Changes check — ✅ Passed: skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@buua436 buua436 marked this pull request as ready for review April 23, 2026 09:54
@dosubot dosubot Bot added labels on Apr 23, 2026: size:XXL (This PR changes 1000+ lines, ignoring generated files), ☯️ refactor (Pull request that refactors/refines code), 🌈 python (Pull requests that update Python code), 🐖 api (The modified files are located under directory 'api/apps/sdk'), 🧰 typescript (Pull requests that update Typescript code)
@buua436 buua436 added the ci Continue Integration label Apr 23, 2026
@buua436 buua436 marked this pull request as draft April 23, 2026 09:55
@buua436 buua436 marked this pull request as ready for review April 23, 2026 09:56
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (1)
api/apps/restful_apis/agent_api.py (1)

40-42: Add lightweight logs for the new webhook REST flow.

Only background exceptions are logged. Add structured info/warning logs for route entry, validation rejections, auth/ownership denial, and trace polling failures without logging payloads, tokens, or secrets. As per coding guidelines, **/*.py: Add logging for new flows.

Also applies to: 717-720

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 40 - 42, Add structured,
non-sensitive logging to the webhook REST flow in the webhook(agent_id: str)
handler: emit an info log on route entry (include agent_id and a short route
label, using request.path but never the payload), warn/info when validation
rejects input (reference where validations occur), warn when auth/ownership is
denied (reference the auth/ownership check code paths), and warn/error when
trace polling or poll result handling fails (reference the trace polling logic).
Use structured fields for agent_id/trace_id/status but do NOT log request
bodies, tokens, headers, or secrets; keep logs at appropriate levels (info for
entry, warning for validation/auth failures, error for polling exceptions).
Ensure similar logging is added to the corresponding trace-polling code paths
noted near lines 717-720.
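The logging shape this nitpick asks for might look like the sketch below. The function names and field choices are illustrative, not code from the PR; the point is identifier-only structured fields and level discipline (info on entry, warning for validation/auth failures, error for polling exceptions).

```python
import logging

logger = logging.getLogger("agent_webhook")

def log_webhook_entry(agent_id: str, route: str) -> None:
    # Structured fields carry identifiers only -- never payloads,
    # tokens, headers, or secrets.
    logger.info("webhook request", extra={"agent_id": agent_id, "route": route})

def log_validation_rejected(agent_id: str, reason: str) -> None:
    logger.warning("webhook validation rejected",
                   extra={"agent_id": agent_id, "reason": reason})

def log_trace_poll_failed(agent_id: str, error_type: str) -> None:
    logger.error("webhook trace polling failed",
                 extra={"agent_id": agent_id, "error_type": error_type})
```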
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/apps/restful_apis/agent_api.py`:
- Around line 664-707: The synchronous path currently calls append_webhook_trace
(with agent_id, start_ts, ans and later with finished/error events) directly
which can raise Redis/JSON errors and override a valid response; update the
synchronous branch in the try and except blocks to mirror the background path by
wrapping each append_webhook_trace call in a try/except that swallows/logs
trace-specific errors so they don't propagate (i.e., around the
append_webhook_trace calls in the successful path and in the exception handler
use a guarded call that catches exceptions and does not re-raise).
- Around line 727-733: The code currently hardcodes the HMAC key as
WEBHOOK_ID_SECRET and derives webhook IDs with hmac.new(...); replace the
literal by reading the secret from configuration/secret storage (e.g.,
application config or environment variable) inside the function that generates
the signature (the block using WEBHOOK_ID_SECRET and hmac.new), fail fast with a
clear error if the secret is missing, and use that configured value in tests to
compute the expected ID rather than asserting against the literal; keep the same
base64.urlsafe_b64encode(...).decode(...).rstrip("=") output behavior so
existing callers are unaffected.
- Around line 299-359: The parse_webhook_request async function currently
catches all exceptions and replaces any parse error (including JSON decode
errors, "Too many uploaded files", form/file read failures, or content-type
mismatches) with an empty dict, which hides failures; change it so parsing
errors are propagated (or returned as a structured error) instead of being
swallowed: remove or narrow the broad except Exception block in
parse_webhook_request and let real exceptions from request.get_json(),
request.form, request.files, FileService.upload_info, canvas.get_files_async,
and json.loads bubble up (or re-raise them with context), and ensure callers of
parse_webhook_request catch these errors and convert content-type mismatches or
parse exceptions into proper 400 responses (wrap the parse_webhook_request call
and return a 400 with the error message for ValueError/content-type errors and
other parse errors, rather than proceeding with an empty body).
- Around line 111-133: The _validate_max_body_size function currently treats a
missing Content-Length as 0 which allows chunked or omitted-length uploads to
bypass limits; update _validate_max_body_size to reject requests with missing or
undefined request.content_length when security_cfg defines max_body_size (raise
a clear error like "Missing Content-Length for bounded request") or
alternatively implement/enforce a streaming read limit when the body is actually
consumed (add a capped stream reader used by your webhook/handler code so reads
abort if bytes exceed limit). Make the change inside _validate_max_body_size
(and ensure any body-parsing paths use the same capped reader) so the request
cannot bypass the max_body_size by omitting Content-Length or using chunked
transfer.
- Around line 613-614: The canvas update is using the wrong
identifier—UserCanvasService.update_by_id is called with cvs.user_id, which can
persist the DSL to the wrong record; change the call to update using the agent
identifier (e.g., cvs.agent_id) or the service method that updates by agent
(e.g., UserCanvasService.update_by_agent_id) so that the updated DSL (cvs.dsl)
is saved against the agent's canvas record rather than the tenant/user id.
- Around line 61-64: The current loop directly indexes nested DSL fields
(components[k]["obj"]["component_name"], ["params"]["mode"]) which can raise
KeyError/TypeError for malformed agent definitions; update the traversal around
components and cpn_obj to safely access keys (use dict.get and isinstance checks
or try/except) before reading "component_name" and "params", and if required
keys are missing or not the expected types return a controlled bad-request
response (HTTP 400) instead of letting a 500 be raised; specifically adjust the
code that sets cpn_obj, checks cpn_obj["component_name"].lower() and
cpn_obj["params"]["mode"], and set/return webhook_cfg only after validation.

In `@test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py`:
- Around line 311-317: The test currently replaces api.apps.login_required with
a no-op (_identity_decorator) which defeats the new auth protection for the
webhook_trace endpoint; replace that stub with an enforcing decorator that
raises or rejects when api_apps_pkg.current_user is missing/unauthenticated
(reference symbols: api_apps_pkg, login_required, current_user, webhook_trace)
and update the test to include an unauthenticated request case that asserts
access is denied (e.g., 401/403) as well as the existing authenticated case to
ensure the decorator is actually applied.

---

Nitpick comments:
In `@api/apps/restful_apis/agent_api.py`:
- Around line 40-42: Add structured, non-sensitive logging to the webhook REST
flow in the webhook(agent_id: str) handler: emit an info log on route entry
(include agent_id and a short route label, using request.path but never the
payload), warn/info when validation rejects input (reference where validations
occur), warn when auth/ownership is denied (reference the auth/ownership check
code paths), and warn/error when trace polling or poll result handling fails
(reference the trace polling logic). Use structured fields for
agent_id/trace_id/status but do NOT log request bodies, tokens, headers, or
secrets; keep logs at appropriate levels (info for entry, warning for
validation/auth failures, error for polling exceptions). Ensure similar logging
is added to the corresponding trace-polling code paths noted near lines 717-720.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ed88ff7a-41f8-45fc-993e-66704a51b688

📥 Commits

Reviewing files that changed from the base of the PR and between ba47c13 and 4b20459.

📒 Files selected for processing (6)
  • api/apps/restful_apis/agent_api.py
  • api/apps/sdk/agents.py
  • test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py
  • web/src/pages/agent/hooks/use-build-webhook-url.ts
  • web/src/pages/agent/webhook-sheet/index.tsx
  • web/src/utils/api.ts
💤 Files with no reviewable changes (1)
  • api/apps/sdk/agents.py

Comment on lines +61 to +64
for k, _ in components.items():
    cpn_obj = components[k]["obj"]
    if cpn_obj["component_name"].lower() == "begin" and cpn_obj["params"]["mode"] == "Webhook":
        webhook_cfg = cpn_obj["params"]
Contributor


⚠️ Potential issue | 🟠 Major

Handle malformed DSL components without raising a 500.

Direct indexing into nested DSL keys can raise KeyError for a partial/corrupt agent definition. Return a controlled bad-request response instead.

🛡️ Proposed safer DSL traversal
-    for k, _ in components.items():
-        cpn_obj = components[k]["obj"]
-        if cpn_obj["component_name"].lower() == "begin" and cpn_obj["params"]["mode"] == "Webhook":
-            webhook_cfg = cpn_obj["params"]
+    for component in components.values():
+        cpn_obj = component.get("obj", {}) if isinstance(component, dict) else {}
+        params = cpn_obj.get("params", {})
+        if cpn_obj.get("component_name", "").lower() == "begin" and params.get("mode") == "Webhook":
+            webhook_cfg = params
+            break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 61 - 64, The current loop
directly indexes nested DSL fields (components[k]["obj"]["component_name"],
["params"]["mode"]) which can raise KeyError/TypeError for malformed agent
definitions; update the traversal around components and cpn_obj to safely access
keys (use dict.get and isinstance checks or try/except) before reading
"component_name" and "params", and if required keys are missing or not the
expected types return a controlled bad-request response (HTTP 400) instead of
letting a 500 be raised; specifically adjust the code that sets cpn_obj, checks
cpn_obj["component_name"].lower() and cpn_obj["params"]["mode"], and set/return
webhook_cfg only after validation.

Comment on lines +111 to +133
async def _validate_max_body_size(security_cfg):
    """Check request size does not exceed max_body_size."""
    max_size = security_cfg.get("max_body_size")
    if not max_size:
        return

    # Convert "10MB" → bytes
    units = {"kb": 1024, "mb": 1024**2}
    size_str = max_size.lower()

    for suffix, factor in units.items():
        if size_str.endswith(suffix):
            limit = int(size_str.replace(suffix, "")) * factor
            break
    else:
        raise Exception("Invalid max_body_size format")
    MAX_LIMIT = 10 * 1024 * 1024  # 10MB
    if limit > MAX_LIMIT:
        raise Exception("max_body_size exceeds maximum allowed size (10MB)")

    content_length = request.content_length or 0
    if content_length > limit:
        raise Exception(f"Request body too large: {content_length} > {limit}")
Contributor


⚠️ Potential issue | 🟠 Major

Don’t trust Content-Length as the only body-size guard.

If a client omits Content-Length or uses chunked upload, line 131 treats the size as 0, then later reads the body/files. Reject missing length for bounded webhooks or enforce the cap while reading/parsing.

🛡️ Minimal defensive check
-        content_length = request.content_length or 0
+        content_length = request.content_length
+        if content_length is None:
+            raise Exception("Missing Content-Length")
         if content_length > limit:
             raise Exception(f"Request body too large: {content_length} > {limit}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 111 - 133, The
_validate_max_body_size function currently treats a missing Content-Length as 0
which allows chunked or omitted-length uploads to bypass limits; update
_validate_max_body_size to reject requests with missing or undefined
request.content_length when security_cfg defines max_body_size (raise a clear
error like "Missing Content-Length for bounded request") or alternatively
implement/enforce a streaming read limit when the body is actually consumed (add
a capped stream reader used by your webhook/handler code so reads abort if bytes
exceed limit). Make the change inside _validate_max_body_size (and ensure any
body-parsing paths use the same capped reader) so the request cannot bypass the
max_body_size by omitting Content-Length or using chunked transfer.
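The "capped stream reader" alternative mentioned above could look roughly like this sketch. It is framework-agnostic: `stream` is assumed to be an async iterable of byte chunks (Quart exposes the request body differently), so treat it as an illustration of the technique rather than a drop-in fix.

```python
async def read_body_capped(stream, limit: int) -> bytes:
    """Read at most `limit` bytes from an async iterable of chunks,
    aborting as soon as the running total exceeds the cap, so a
    missing Content-Length or chunked transfer cannot bypass it."""
    chunks, total = [], 0
    async for chunk in stream:
        total += len(chunk)
        if total > limit:
            raise ValueError(f"Request body too large: >{limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)
```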

Comment on lines +299 to +359
async def parse_webhook_request(content_type):
    """Parse request based on content-type and return structured data."""

    # 1. Query
    query_data = {k: v for k, v in request.args.items()}

    # 2. Headers
    header_data = {k: v for k, v in request.headers.items()}

    # 3. Body
    ctype = request.headers.get("Content-Type", "").split(";")[0].strip()
    if ctype and ctype != content_type:
        raise ValueError(
            f"Invalid Content-Type: expect '{content_type}', got '{ctype}'"
        )

    body_data: dict = {}

    try:
        if ctype == "application/json":
            body_data = await request.get_json() or {}

        elif ctype == "multipart/form-data":
            nonlocal canvas
            form = await request.form
            files = await request.files

            body_data = {}

            for key, value in form.items():
                body_data[key] = value

            if len(files) > 10:
                raise Exception("Too many uploaded files")
            for key, file in files.items():
                desc = FileService.upload_info(
                    cvs.user_id,  # user
                    file,         # FileStorage
                    None          # url (None for webhook)
                )
                file_parsed = await canvas.get_files_async([desc])
                body_data[key] = file_parsed

        elif ctype == "application/x-www-form-urlencoded":
            form = await request.form
            body_data = dict(form)

        else:
            # text/plain / octet-stream / empty / unknown
            raw = await request.get_data()
            if raw:
                try:
                    body_data = json.loads(raw.decode("utf-8"))
                except Exception:
                    body_data = {}
            else:
                body_data = {}

    except Exception:
        body_data = {}

Contributor


⚠️ Potential issue | 🟠 Major

Preserve parse failures instead of converting them to {}.

The broad except swallows “Too many uploaded files”, upload failures, form read errors, and JSON parse errors, allowing the webhook to continue with an empty body. Also wrap the parse_webhook_request call so content-type mismatches return a 400 instead of escaping.

🛠️ Proposed parse-error handling shape
-        try:
-            if ctype == "application/json":
-                body_data = await request.get_json() or {}
+        if ctype == "application/json":
+            body_data = await request.get_json() or {}
 
-            elif ctype == "multipart/form-data":
-                nonlocal canvas
-                form = await request.form
-                files = await request.files
+        elif ctype == "multipart/form-data":
+            nonlocal canvas
+            form = await request.form
+            files = await request.files
 
-                body_data = {}
+            body_data = {}
 
-                for key, value in form.items():
-                    body_data[key] = value
+            for key, value in form.items():
+                body_data[key] = value
 
-                if len(files) > 10:
-                    raise Exception("Too many uploaded files")
-                for key, file in files.items():
-                    desc = FileService.upload_info(
-                        cvs.user_id,           # user
-                        file,              # FileStorage
-                        None                   # url (None for webhook)
-                    )
-                    file_parsed= await canvas.get_files_async([desc])
-                    body_data[key] = file_parsed
+            if len(files) > 10:
+                raise Exception("Too many uploaded files")
+            for key, file in files.items():
+                desc = FileService.upload_info(cvs.user_id, file, None)
+                file_parsed = await canvas.get_files_async([desc])
+                body_data[key] = file_parsed
 
-            elif ctype == "application/x-www-form-urlencoded":
-                form = await request.form
-                body_data = dict(form)
+        elif ctype == "application/x-www-form-urlencoded":
+            form = await request.form
+            body_data = dict(form)
 
-            else:
-                # text/plain / octet-stream / empty / unknown
+        else:
+            # text/plain / octet-stream / empty / unknown
+            try:
                 raw = await request.get_data()
                 if raw:
                     try:
                         body_data = json.loads(raw.decode("utf-8"))
                     except Exception:
                         body_data = {}
                 else:
                     body_data = {}
-
-        except Exception:
-            body_data = {}
+            except Exception:
+                body_data = {}
-    parsed = await parse_webhook_request(webhook_cfg.get("content_types"))
+    try:
+        parsed = await parse_webhook_request(webhook_cfg.get("content_types"))
+    except Exception as e:
+        return get_data_error_result(code=RetCode.BAD_REQUEST, message=str(e)), RetCode.BAD_REQUEST

Also applies to: 523-532

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 299 - 359, The
parse_webhook_request async function currently catches all exceptions and
replaces any parse error (including JSON decode errors, "Too many uploaded
files", form/file read failures, or content-type mismatches) with an empty dict,
which hides failures; change it so parsing errors are propagated (or returned as
a structured error) instead of being swallowed: remove or narrow the broad
except Exception block in parse_webhook_request and let real exceptions from
request.get_json(), request.form, request.files, FileService.upload_info,
canvas.get_files_async, and json.loads bubble up (or re-raise them with
context), and ensure callers of parse_webhook_request catch these errors and
convert content-type mismatches or parse exceptions into proper 400 responses
(wrap the parse_webhook_request call and return a 400 with the error message for
ValueError/content-type errors and other parse errors, rather than proceeding
with an empty body).

Comment on lines +664 to +707
        if is_test:
            append_webhook_trace(
                agent_id,
                start_ts,
                ans
            )
        if is_test:
            append_webhook_trace(
                agent_id,
                start_ts,
                {
                    "event": "finished",
                    "elapsed_time": time.time() - start_ts,
                    "success": True,
                }
            )
        final_content = "".join(contents)
        return {
            "message": final_content,
            "success": True,
            "code": status,
        }

    except Exception as e:
        if is_test:
            append_webhook_trace(
                agent_id,
                start_ts,
                {
                    "event": "error",
                    "message": str(e),
                    "error_type": type(e).__name__,
                }
            )
            append_webhook_trace(
                agent_id,
                start_ts,
                {
                    "event": "finished",
                    "elapsed_time": time.time() - start_ts,
                    "success": False,
                }
            )
        return {"code": 400, "message": str(e), "success": False}
Contributor


⚠️ Potential issue | 🟠 Major

Guard trace writes in the synchronous execution path.

In this branch, Redis/JSON errors from append_webhook_trace can replace an otherwise valid webhook response with an internal failure, including inside the exception handler. Mirror the background path’s guarded trace logging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 664 - 707, The synchronous
path currently calls append_webhook_trace (with agent_id, start_ts, ans and
later with finished/error events) directly which can raise Redis/JSON errors and
override a valid response; update the synchronous branch in the try and except
blocks to mirror the background path by wrapping each append_webhook_trace call
in a try/except that swallows/logs trace-specific errors so they don't propagate
(i.e., around the append_webhook_trace calls in the successful path and in the
exception handler use a guarded call that catches exceptions and does not
re-raise).
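A guarded trace-write helper of the kind this comment describes might be as small as the sketch below. `append_fn` stands in for the PR's append_webhook_trace, and the (agent_id, start_ts, event) call shape is an assumption taken from the snippet above.

```python
import logging

logger = logging.getLogger("agent_webhook")

def safe_append_trace(append_fn, agent_id, start_ts, event) -> None:
    """Invoke the trace writer, but never let Redis/JSON errors
    escape and clobber an otherwise valid webhook response."""
    try:
        append_fn(agent_id, start_ts, event)
    except Exception:
        logger.exception("failed to persist webhook trace",
                         extra={"agent_id": agent_id})
```

Each direct `append_webhook_trace(...)` call in the synchronous branch would then become `safe_append_trace(append_webhook_trace, ...)`.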

Comment on lines +727 to +733
    WEBHOOK_ID_SECRET = "webhook_id_secret"
    sig = hmac.new(
        WEBHOOK_ID_SECRET.encode("utf-8"),
        start_ts.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.urlsafe_b64encode(sig).decode("utf-8").rstrip("=")
Contributor


⚠️ Potential issue | 🟠 Major

Move the webhook-id signing secret out of source code.

A fixed HMAC key is shared by every deployment and cannot be rotated. Load it from configuration/secret storage, and have tests derive the expected ID from that configured value instead of the literal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 727 - 733, The code
currently hardcodes the HMAC key as WEBHOOK_ID_SECRET and derives webhook IDs
with hmac.new(...); replace the literal by reading the secret from
configuration/secret storage (e.g., application config or environment variable)
inside the function that generates the signature (the block using
WEBHOOK_ID_SECRET and hmac.new), fail fast with a clear error if the secret is
missing, and use that configured value in tests to compute the expected ID
rather than asserting against the literal; keep the same
base64.urlsafe_b64encode(...).decode(...).rstrip("=") output behavior so
existing callers are unaffected.
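Loading the signing secret from the environment while preserving the existing output format could look like this sketch. Using WEBHOOK_ID_SECRET as the environment-variable name is an assumption; any configuration/secret store would do.

```python
import base64
import hashlib
import hmac
import os

def webhook_id_from_ts(start_ts: str) -> str:
    """Derive the webhook id from a configured secret instead of a
    hard-coded literal, failing fast when the secret is absent.
    Output format matches the original (unpadded urlsafe base64)."""
    secret = os.environ.get("WEBHOOK_ID_SECRET")
    if not secret:
        raise RuntimeError("WEBHOOK_ID_SECRET is not configured")
    sig = hmac.new(secret.encode("utf-8"), start_ts.encode("utf-8"),
                   hashlib.sha256).digest()
    return base64.urlsafe_b64encode(sig).decode("utf-8").rstrip("=")
```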

Comment on lines +311 to 317
    api_apps_pkg.current_user = SimpleNamespace(id="tenant-1")

    def _identity_decorator(func):
        return func

    api_apps_pkg.login_required = _identity_decorator
    monkeypatch.setitem(sys.modules, "api.apps", api_apps_pkg)
Contributor


⚠️ Potential issue | 🟠 Major

Don’t stub login_required as a no-op for the protected logs endpoint.

webhook_trace is now auth-protected, but this identity decorator lets tests pass even if authentication is accidentally removed or bypassed. Use an enforcing stub and add an unauthenticated case.

🧪 Proposed test-stub adjustment
 api_apps_pkg.current_user = SimpleNamespace(id="tenant-1")
 
-def _identity_decorator(func):
-    return func
+def _login_required(func):
+    async def _wrapper(*args, **kwargs):
+        if not api_apps_pkg.current_user:
+            raise PermissionError("unauthorized")
+        return await func(*args, **kwargs)
+
+    return _wrapper
 
-api_apps_pkg.login_required = _identity_decorator
+api_apps_pkg.login_required = _login_required

Based on learnings, Applies to tests/**/*.py : Add/adjust tests for behavior changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py`
around lines 311 - 317, The test currently replaces api.apps.login_required with
a no-op (_identity_decorator) which defeats the new auth protection for the
webhook_trace endpoint; replace that stub with an enforcing decorator that
raises or rejects when api_apps_pkg.current_user is missing/unauthenticated
(reference symbols: api_apps_pkg, login_required, current_user, webhook_trace)
and update the test to include an unauthenticated request case that asserts
access is denied (e.g., 401/403) as well as the existing authenticated case to
ensure the decorator is actually applied.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.30%. Comparing base (d84438f) to head (dd78d8d).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14330      +/-   ##
==========================================
- Coverage   97.41%   95.30%   -2.11%     
==========================================
  Files          10       10              
  Lines         695      703       +8     
  Branches      111      112       +1     
==========================================
- Hits          677      670       -7     
- Misses          7       16       +9     
- Partials       11       17       +6     


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (4)
api/apps/restful_apis/agent_api.py (4)

70-71: Remove duplicate import.

current_user and login_required are already imported at line 35.

♻️ Proposed cleanup
-from api.apps import current_user, login_required
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 70 - 71, Remove the
duplicate import of current_user and login_required from the second import
statement so there is only one import of these symbols; locate the redundant
line that reads "from api.apps import current_user, login_required" (the
duplicate near the top) and delete it, leaving the original import (the one at
line ~35) intact to avoid repeated imports.

1062-1062: Fragile test-path detection via hardcoded URL prefix.

/api/v1/... is a deployment-level prefix (blueprint url_prefix). If the prefix ever changes or the app is mounted under a different base path, is_test silently becomes False for all /webhook/test calls, which means test traces stop being persisted without any error.

Prefer splitting the two routes into distinct handlers (or using request.url_rule / an explicit path suffix check) instead of matching the absolute path string.

♻️ Example: rule-based detection
-    is_test = request.path.startswith(f"/api/v1/agents/{agent_id}/webhook/test")
+    rule = getattr(request.url_rule, "rule", "") or ""
+    is_test = rule.endswith("/webhook/test")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` at line 1062, The current fragile
detection sets is_test using request.path.startswith(...) which depends on a
deployment-level prefix; update the logic to detect the webhook-test route
reliably by using request.url_rule (or checking the path suffix) or by splitting
into separate route handlers so detection doesn't rely on a hardcoded "/api/v1"
prefix; specifically, replace the request.path.startswith check for is_test in
agent_api.py with a rule-based check using request.url_rule.rule or an endswith
comparison against f"/agents/{agent_id}/webhook/test", or move the /webhook/test
route into its own handler so you can set is_test deterministically without
referencing request.path directly.

1756-1760: Minor robustness gaps in webhook_trace.

  • decode_webhook_id compares encoded IDs with ==; since the endpoint is behind @login_required this is a low-risk finding, but hmac.compare_digest is a drop-in, safer replacement.
  • json.loads(raw) at line 1788 is unprotected: a stale/corrupt Redis value returns 500. Wrap in try/except and treat as "no logs yet".
  • Per coding guidelines, add a log line when the key is missing or the payload fails to parse so operators can diagnose.

Also applies to: 1788-1788

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 1756 - 1760, Replace the
equality check in decode_webhook_id with a constant-time comparison (use
hmac.compare_digest(encode_webhook_id(ts), enc_id)) to avoid timing attacks; in
webhook_trace where you call json.loads(raw) wrap the load in try/except
(JSONDecodeError/ValueError) and treat a parse failure as "no logs yet" instead
of raising 500; also add explicit logging when the Redis key is missing or when
parsing fails so operators can diagnose (use the module logger, e.g., logger or
current_app.logger) and keep existing behavior of returning None/"no logs yet"
in those cases.
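A minimal sketch of both fixes together — constant-time ID comparison plus tolerant trace parsing. The helper names are hypothetical stand-ins for the real functions in agent_api.py, and the encoder is passed in so the sketch stays self-contained:

```python
import hmac
import json
import logging

logger = logging.getLogger(__name__)

def decode_webhook_id_safe(enc_id, ts, encode_webhook_id):
    # Constant-time comparison avoids leaking matching prefixes via timing.
    return hmac.compare_digest(encode_webhook_id(ts), enc_id)

def load_trace(raw):
    # Treat a missing key or a corrupt Redis payload as "no logs yet"
    # instead of letting json.loads bubble up as a 500.
    if raw is None:
        logger.info("webhook trace key missing")
        return None
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, ValueError):
        logger.warning("webhook trace payload failed to parse")
        return None
```
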

1281-1301: Use PyJWT's built-in require option for required-claim validation.

PyJWT supports required-claim enforcement via options={"require": [...]}, which handles claim-presence validation and raises MissingRequiredClaimError for missing claims. The current hand-rolled check unnecessarily restricts reserved claims like exp (lines 1294–1297), preventing a common and valid use case where you want to enforce expiration checks alongside other claim requirements.

♻️ Suggested simplification
-        raw_required_claims = jwt_cfg.get("required_claims", [])
-        if isinstance(raw_required_claims, str):
-            required_claims = [raw_required_claims]
-        elif isinstance(raw_required_claims, (list, tuple, set)):
-            required_claims = list(raw_required_claims)
-        else:
-            required_claims = []
-
-        required_claims = [
-            c for c in required_claims
-            if isinstance(c, str) and c.strip()
-        ]
-
-        RESERVED_CLAIMS = {"exp", "sub", "aud", "iss", "nbf", "iat"}
-        for claim in required_claims:
-            if claim in RESERVED_CLAIMS:
-                raise Exception(f"Reserved JWT claim cannot be required: {claim}")
-
-        for claim in required_claims:
-            if claim not in decoded:
-                raise Exception(f"Missing JWT claim: {claim}")
+        raw_required_claims = jwt_cfg.get("required_claims", []) or []
+        if isinstance(raw_required_claims, str):
+            raw_required_claims = [raw_required_claims]
+        required_claims = [c for c in raw_required_claims if isinstance(c, str) and c.strip()]
+        options["require"] = required_claims

(Move the options["require"] assignment before the jwt.decode call and remove the manual validation loops.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 1281 - 1301, The custom
required-claim validation using required_claims and RESERVED_CLAIMS should be
removed and replaced by using PyJWT's built-in require option: extract
required_claims from jwt_cfg (as you already do), set options["require"] =
required_claims before calling jwt.decode, and then remove the loops referencing
RESERVED_CLAIMS and the manual presence check against decoded (the
exception-raising lines that check for reserved claims and missing claims).
Ensure jwt.decode is invoked so PyJWT can raise MissingRequiredClaimError when a
required claim is absent.
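For reference, the presence check that PyJWT performs for options={"require": [...]} (raising MissingRequiredClaimError) is roughly this stdlib sketch; the class and function names here are illustrative, not part of PyJWT:

```python
class MissingClaimError(Exception):
    """Stand-in for jwt.MissingRequiredClaimError."""

def check_required_claims(decoded, required):
    # Normalize a str/list/tuple/set config value into a clean list of names,
    # then fail on the first claim absent from the decoded payload.
    if isinstance(required, str):
        required = [required]
    names = [c for c in (required or []) if isinstance(c, str) and c.strip()]
    for claim in names:
        if claim not in decoded:
            raise MissingClaimError(f'Token is missing the "{claim}" claim')
```
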
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/apps/restful_apis/agent_api.py`:
- Around line 1162-1175: The whitelist check uses request.remote_addr
(client_ip) which ignores proxy headers, doesn't validate/parses the IP
(ipaddress.ip_address can raise ValueError) and omits logging; update the logic
in agent_api.py to: extract the real client IP by honoring trusted proxy headers
(check X-Forwarded-For first, fall back to X-Real-IP then request.remote_addr),
validate the extracted client_ip with ipaddress.ip_address inside a try/except
to handle invalid or None values (return a clear deny and log the parse error),
perform the CIDR or single-IP comparisons using ipaddress.ip_network/ip_address
as before, and add a structured log entry (including the denied client_ip and
matched whitelist) before raising the Exception when access is denied.
- Around line 1330-1334: The code in parse_webhook_request is comparing the
request Content-Type string (ctype) against a variable named content_type which
may actually be a list from webhook_cfg.get("content_types"), causing every
request to fail; change the logic inside parse_webhook_request (and the other
occurrence around the second check) to normalize the incoming expected types
into an iterable (e.g., if content_type is a str wrap it in [content_type], or
if None use []), then check membership (if content_types and ctype not in
content_types: raise ValueError(...)) and update the error message to show the
allowed list (e.g., "expect one of [...] got '...'" ), referencing
parse_webhook_request, the local variable ctype, and the
webhook_cfg.get("content_types") usage to locate the fixes.
- Around line 1683-1706: The canvas-derived status parsing should be hardened:
when extracting status from ans["data"] in the message loop (the status variable
set via status = int(ans["data"].get("status", status))), catch conversion
errors and validate/clamp the value into the HTTP range (e.g., ensure an int
between 100 and 599, defaulting to a safe code like 500 if invalid); replace
direct use of the raw status in Response(..., status=result["code"], ...) with
the clamped/validated status (use the same clamp routine where result["code"] is
computed). Also add logging on the synchronous failure path (the except/failure
block previously returning the 400 payload) to include agent_id and elapsed time
(and any error details) using the existing logger, and ensure
append_webhook_trace calls (append_webhook_trace, start_ts) are preserved. Use
function/variable names ans, status, append_webhook_trace, final_content,
agent_id, start_ts, and result["code"] to locate and update the spots.
- Around line 1091-1096: The current logic uses webhook_cfg.get("methods", [])
which makes allowed_methods empty and thus permits all HTTP verbs; update the
handling of allowed_methods in the webhook validation (reference symbols:
webhook_cfg, allowed_methods, request_method, get_data_error_result,
RetCode.BAD_REQUEST) to enforce a safe default or fail-fast: either default
allowed_methods to a restrictive set like ["POST"] when methods is
missing/empty, or treat a missing/empty methods value as invalid and return a
BAD_REQUEST error indicating methods must be specified; ensure the subsequent
check (if allowed_methods and request_method not in allowed_methods) uses the
new non-empty allowed_methods so unintended verbs are rejected.
- Around line 1138-1150: The size-string parsing for max_size (variables
size_str, units, limit, MAX_LIMIT) is fragile: change suffix removal to slice
(size_str[:-len(suffix)]) instead of replace, expand units to include b, gb and
accept bare integers, parse the numeric part as float then cast to int to allow
fractional sizes (e.g., int(float(...)*factor)), validate that limit is >0 and
<= MAX_LIMIT (10*1024*1024) and raise a clear ValueError on invalid formats or
out-of-range values, and avoid parsing on every request by moving this logic to
precompute/cache the parsed limit during DSL validation or per-canvas
initialization (store the parsed limit for reuse).
- Around line 1310-1317: The Canvas constructor is being given cvs.user_id
(canvas owner) instead of the tenant_id expected by Canvas, causing billing/KB
attribution errors; in webhook() locate the tenant_id from the request context
(for example from authentication data, request.user.tenant_id, a specific
header, or the webhook configuration available in the function) and pass that
tenant_id as the second argument to Canvas instead of cvs.user_id (keep
canvas_id=agent_id as before); update any variable names accordingly so
Canvas(dsl, tenant_id, agent_id, canvas_id=agent_id) is used and ensure proper
error handling if tenant_id cannot be resolved.
- Around line 1662-1663: The background_run() task is created without a strong
reference and can be garbage-collected; change the creation site where
asyncio.create_task(background_run()) is called so you keep a strong reference
(e.g., create a module-level set like BACKGROUND_TASKS or attach a set to the
app), assign the task to a variable (task =
asyncio.create_task(background_run())), add it to that set, and register a done
callback to remove it when finished (task.add_done_callback(lambda t:
BACKGROUND_TASKS.discard(t))) so the long-running background_run coroutine is
retained for its lifetime.
- Around line 1219-1237: In _validate_token_auth: ensure token config exists by
checking token_cfg.get("token_header") and token_cfg.get("token_value") and
raise a clear Exception like "Token auth not configured" if missing, then fetch
provided header and compare using hmac.compare_digest(provided, token_value)
instead of !=; log an authentication failure (without secrets) and raise on
mismatch. In _validate_basic_auth: use hmac.compare_digest for both
auth.username vs username and auth.password vs password, log failures (no secret
data) and raise on mismatch; ensure you import hmac and use these functions in
_validate_token_auth and _validate_basic_auth to mitigate timing attacks and add
the configuration presence check to prevent silent bypass.

---

Nitpick comments:
In `@api/apps/restful_apis/agent_api.py`:
- Around line 70-71: Remove the duplicate import of current_user and
login_required from the second import statement so there is only one import of
these symbols; locate the redundant line that reads "from api.apps import
current_user, login_required" (the duplicate near the top) and delete it,
leaving the original import (the one at line ~35) intact to avoid repeated
imports.
- Line 1062: The current fragile detection sets is_test using
request.path.startswith(...) which depends on a deployment-level prefix; update
the logic to detect the webhook-test route reliably by using request.url_rule
(or checking the path suffix) or by splitting into separate route handlers so
detection doesn't rely on a hardcoded "/api/v1" prefix; specifically, replace
the request.path.startswith check for is_test in agent_api.py with a rule-based
check using request.url_rule.rule or an endswith comparison against
f"/agents/{agent_id}/webhook/test", or move the /webhook/test route into its own
handler so you can set is_test deterministically without referencing
request.path directly.
- Around line 1756-1760: Replace the equality check in decode_webhook_id with a
constant-time comparison (use hmac.compare_digest(encode_webhook_id(ts),
enc_id)) to avoid timing attacks; in webhook_trace where you call
json.loads(raw) wrap the load in try/except (JSONDecodeError/ValueError) and
treat a parse failure as "no logs yet" instead of raising 500; also add explicit
logging when the Redis key is missing or when parsing fails so operators can
diagnose (use the module logger, e.g., logger or current_app.logger) and keep
existing behavior of returning None/"no logs yet" in those cases.
- Around line 1281-1301: The custom required-claim validation using
required_claims and RESERVED_CLAIMS should be removed and replaced by using
PyJWT's built-in require option: extract required_claims from jwt_cfg (as you
already do), set options["require"] = required_claims before calling jwt.decode,
and then remove the loops referencing RESERVED_CLAIMS and the manual presence
check against decoded (the exception-raising lines that check for reserved
claims and missing claims). Ensure jwt.decode is invoked so PyJWT can raise
MissingRequiredClaimError when a required claim is absent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2a303bf2-e11d-4819-8a1e-3fbeff0570b9

📥 Commits

Reviewing files that changed from the base of the PR and between ba79906 and 4e203a4.

📒 Files selected for processing (3)
  • api/apps/restful_apis/agent_api.py
  • api/apps/sdk/agents.py
  • web/src/utils/api.ts
💤 Files with no reviewable changes (1)
  • api/apps/sdk/agents.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • web/src/utils/api.ts

Comment on lines +1091 to +1096
allowed_methods = webhook_cfg.get("methods", [])
request_method = request.method.upper()
if allowed_methods and request_method not in allowed_methods:
    return get_data_error_result(
        code=RetCode.BAD_REQUEST,message=f"HTTP method '{request_method}' not allowed for this webhook."
    ),RetCode.BAD_REQUEST
Contributor


⚠️ Potential issue | 🟠 Major

Empty methods list silently allows every HTTP verb.

webhook_cfg.get("methods", []) defaults to [], and the check at line 1093 is if allowed_methods and request_method not in allowed_methods. When methods is absent or empty, the webhook accepts GET/DELETE/HEAD/etc. via the route declaration at line 1059, which exposes a larger surface than intended. Consider defaulting to a safe set (e.g. ["POST"]) or rejecting when methods is unspecified.

🛡️ Proposed default-deny
-    allowed_methods = webhook_cfg.get("methods", [])
+    allowed_methods = webhook_cfg.get("methods") or ["POST"]
     request_method = request.method.upper()
-    if allowed_methods and request_method not in allowed_methods:
+    allowed_methods = [m.upper() for m in allowed_methods]
+    if request_method not in allowed_methods:
         return get_data_error_result(
             code=RetCode.BAD_REQUEST,message=f"HTTP method '{request_method}' not allowed for this webhook."
         ),RetCode.BAD_REQUEST
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 1091 - 1096, The current
logic uses webhook_cfg.get("methods", []) which makes allowed_methods empty and
thus permits all HTTP verbs; update the handling of allowed_methods in the
webhook validation (reference symbols: webhook_cfg, allowed_methods,
request_method, get_data_error_result, RetCode.BAD_REQUEST) to enforce a safe
default or fail-fast: either default allowed_methods to a restrictive set like
["POST"] when methods is missing/empty, or treat a missing/empty methods value
as invalid and return a BAD_REQUEST error indicating methods must be specified;
ensure the subsequent check (if allowed_methods and request_method not in
allowed_methods) uses the new non-empty allowed_methods so unintended verbs are
rejected.
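The default-deny variant suggested above can be sketched as a pair of small helpers; the assumption (not confirmed by the PR) is that ["POST"] is an acceptable fallback when the DSL omits methods:

```python
def resolve_allowed_methods(webhook_cfg):
    # Fall back to POST-only when the DSL omits or empties the methods list,
    # and normalize casing so "get" and "GET" compare equal.
    methods = webhook_cfg.get("methods") or ["POST"]
    return [m.upper() for m in methods]

def method_allowed(webhook_cfg, request_method):
    return request_method.upper() in resolve_allowed_methods(webhook_cfg)
```
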

Comment on lines +1138 to +1150
# Convert "10MB" → bytes
units = {"kb": 1024, "mb": 1024**2}
size_str = max_size.lower()

for suffix, factor in units.items():
    if size_str.endswith(suffix):
        limit = int(size_str.replace(suffix, "")) * factor
        break
else:
    raise Exception("Invalid max_body_size format")
MAX_LIMIT = 10 * 1024 * 1024  # 10MB
if limit > MAX_LIMIT:
    raise Exception("max_body_size exceeds maximum allowed size (10MB)")
Contributor


⚠️ Potential issue | 🟠 Major

Fragile max_body_size parsing.

A few correctness problems in the size-string parser:

  • size_str.replace(suffix, "") replaces all occurrences, so a value like "mb10mb" would parse to 10. Use size_str[:-len(suffix)] (suffix-stripping) instead.
  • Only kb/mb are supported; no gb/b/bare integer fallback. Any other unit raises a generic "Invalid max_body_size format".
  • int(...) rejects fractional sizes (e.g. "1.5mb") and negative values aren't guarded — a negative/zero limit would effectively block all requests.
  • Value is parsed on every request; consider caching per-canvas or precomputing during DSL validation.
🛡️ Proposed fix
-        units = {"kb": 1024, "mb": 1024**2}
-        size_str = max_size.lower()
-
-        for suffix, factor in units.items():
-            if size_str.endswith(suffix):
-                limit = int(size_str.replace(suffix, "")) * factor
-                break
-        else:
-            raise Exception("Invalid max_body_size format")
+        units = {"kb": 1024, "mb": 1024**2, "gb": 1024**3}
+        size_str = str(max_size).strip().lower()
+        for suffix, factor in units.items():
+            if size_str.endswith(suffix):
+                try:
+                    limit = int(float(size_str[: -len(suffix)].strip()) * factor)
+                except ValueError:
+                    raise Exception("Invalid max_body_size format")
+                break
+        else:
+            raise Exception("Invalid max_body_size format")
+        if limit <= 0:
+            raise Exception("max_body_size must be positive")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 1138 - 1150, The size-string
parsing for max_size (variables size_str, units, limit, MAX_LIMIT) is fragile:
change suffix removal to slice (size_str[:-len(suffix)]) instead of replace,
expand units to include b, gb and accept bare integers, parse the numeric part
as float then cast to int to allow fractional sizes (e.g.,
int(float(...)*factor)), validate that limit is >0 and <= MAX_LIMIT
(10*1024*1024) and raise a clear ValueError on invalid formats or out-of-range
values, and avoid parsing on every request by moving this logic to
precompute/cache the parsed limit during DSL validation or per-canvas
initialization (store the parsed limit for reuse).
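Putting those points together, a hardened parser might look like the sketch below. The function name and the exact error messages are illustrative, not what the PR ships; the 10MB ceiling matches the MAX_LIMIT already in the code:

```python
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}
_MAX_LIMIT = 10 * 1024 * 1024  # 10MB hard ceiling

def parse_max_body_size(max_size):
    size_str = str(max_size).strip().lower()
    factor = 1
    # Check two-letter suffixes first so "kb" is not matched as a bare "b".
    for suffix in ("kb", "mb", "gb", "b"):
        if size_str.endswith(suffix):
            factor = _UNITS[suffix]
            size_str = size_str[: -len(suffix)].strip()
            break
    try:
        # float() accepts fractional sizes like "1.5mb"; bare ints pass through.
        limit = int(float(size_str) * factor)
    except ValueError:
        raise ValueError(f"Invalid max_body_size format: {max_size!r}")
    if limit <= 0 or limit > _MAX_LIMIT:
        raise ValueError("max_body_size must be in (0, 10MB]")
    return limit
```

Since the result depends only on the DSL value, it can be computed once at canvas initialization and cached rather than re-parsed per request.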

Comment on lines +1162 to +1175
client_ip = request.remote_addr

for rule in whitelist:
    if "/" in rule:
        # CIDR notation
        if ipaddress.ip_address(client_ip) in ipaddress.ip_network(rule, strict=False):
            return
    else:
        # Single IP
        if client_ip == rule:
            return

raise Exception(f"IP {client_ip} is not allowed by whitelist")
Contributor


⚠️ Potential issue | 🟠 Major

IP whitelist ignores proxy headers and crashes on invalid input.

  • request.remote_addr will typically be the upstream proxy/load-balancer IP in a containerized deployment (behind nginx, ELB, etc.), so the DSL-configured allowlist silently locks out every real client while allowing the proxy. Honor a trusted X-Forwarded-For/X-Real-IP when the deployment is behind a trusted proxy, or document the requirement.
  • ipaddress.ip_address(client_ip) raises ValueError when client_ip is None or non-numeric (e.g. IPv6-mapped forms); the resulting exception is converted to a 400 via the outer handler but the message ("IP ... is not allowed by whitelist") becomes misleading. Validate the IP before lookup.
  • No logging of the denied IP, which makes incident investigation harder. As per coding guidelines, "Add logging for new flows".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 1162 - 1175, The whitelist
check uses request.remote_addr (client_ip) which ignores proxy headers, doesn't
validate/parses the IP (ipaddress.ip_address can raise ValueError) and omits
logging; update the logic in agent_api.py to: extract the real client IP by
honoring trusted proxy headers (check X-Forwarded-For first, fall back to
X-Real-IP then request.remote_addr), validate the extracted client_ip with
ipaddress.ip_address inside a try/except to handle invalid or None values
(return a clear deny and log the parse error), perform the CIDR or single-IP
comparisons using ipaddress.ip_network/ip_address as before, and add a
structured log entry (including the denied client_ip and matched whitelist)
before raising the Exception when access is denied.
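A hedged sketch of that flow — trusted-proxy header extraction, validation, and a logged denial. The header names and the decision to trust them unconditionally are assumptions for illustration; in production they should only be honored behind a trusted proxy:

```python
import ipaddress
import logging

logger = logging.getLogger(__name__)

def extract_client_ip(headers, remote_addr):
    # Assumed header precedence: X-Forwarded-For, then X-Real-IP,
    # then the socket peer address.
    forwarded = headers.get("X-Forwarded-For")
    if forwarded:
        return forwarded.split(",")[0].strip()
    return headers.get("X-Real-IP") or remote_addr

def ip_allowed(client_ip, whitelist):
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Covers None and malformed strings instead of crashing the handler.
        logger.warning("Rejecting request with unparseable client IP: %r", client_ip)
        return False
    for rule in whitelist:
        if "/" in rule:
            if addr in ipaddress.ip_network(rule, strict=False):
                return True
        elif client_ip == rule:
            return True
    logger.warning("IP %s denied by whitelist %s", client_ip, whitelist)
    return False
```
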

Comment on lines +1202 to +1217
key = f"rl:tb:{agent_id}"
now = time.time()

try:
    res = REDIS_CONN.lua_token_bucket(
        keys=[key],
        args=[capacity, rate, now, cost],
        client=REDIS_CONN.REDIS,
    )

    allowed = int(res[0])
    if allowed != 1:
        raise Exception("Too many requests (rate limit exceeded)")

except Exception as e:
    raise Exception(f"Rate limit error: {e}")
Contributor


⚠️ Potential issue | 🟠 Major

Rate-limit key is per-agent (not per-client), and the error handler masks the 429 signal.

Two concrete problems in this block:

  1. key = f"rl:tb:{agent_id}" — the token bucket is shared by all clients of this webhook. A single noisy caller can starve everyone else; the configured limit is effectively a global quota, not a per-caller quota. Typically this should include the client identity (IP / API key / bearer subject).
  2. The except Exception as e: raise Exception(f"Rate limit error: {e}") block catches the "Too many requests (rate limit exceeded)" exception raised a few lines above and rewraps it. The caller (line 1309) converts any failure to RetCode.BAD_REQUEST with the rewrapped message, so:
    • legitimate rate-limit hits come back as 400 "Rate limit error: Too many requests..." instead of 429;
    • Redis/Lua failures are indistinguishable from rate-limit exceeded.

Per coding guidelines ("Add logging for new flows"), also log when a caller is throttled and when the Redis backend errors, separately.

🛡️ Proposed fix
-        key = f"rl:tb:{agent_id}"
+        client_ip = request.remote_addr or "unknown"
+        key = f"rl:tb:{agent_id}:{client_ip}"
         now = time.time()

         try:
             res = REDIS_CONN.lua_token_bucket(
                 keys=[key],
                 args=[capacity, rate, now, cost],
                 client=REDIS_CONN.REDIS,
             )
-
-            allowed = int(res[0])
-            if allowed != 1:
-                raise Exception("Too many requests (rate limit exceeded)")
-
-        except Exception as e:
-            raise Exception(f"Rate limit error: {e}")
+        except Exception as e:
+            logging.exception("Rate limit backend error for agent %s", agent_id)
+            raise Exception(f"Rate limit error: {e}")
+
+        if int(res[0]) != 1:
+            logging.warning("Rate limit exceeded for agent=%s client=%s", agent_id, client_ip)
+            raise Exception("Too many requests (rate limit exceeded)")
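The effect of the per-client keying can be illustrated with a self-contained, in-memory token bucket. The real code runs this logic atomically in a Redis Lua script; this sketch only shows the key shape and the refill math, with the `now` parameter injectable for testing:

```python
import time

_buckets = {}  # key -> (tokens, last_refill_ts)

def allow_request(agent_id, client_ip, capacity, rate, cost=1, now=None):
    # Keying on (agent, client) keeps one noisy caller from draining
    # the quota of every other client of the same webhook.
    key = f"rl:tb:{agent_id}:{client_ip or 'unknown'}"
    now = time.time() if now is None else now
    tokens, last = _buckets.get(key, (capacity, now))
    # Refill proportionally to elapsed time, capped at bucket capacity.
    tokens = min(capacity, tokens + (now - last) * rate)
    if tokens >= cost:
        _buckets[key] = (tokens - cost, now)
        return True
    _buckets[key] = (tokens, now)
    return False
```
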

Comment on lines +1219 to +1237
def _validate_token_auth(security_cfg):
    """Validate header-based token authentication."""
    token_cfg = security_cfg.get("token",{})
    header = token_cfg.get("token_header")
    token_value = token_cfg.get("token_value")

    provided = request.headers.get(header)
    if provided != token_value:
        raise Exception("Invalid token authentication")

def _validate_basic_auth(security_cfg):
    """Validate HTTP Basic Auth credentials."""
    auth_cfg = security_cfg.get("basic_auth", {})
    username = auth_cfg.get("username")
    password = auth_cfg.get("password")

    auth = request.authorization
    if not auth or auth.username != username or auth.password != password:
        raise Exception("Invalid Basic Auth credentials")
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Python hmac.compare_digest fixed-time string comparison usage

💡 Result:

hmac.compare_digest is a fixed-time string comparison function in Python's hmac module, designed to prevent timing attacks when verifying HMAC signatures. It compares two strings or bytes-like objects (a and b) and returns True if they are equal, using constant-time comparison to avoid leaking information through execution time differences. Key details:

  • Usage: import hmac; hmac.compare_digest(computed_digest, received_digest)
  • Supports str (ASCII only, e.g., from HMAC.hexdigest()) or bytes-like objects (e.g., from HMAC.digest()).
  • Added in Python 3.3; in 3.10+, uses OpenSSL's CRYPTO_memcmp for enhanced security.
  • Always use it instead of == when comparing HMAC outputs to externally supplied digests, as == short-circuits on mismatch, enabling timing attacks where attackers guess byte-by-byte by measuring response times.

Example for a binary digest:

import hmac
import hashlib

key = b'secret'
message = b'hello'
computed = hmac.new(key, message, hashlib.sha256).digest()
received = b'...'  # from an external source
if hmac.compare_digest(computed, received):
    print('Valid')

Example for a hex digest:

computed_hex = hmac.new(key, message, hashlib.sha256).hexdigest()
received_hex = 'abc123...'  # hex string from an external source
if hmac.compare_digest(computed_hex, received_hex):
    print('Valid')

Note: If lengths differ or errors occur, timing may reveal length/type info (but not content). Official recommendation for all cryptographic comparisons.



🏁 Script executed:

# Check if hmac is imported in the file
rg -n "^import hmac|^from hmac" api/apps/restful_apis/agent_api.py

# Get broader context around lines 1219-1237 to see imports and function signatures
head -50 api/apps/restful_apis/agent_api.py | tail -20

# Check for any existing imports section
rg -n "^import|^from" api/apps/restful_apis/agent_api.py | head -20

Repository: infiniflow/ragflow

Length of output: 1565


🏁 Script executed:

# Check the actual code at lines 1219-1237 and surrounding context
sed -n '1210,1245p' api/apps/restful_apis/agent_api.py

# Also check if there's any logging in nearby functions
sed -n '1200,1250p' api/apps/restful_apis/agent_api.py | grep -n "logger\|logging"

Repository: infiniflow/ragflow

Length of output: 1380


🏁 Script executed:

# Find where these validation functions are called
rg -n "_validate_token_auth|_validate_basic_auth|_validate_jwt_auth" api/apps/restful_apis/agent_api.py

# Check if there's logging for auth failures elsewhere in the file
rg -n "logger.*auth|logging.*auth" api/apps/restful_apis/agent_api.py | head -10

# Check the broader context of how these functions are used
sed -n '1150,1220p' api/apps/restful_apis/agent_api.py

Repository: infiniflow/ragflow

Length of output: 2476


Fix timing-attack vulnerability and silent authentication bypass in token validation.

  • Lines 1225 and 1235 use != for secret comparison, vulnerable to timing side-channels. Use hmac.compare_digest(...) for the token value, password, and username comparisons to prevent attackers from guessing credentials byte-by-byte through response-time measurement.
  • Lines 1221–1226: If token_cfg is empty or missing (e.g., misconfigured DSL), both header and token_value are None. Since request.headers.get(None) returns None, the condition None != None evaluates to False, silently authorizing any request with no credentials. Add a check: if not header or not token_value: raise Exception("Token auth not configured") before attempting header lookup.
  • Per coding guidelines, add logging for authentication failures (without logging secret material).
🛡️ Proposed fix
 def _validate_token_auth(security_cfg):
     """Validate header-based token authentication."""
     token_cfg = security_cfg.get("token",{})
     header = token_cfg.get("token_header")
     token_value = token_cfg.get("token_value")
+    if not header or not token_value:
+        raise Exception("Token auth not configured")

-    provided = request.headers.get(header)
-    if provided != token_value:
+    provided = request.headers.get(header, "")
+    if not hmac.compare_digest(str(provided), str(token_value)):
+        logger.warning("Invalid token authentication attempt")
         raise Exception("Invalid token authentication")

 def _validate_basic_auth(security_cfg):
     """Validate HTTP Basic Auth credentials."""
     auth_cfg = security_cfg.get("basic_auth", {})
     username = auth_cfg.get("username")
     password = auth_cfg.get("password")

     auth = request.authorization
-    if not auth or auth.username != username or auth.password != password:
+    if (
+        not auth
+        or not username
+        or not password
+        or not hmac.compare_digest(str(auth.username), str(username))
+        or not hmac.compare_digest(str(auth.password), str(password))
+    ):
+        logger.warning("Invalid basic auth attempt")
         raise Exception("Invalid Basic Auth credentials")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 1219 - 1237, In
_validate_token_auth: ensure token config exists by checking
token_cfg.get("token_header") and token_cfg.get("token_value") and raise a clear
Exception like "Token auth not configured" if missing, then fetch provided
header and compare using hmac.compare_digest(provided, token_value) instead of
!=; log an authentication failure (without secrets) and raise on mismatch. In
_validate_basic_auth: use hmac.compare_digest for both auth.username vs username
and auth.password vs password, log failures (no secret data) and raise on
mismatch; ensure you import hmac and use these functions in _validate_token_auth
and _validate_basic_auth to mitigate timing attacks and add the configuration
presence check to prevent silent bypass.
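For reference, the constant-time comparison the proposed fix relies on can be sketched as a standalone helper (the name `check_token` is illustrative, not part of the PR):

```python
import hmac

def check_token(provided, expected):
    """Constant-time token check; missing config or header always fails."""
    if not provided or not expected:
        return False
    # hmac.compare_digest takes time independent of where the inputs differ,
    # so an attacker cannot recover the secret byte-by-byte from latency.
    return hmac.compare_digest(str(provided), str(expected))
```

Unlike `provided != expected`, this closes both gaps at once: the explicit guard rejects the None-vs-None case, and `compare_digest` removes the timing side-channel.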

Comment on lines +1310 to +1317
if not isinstance(cvs.dsl, str):
    dsl = json.dumps(cvs.dsl, ensure_ascii=False)
try:
    canvas = Canvas(dsl, cvs.user_id, agent_id, canvas_id=agent_id)
except Exception as e:
    resp=get_data_error_result(code=RetCode.BAD_REQUEST,message=str(e))
    resp.status_code = RetCode.BAD_REQUEST
    return resp

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect Canvas.__init__ to confirm the expected semantics of each positional arg.
ast-grep --pattern $'class Canvas:
  $$$
  def __init__($$$):
    $$$
'
rg -nP --type=py -C1 '\bCanvas\s*\(' -g '!**/test/**' | head -60

Repository: infiniflow/ragflow

Length of output: 2889


🏁 Script executed:

# Get Canvas.__init__ signature
rg -nA15 'class Canvas' agent/canvas.py | head -40

Repository: infiniflow/ragflow

Length of output: 707


🏁 Script executed:

# Get full context around lines 1310-1317 in agent_api.py
sed -n '1280,1330p' api/apps/restful_apis/agent_api.py | cat -n

Repository: infiniflow/ragflow

Length of output: 2289


🏁 Script executed:

sed -n '1200,1315p' api/apps/restful_apis/agent_api.py | head -120

Repository: infiniflow/ragflow

Length of output: 4104


🏁 Script executed:

sed -n '1050,1210p' api/apps/restful_apis/agent_api.py | tail -80

Repository: infiniflow/ragflow

Length of output: 2504


🏁 Script executed:

sed -n '900,1050p' api/apps/restful_apis/agent_api.py | grep -n "^@\|^async def\|^def" | tail -10

Repository: infiniflow/ragflow

Length of output: 44


🏁 Script executed:

rg -nB50 'canvas = Canvas\(dsl, cvs.user_id' api/apps/restful_apis/agent_api.py | head -80

Repository: infiniflow/ragflow

Length of output: 2114


🏁 Script executed:

sed -n '900,1050p' api/apps/restful_apis/agent_api.py | head -100

Repository: infiniflow/ragflow

Length of output: 3976


🏁 Script executed:

sed -n '1100,1150p' api/apps/restful_apis/agent_api.py | head -60

Repository: infiniflow/ragflow

Length of output: 1694


🏁 Script executed:

rg -nB100 'canvas = Canvas\(dsl, cvs.user_id' api/apps/restful_apis/agent_api.py | grep -E "^[0-9]+-(@|async def|def )" | tail -5

Repository: infiniflow/ragflow

Length of output: 44


🏁 Script executed:

# Get the function definition by searching backwards from line 1313
sed -n '1,1313p' api/apps/restful_apis/agent_api.py | tac | grep -m1 "^@\|^async def\|^def " | head -1

Repository: infiniflow/ragflow

Length of output: 96


🏁 Script executed:

# Better approach - find the function containing line 1313
awk 'NR<=1313 && /^(@|async def|def )/ {func=$0; line=NR} END {print line": "func}' api/apps/restful_apis/agent_api.py

Repository: infiniflow/ragflow

Length of output: 226


🏁 Script executed:

rg -nA20 'async def webhook\(agent_id: str\)' api/apps/restful_apis/agent_api.py | head -30

Repository: infiniflow/ragflow

Length of output: 1075


🏁 Script executed:

sed -n '1061,1320p' api/apps/restful_apis/agent_api.py | grep -E "tenant_id|cvs\." | head -20

Repository: infiniflow/ragflow

Length of output: 280


🏁 Script executed:

# Check if cvs (UserCanvas) object has tenant_id or similar field
rg -n "class UserCanvas" api/db/models/ -A20 | head -40

Repository: infiniflow/ragflow

Length of output: 117


🏁 Script executed:

# Also check the UserCanvasService to see what fields are available
rg -n "class UserCanvasService" api/db/services/ -A30 | head -40

Repository: infiniflow/ragflow

Length of output: 2330


🏁 Script executed:

# Find the UserCanvas model definition
fd -e py "\.py$" api/db/ | xargs rg -l "class UserCanvas" | head -5

Repository: infiniflow/ragflow

Length of output: 155


🏁 Script executed:

# Look for model definitions
rg -n "class UserCanvas" api/ -A15 | head -40

Repository: infiniflow/ragflow

Length of output: 3164


Pass tenant_id instead of cvs.user_id to Canvas().

The Canvas constructor expects tenant_id as its second positional argument. All other call sites in the codebase pass tenant_id (e.g., lines 187, 446, 466, 683, 981). Here you pass cvs.user_id, which is the canvas owner, not the tenant executing the webhook. This causes model calls, billing, and knowledge-base lookups to be attributed to the wrong tenant.

The webhook() function signature does not provide a tenant_id parameter. You must identify the tenant from the request context (e.g., from authentication, headers, or the webhook configuration) before instantiating Canvas.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 1310 - 1317, The Canvas
constructor is being given cvs.user_id (canvas owner) instead of the tenant_id
expected by Canvas, causing billing/KB attribution errors; in webhook() locate
the tenant_id from the request context (for example from authentication data,
request.user.tenant_id, a specific header, or the webhook configuration
available in the function) and pass that tenant_id as the second argument to
Canvas instead of cvs.user_id (keep canvas_id=agent_id as before); update any
variable names accordingly so Canvas(dsl, tenant_id, agent_id,
canvas_id=agent_id) is used and ensure proper error handling if tenant_id cannot
be resolved.

Comment on lines +1330 to +1334
ctype = request.headers.get("Content-Type", "").split(";")[0].strip()
if ctype and ctype != content_type:
    raise ValueError(
        f"Invalid Content-Type: expect '{content_type}', got '{ctype}'"
    )

⚠️ Potential issue | 🟠 Major

content_types vs content_type: list-vs-string mismatch.

webhook_cfg.get("content_types") (plural — matching what DSL stores) is passed straight into parse_webhook_request(content_type) and compared with ctype != content_type at line 1331. If the DSL holds a list of allowed content types (as the plural field name strongly implies), this comparison always fails and every real request is rejected with Invalid Content-Type: expect '[...]'. Normalize to a list and test membership.

🛡️ Proposed fix
-    async def parse_webhook_request(content_type):
+    async def parse_webhook_request(content_types):
         """Parse request based on content-type and return structured data."""

         # 1. Query
         query_data = {k: v for k, v in request.args.items()}

         # 2. Headers
         header_data = {k: v for k, v in request.headers.items()}

         # 3. Body
         ctype = request.headers.get("Content-Type", "").split(";")[0].strip()
-        if ctype and ctype != content_type:
-            raise ValueError(
-                f"Invalid Content-Type: expect '{content_type}', got '{ctype}'"
-            )
+        allowed = content_types or []
+        if isinstance(allowed, str):
+            allowed = [allowed]
+        if allowed and ctype and ctype not in allowed:
+            raise ValueError(
+                f"Invalid Content-Type: expect one of {allowed}, got '{ctype}'"
+            )

Also applies to: 1544-1544

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 1330 - 1334, The code in
parse_webhook_request is comparing the request Content-Type string (ctype)
against a variable named content_type which may actually be a list from
webhook_cfg.get("content_types"), causing every request to fail; change the
logic inside parse_webhook_request (and the other occurrence around the second
check) to normalize the incoming expected types into an iterable (e.g., if
content_type is a str wrap it in [content_type], or if None use []), then check
membership (if content_types and ctype not in content_types: raise
ValueError(...)) and update the error message to show the allowed list (e.g.,
"expect one of [...] got '...'" ), referencing parse_webhook_request, the local
variable ctype, and the webhook_cfg.get("content_types") usage to locate the
fixes.
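A standalone sketch of the normalization the fix describes (function names are illustrative, not from the PR):

```python
def normalize_content_types(content_types):
    """Accept None, a single string, or a list and return a list of types."""
    if content_types is None:
        return []
    if isinstance(content_types, str):
        return [content_types]
    return list(content_types)

def check_content_type(header_value, content_types):
    """Raise ValueError when the request media type is not in the allowlist."""
    # Strip parameters such as "; charset=utf-8" before comparing.
    ctype = (header_value or "").split(";")[0].strip()
    allowed = normalize_content_types(content_types)
    if allowed and ctype and ctype not in allowed:
        raise ValueError(f"Invalid Content-Type: expect one of {allowed}, got '{ctype}'")
```

Normalizing first means the same code path handles a legacy single-string config and the list the plural `content_types` field implies.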

Comment on lines +1662 to +1663
asyncio.create_task(background_run())
return resp

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

asyncio create_task strong reference weak reference garbage collection

💡 Result:

In Python's asyncio, when you call asyncio.create_task(coro), it returns an asyncio.Task object that schedules the coroutine for execution. The event loop maintains only weak references to tasks (using WeakSet internally before Python 3.14, and per-thread linked lists in 3.14+ that still don't hold strong refs). Without a strong reference elsewhere, the Task can be garbage collected at any time, even mid-execution if not actively running or awaiting. This design prevents memory leaks from "fire-and-forget" tasks that complete and hold results/exceptions indefinitely. Temporary strong refs exist while the task is scheduled (_ready queue) or sleeping (timers), but these are released when the task yields without external refs, allowing GC during cycles. To prevent premature GC, keep strong refs, e.g., in a set with done callbacks for cleanup:

background_tasks = set()
task = asyncio.create_task(some_coro)
background_tasks.add(task)
task.add_done_callback(background_tasks.discard)

Official docs emphasize: "Save a reference to the result of this function, to avoid a task disappearing mid-execution." This behavior is unchanged as of Python 3.14 (today's date 2026-04-24, post-3.14 release).



🏁 Script executed:

# First, let me check the file and the specific lines mentioned
cat -n api/apps/restful_apis/agent_api.py | sed -n '1655,1675p'

Repository: infiniflow/ragflow

Length of output: 946


🏁 Script executed:

# Search for background_run function definition
rg -n "def background_run" api/apps/restful_apis/agent_api.py

Repository: infiniflow/ragflow

Length of output: 103


🏁 Script executed:

# Search for any existing task tracking mechanisms (module-level sets)
rg -n "_.*tasks.*set\|_background_tasks\|_webhook.*tasks" api/apps/restful_apis/agent_api.py

Repository: infiniflow/ragflow

Length of output: 44


🏁 Script executed:

# Check for Python version constraints in the project
cat pyproject.toml 2>/dev/null | grep -A 5 "python"

Repository: infiniflow/ragflow

Length of output: 2777


🏁 Script executed:

# Check if there's a ruff configuration
cat -n ruff.toml 2>/dev/null || cat -n pyproject.toml 2>/dev/null | grep -A 10 "\[tool.ruff\]"

Repository: infiniflow/ragflow

Length of output: 337


🏁 Script executed:

# Get the function context - look at more lines around 1662
cat -n api/apps/restful_apis/agent_api.py | sed -n '1610,1670p'

Repository: infiniflow/ragflow

Length of output: 2660


🏁 Script executed:

# Check what function/method contains line 1662
cat -n api/apps/restful_apis/agent_api.py | sed -n '1500,1665p' | grep -E "^\s+[0-9]+\s+(def |async def )" | tail -5

Repository: infiniflow/ragflow

Length of output: 309


🏁 Script executed:

# Search for ruff rule asyncio-dangling-task to confirm it matches ASYNC/ASYNC1
web_search
ruff asyncio-dangling-task ASYNC rule

Repository: infiniflow/ragflow

Length of output: 236


🏁 Script executed:

# Find the parent function definition - look higher up
cat -n api/apps/restful_apis/agent_api.py | sed -n '1450,1520p' | grep -E "^[[:space:]]*[0-9]+\s+(def |async def |@)"

Repository: infiniflow/ragflow

Length of output: 102


🏁 Script executed:

# Check if there's any existing task management in the entire file
rg -n "add_done_callback|_tasks|task_group\|gather\|shield" api/apps/restful_apis/agent_api.py | head -20

Repository: infiniflow/ragflow

Length of output: 44


🌐 Web query:

ruff asyncio-dangling-task ASYNC101 rule documentation

💡 Result:

The query refers to the Ruff linter rule for detecting dangling asyncio tasks, but the rule code is RUF006 (asyncio-dangling-task), not ASYNC101. ASYNC101 was a rule from the deprecated flake8-async linter for blocking calls in async functions (e.g., open, time.sleep), which Ruff has reimplemented as separate rules like ASYNC230, but not directly as ASYNC101. Ruff's asyncio-dangling-task (RUF006), added in v0.0.247, checks for asyncio.create_task and asyncio.ensure_future calls where the returned Task object is not stored in a variable, collection, or otherwise strongly referenced. The event loop holds only a weak reference, so unreferenced tasks can be garbage collected prematurely, leading to inconsistent execution. Example violation:

import asyncio
asyncio.create_task(some_coro)  # Dangling task

Fixed:

background_tasks = set()
task = asyncio.create_task(some_coro)
background_tasks.add(task)
task.add_done_callback(background_tasks.discard)

Configuration: Enable with [ruff] select = ["RUF"] in pyproject.toml. It also detects unused assignments to tasks.



🏁 Script executed:

# Check the full ruff configuration
cat -n pyproject.toml | sed -n '204,220p'

Repository: infiniflow/ragflow

Length of output: 489


🏁 Script executed:

# Look for any RUF selection in the config
rg -n "RUF|ruf" pyproject.toml

Repository: infiniflow/ragflow

Length of output: 99


Retain strong reference to long-running background task.

The background_run() coroutine performs extended operations (async loop over canvas.run(), database updates). Without holding a strong reference to the task, Python's event loop (which uses only weak references) can garbage-collect it mid-execution, silently terminating the canvas run. This violates asyncio best practices documented in the official Python reference.

Implement the recommended pattern:

Fix
+_webhook_background_tasks: set[asyncio.Task] = set()
+
         async def background_run():
             # ... implementation ...

-        asyncio.create_task(background_run())
+        task = asyncio.create_task(background_run())
+        _webhook_background_tasks.add(task)
+        task.add_done_callback(_webhook_background_tasks.discard)
         return resp
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 1662 - 1663, The
background_run() task is created without a strong reference and can be
garbage-collected; change the creation site where
asyncio.create_task(background_run()) is called so you keep a strong reference
(e.g., create a module-level set like BACKGROUND_TASKS or attach a set to the
app), assign the task to a variable (task =
asyncio.create_task(background_run())), add it to that set, and register a done
callback to remove it when finished (task.add_done_callback(lambda t:
BACKGROUND_TASKS.discard(t))) so the long-running background_run coroutine is
retained for its lifetime.
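The recommended pattern in isolation (the module-level set name and `spawn` helper are illustrative):

```python
import asyncio

BACKGROUND_TASKS = set()

def spawn(coro):
    """Create a task and hold a strong reference until it completes."""
    task = asyncio.create_task(coro)
    BACKGROUND_TASKS.add(task)
    # Drop the reference once the task is done so the set does not grow.
    task.add_done_callback(BACKGROUND_TASKS.discard)
    return task

async def main():
    async def work():
        await asyncio.sleep(0)
        return "done"
    task = spawn(work())
    assert task in BACKGROUND_TASKS
    outcome = await task
    await asyncio.sleep(0)  # give the done callback a chance to run
    assert task not in BACKGROUND_TASKS
    return outcome

result = asyncio.run(main())
```

The set holds the only extra strong reference; the done callback keeps it from leaking completed tasks.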

Comment on lines +1683 to +1706
if ans["event"] == "message_end":
    status = int(ans["data"].get("status", status))
if is_test:
    append_webhook_trace(
        agent_id,
        start_ts,
        ans
    )
if is_test:
    append_webhook_trace(
        agent_id,
        start_ts,
        {
            "event": "finished",
            "elapsed_time": time.time() - start_ts,
            "success": True,
        }
    )
final_content = "".join(contents)
return {
    "message": final_content,
    "success": True,
    "code": status,
}

⚠️ Potential issue | 🟡 Minor

Guard the HTTP status derived from canvas output.

  • status = int(ans["data"].get("status", status)) (line 1684) raises ValueError if a component emits a non-numeric status, which is caught by the surrounding except Exception and turned into a 400 "invalid literal for int()" response — unhelpful for operators. Validate/clamp to a sensible HTTP range.
  • Response(..., status=result["code"], ...) (line 1733) uses whatever integer the canvas produced, so a message_end payload can set status=999 or 0. Constrain to known HTTP codes.
  • Per coding guidelines, "Add logging for new flows": also log the synchronous failure path (line 1708-1728) with the agent id and elapsed time.
🛡️ Minimal hardening
-                    if ans["event"] == "message_end":
-                        status = int(ans["data"].get("status", status))
+                    if ans["event"] == "message_end":
+                        try:
+                            candidate = int(ans["data"].get("status", status))
+                        except (TypeError, ValueError):
+                            candidate = status
+                        if 200 <= candidate <= 599:
+                            status = candidate

Also applies to: 1730-1735

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 1683 - 1706, The
canvas-derived status parsing should be hardened: when extracting status from
ans["data"] in the message loop (the status variable set via status =
int(ans["data"].get("status", status))), catch conversion errors and
validate/clamp the value into the HTTP range (e.g., ensure an int between 100
and 599, defaulting to a safe code like 500 if invalid); replace direct use of
the raw status in Response(..., status=result["code"], ...) with the
clamped/validated status (use the same clamp routine where result["code"] is
computed). Also add logging on the synchronous failure path (the except/failure
block previously returning the 400 payload) to include agent_id and elapsed time
(and any error details) using the existing logger, and ensure
append_webhook_trace calls (append_webhook_trace, start_ts) are preserved. Use
function/variable names ans, status, append_webhook_trace, final_content,
agent_id, start_ts, and result["code"] to locate and update the spots.
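The clamping the comment asks for can be factored into a small helper (the name and the 500 fallback are illustrative choices):

```python
def clamp_status(raw, fallback=500):
    """Coerce a canvas-supplied status into a valid HTTP response code."""
    try:
        code = int(raw)
    except (TypeError, ValueError):
        return fallback
    # Reject values like 0 or 999 that fall outside the HTTP response range.
    return code if 200 <= code <= 599 else fallback
```

Using the same helper at both the message-loop parse and the final Response keeps the two sites consistent.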

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/apps/restful_apis/agent_api.py`:
- Around line 1816-1826: When decode_webhook_id(webhook_id, webhooks) returns
None, do not return finished=True; instead log the unresolved webhook_id (and
any relevant context like since_ts) and return a "not found" response so the
client can surface an expired/unknown trace. Concretely, in the handler
surrounding decode_webhook_id, add a processLogger (or appropriate logger) entry
recording webhook_id and webhooks, then call get_json_result with an error code
(e.g., RetCode.NOT_FOUND) or include a dedicated flag such as {"webhook_id":
webhook_id, "events": [], "next_since_ts": since_ts, "finished": False,
"not_found": True} so the UI can distinguish expired/unknown from completed
runs. Ensure you reference decode_webhook_id, get_json_result, and
RetCode.NOT_FOUND (or the chosen flag) when making the change.
- Line 1059: The is_test boolean is fragile because it uses the raw path; change
detection to use the matched route rule or endpoint instead: replace the
startswith check that assigns is_test (in the handler where request is
available) with a check against request.url_rule.rule.endswith("/webhook/test")
or request.endpoint.endswith("webhook_test") depending on your route naming, so
the flag is derived from the matched route (request.url_rule or
request.endpoint) rather than request.path.
- Around line 1803-1805: The code currently converts Redis string keys to floats
then back to strings (start_ts = min(candidates); real_id = str(start_ts)) which
can mismatch the original Redis key; instead preserve and use the original
string key from webhooks.keys(): when building candidates, keep tuples of
(float_value, original_key) or a mapping float->original_key, pick the tuple
with the minimum float_value and set real_id to the original_key (then call
encode_webhook_id(real_id)); update references around start_ts, real_id, and
encode_webhook_id to use the preserved original string key rather than
str(float).
- Around line 1631-1632: The code sets cvs.dsl then calls
UserCanvasService.update_by_id with the owner's id (cvs.user_id), which updates
the wrong record; change the call to use the canvas/agent primary key (e.g.,
cvs.id) instead: after cvs.dsl = json.loads(str(canvas)) call
UserCanvasService.update_by_id(cvs.id, cvs.to_dict()) so the updated DSL is
persisted against the correct canvas/agent record (ensure the primary key used
matches the model's canvas id field).
- Around line 1739-1742: Replace the strict owner-only check that compares
cvs.user_id to current_user.id with the standard access gate used elsewhere:
call UserCanvasService.accessible(agent_id, tenant_id) using cvs.agent_id as the
agent_id and the current user's tenant identifier (e.g. current_user.tenant_id)
as the tenant_id; if that call returns False, return
get_data_error_result(message="Canvas not found.") so teammates and
joined-tenant users are authorized the same way as other handlers like
list_agent_sessions/get_agent/get_agent_logs.
- Around line 1487-1540: Guard against None before calling .startswith on schema
type variables: in the conversion logic where expected_type is used (the block
starting with expected_type.startswith("array")), change the check to first
ensure expected_type is truthy (e.g., if expected_type and
expected_type.startswith("array")) so it won’t AttributeError when type is
missing; similarly, in validate_type (function validate_type) replace
t.startswith("array") with a guarded check (e.g., if t and
t.startswith("array")) and keep the rest of the array-handling logic unchanged.
Ensure both places reference the same pattern used in default_for_type to avoid
NPEs while preserving current behavior for unknown types.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c310d2ac-dd44-4040-8fa9-223fa682ca82

📥 Commits

Reviewing files that changed from the base of the PR and between 4e203a4 and 203f537.

📒 Files selected for processing (1)
  • api/apps/restful_apis/agent_api.py

@manager.route("/agents/<agent_id>/webhook", methods=["POST", "GET", "PUT", "PATCH", "DELETE", "HEAD"]) # noqa: F821
@manager.route("/agents/<agent_id>/webhook/test",methods=["POST", "GET", "PUT", "PATCH", "DELETE", "HEAD"],) # noqa: F821
async def webhook(agent_id: str):
is_test = request.path.startswith(f"/api/v1/agents/{agent_id}/webhook/test")

⚠️ Potential issue | 🟡 Minor

Fragile is_test detection via hardcoded URL prefix.

request.path.startswith(f"/api/v1/agents/{agent_id}/webhook/test") couples the handler to a specific URL prefix. If the app is mounted under a different prefix (reverse-proxy rewrite, test client, blueprint prefix change), is_test silently becomes False and trace logging is disabled without any error. Prefer deriving this from the matched endpoint/rule rather than the raw path.

🛡️ Suggested fix
-    is_test = request.path.startswith(f"/api/v1/agents/{agent_id}/webhook/test")
+    # Suffix matching is independent of URL prefix / reverse-proxy rewrites.
+    is_test = request.path.rstrip("/").endswith("/webhook/test")

Or, if Quart exposes the matched rule, switch to request.url_rule.rule.endswith("/webhook/test") for a more robust match.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` at line 1059, The is_test boolean is
fragile because it uses the raw path; change detection to use the matched route
rule or endpoint instead: replace the startswith check that assigns is_test (in
the handler where request is available) with a check against
request.url_rule.rule.endswith("/webhook/test") or
request.endpoint.endswith("webhook_test") depending on your route naming, so the
flag is derived from the matched route (request.url_rule or request.endpoint)
rather than request.path.
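The prefix-independent check reduces to a pure function that can be unit-tested without a request context (the helper name is illustrative):

```python
def is_test_route(path):
    """Detect the webhook test route regardless of mount prefix or slashes."""
    return path.rstrip("/").endswith("/webhook/test")
```

This survives reverse-proxy rewrites and trailing slashes, though matching on request.url_rule remains the more robust option when the framework exposes it.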

Comment on lines +1487 to +1540
if expected_type.startswith("array"):
    try:
        parsed = json.loads(v)
        if isinstance(parsed, list):
            return parsed
        else:
            raise Exception("JSON is not an array")
    except Exception:
        raise Exception(f"Cannot convert '{value}' to array")

# String (accept original)
if expected_type == "string":
    return value

# File
if expected_type == "file":
    return value
# Default: do nothing
return value


def validate_type(value, t):
    """Validate value type against schema type t."""
    if t == "file":
        return isinstance(value, list)

    if t == "string":
        return isinstance(value, str)

    if t == "number":
        return isinstance(value, (int, float))

    if t == "boolean":
        return isinstance(value, bool)

    if t == "object":
        return isinstance(value, dict)

    # array<string> / array<number> / array<object>
    if t.startswith("array"):
        if not isinstance(value, list):
            return False

        if "<" in t and ">" in t:
            inner = t[t.find("<") + 1 : t.find(">")]

            # Check each element type
            for item in value:
                if not validate_type(item, inner):
                    return False

            return True

        return True

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm no existing tests depend on validate_type returning True for unknown types.
rg -nP '\bvalidate_type\s*\(' -C3

Repository: infiniflow/ragflow

Length of output: 1602


🏁 Script executed:

# Read the function definitions and surrounding context
sed -n '1430,1550p' api/apps/restful_apis/agent_api.py | cat -n

Repository: infiniflow/ragflow

Length of output: 4685


🏁 Script executed:

# Check logging imports and usage in the file
head -50 api/apps/restful_apis/agent_api.py | grep -i "import"

Repository: infiniflow/ragflow

Length of output: 1068


🏁 Script executed:

# Search for test files related to webhook or validate_type
find . -type f -name "*test*.py" -o -name "*_test.py" | xargs grep -l "validate_type\|webhook" 2>/dev/null | head -20

Repository: infiniflow/ragflow

Length of output: 135


🏁 Script executed:

# Check Python version requirement in project
find . -type f \( -name "setup.py" -o -name "pyproject.toml" -o -name "setup.cfg" -o -name ".python-version" \) -exec cat {} \; 2>/dev/null | head -30

Repository: infiniflow/ragflow

Length of output: 879


🏁 Script executed:

cat test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py

Repository: infiniflow/ragflow

Length of output: 49601


🏁 Script executed:

# Check main project Python requirement
cat pyproject.toml | head -100

Repository: infiniflow/ragflow

Length of output: 3648


🏁 Script executed:

# Check current logging usage patterns in agent_api.py
rg "logging\." api/apps/restful_apis/agent_api.py -A2 -B2 | head -50

Repository: infiniflow/ragflow

Length of output: 1183


🏁 Script executed:

# Verify the exact test expectation for unknown type handling
sed -n '940,980p' test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py

Repository: infiniflow/ragflow

Length of output: 1593


🏁 Script executed:

# Search for any other places where validate_type might be called with None
rg -n "validate_type\(" api/apps/restful_apis/agent_api.py -B3 -A3

Repository: infiniflow/ragflow

Length of output: 867


🏁 Script executed:

# Find the exact lines in test that show unknown type is expected to pass
grep -n "mystery" test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py -B3 -A3

Repository: infiniflow/ragflow

Length of output: 560


Guard expected_type and t against None before calling .startswith().

Line 1487 (expected_type.startswith("array")) and line 1526 (t.startswith("array")) will raise AttributeError if the schema omits the type property or sets it to None. The default_for_type function already guards with if t and … (line 1430), so these sites should apply the same pattern.

Suggested fix for NPE
    def auto_cast_value(value, expected_type):
         """Convert string values into schema type when possible."""
 
         # Non-string values already good
         if not isinstance(value, str):
             return value
 
         v = value.strip()
+
+        if not expected_type:
+            return value
 
         # Boolean
         if expected_type == "boolean":
     def validate_type(value, t):
         """Validate value type against schema type t."""
+        if not t:
+            return True
         if t == "file":

Regarding unknown schema types: The test suite (test_webhook_schema_extract_cast_defaults_and_validation_errors) expects unknown types like "mystery" to be silently accepted. Changing validate_type to return False for unrecognized types would be a breaking change. If stricter validation is desired, update both the implementation and its tests together.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 1487 - 1540, Guard against
None before calling .startswith on schema type variables: in the conversion
logic where expected_type is used (the block starting with
expected_type.startswith("array")), change the check to first ensure
expected_type is truthy (e.g., if expected_type and
expected_type.startswith("array")) so it won’t AttributeError when type is
missing; similarly, in validate_type (function validate_type) replace
t.startswith("array") with a guarded check (e.g., if t and
t.startswith("array")) and keep the rest of the array-handling logic unchanged.
Ensure both places reference the same pattern used in default_for_type to avoid
NPEs while preserving current behavior for unknown types.
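A guarded sketch of validate_type with the None check applied (behavior for unknown types such as "mystery" is deliberately preserved to match the existing tests):

```python
def validate_type(value, t):
    """Validate value against schema type t; missing/unknown types pass."""
    if not t:
        # Schema omitted "type" or set it to None: accept rather than crash.
        return True
    if t == "file":
        return isinstance(value, list)
    if t == "string":
        return isinstance(value, str)
    if t == "number":
        return isinstance(value, (int, float))
    if t == "boolean":
        return isinstance(value, bool)
    if t == "object":
        return isinstance(value, dict)
    if t.startswith("array"):
        if not isinstance(value, list):
            return False
        if "<" in t and ">" in t:
            inner = t[t.find("<") + 1 : t.find(">")]
            return all(validate_type(item, inner) for item in value)
        return True
    return True  # unknown types are silently accepted, as today
```

The only behavioral change from the current code is the leading `if not t` guard; everything else mirrors the existing dispatch.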

Comment on lines +1631 to +1632
cvs.dsl = json.loads(str(canvas))
UserCanvasService.update_by_id(cvs.user_id, cvs.to_dict())

⚠️ Potential issue | 🔴 Critical

Critical: canvas still persisted against cvs.user_id — past "Addressed" comment did not land.

UserCanvasService.update_by_id(cvs.user_id, cvs.to_dict()) uses the owner's user id as the primary key for update_by_id, not the canvas/agent id. A successful background run either updates the wrong record (if some canvas happens to share that id) or silently no-ops, and the updated DSL is never saved against this agent. The previous review comment is marked as addressed in commits 75a5548→4e203a4, but the current code still has cvs.user_id here — the fix appears to have been reverted or lost during rebase.

🐛 Proposed fix
-                cvs.dsl = json.loads(str(canvas))
-                UserCanvasService.update_by_id(cvs.user_id, cvs.to_dict())
+                cvs.dsl = json.loads(str(canvas))
+                UserCanvasService.update_by_id(agent_id, cvs.to_dict())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 1631 - 1632, The code sets
cvs.dsl then calls UserCanvasService.update_by_id with the owner's id
(cvs.user_id), which updates the wrong record; change the call to use the
canvas/agent primary key (e.g., cvs.id) instead: after cvs.dsl =
json.loads(str(canvas)) call UserCanvasService.update_by_id(cvs.id,
cvs.to_dict()) so the updated DSL is persisted against the correct canvas/agent
record (ensure the primary key used matches the model's canvas id field).

Comment on lines +1739 to +1742
if not exists or str(cvs.user_id) != str(current_user.id):
return get_data_error_result(
message="Canvas not found.",
)

⚠️ Potential issue | 🟠 Major

Access check is stricter than every other endpoint in this module.

All other handlers in this file gate access with UserCanvasService.accessible(agent_id, tenant_id), which covers joined-tenant / team-member access (see list_agent_sessions, get_agent, get_agent_logs, etc.). Here the check is str(cvs.user_id) != str(current_user.id), which only admits the original creator — any teammate who can view traces for every other endpoint gets a misleading "Canvas not found." here. Also, current_user.id is a user id while cvs.user_id is the canvas tenant_id on this model, so even the creator may fail this check under tenant switching.

🛡️ Suggested fix
-    exists, cvs = UserCanvasService.get_by_id(agent_id)
-    if not exists or str(cvs.user_id) != str(current_user.id):
-        return get_data_error_result(
-            message="Canvas not found.",
-        )
+    exists, cvs = UserCanvasService.get_by_id(agent_id)
+    if not exists:
+        return get_data_error_result(message="Canvas not found.")
+    if not UserCanvasService.accessible(agent_id, current_user.id):
+        return get_json_result(
+            data=False,
+            message="Only owner of canvas authorized for this operation.",
+            code=RetCode.OPERATING_ERROR,
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 1739 - 1742, Replace the
strict owner-only check that compares cvs.user_id to current_user.id with the
standard access gate used elsewhere: call UserCanvasService.accessible(agent_id,
tenant_id) using cvs.agent_id as the agent_id and the current user's tenant
identifier (e.g. current_user.tenant_id) as the tenant_id; if that call returns
False, return get_data_error_result(message="Canvas not found.") so teammates
and joined-tenant users are authorized the same way as other handlers like
list_agent_sessions/get_agent/get_agent_logs.
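
A sketch of the shared gate the comment asks for, with `get_by_id` and `accessible` passed in as stand-ins for the `UserCanvasService` methods (hypothetical wiring, not the module's actual code):

```python
def check_canvas_access(agent_id, tenant_id, get_by_id, accessible):
    """Existence check first, then tenant-aware accessibility, mirroring
    how list_agent_sessions/get_agent gate access in the same module.
    Returns (canvas, error_message)."""
    exists, cvs = get_by_id(agent_id)
    if not exists:
        return None, "Canvas not found."
    if not accessible(agent_id, tenant_id):
        return None, "Only owner of canvas authorized for this operation."
    return cvs, None
```

Under this shape a joined-tenant teammate passes the `accessible` check and can view traces, while a genuinely missing canvas still yields the not-found message.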

Comment on lines +1803 to +1805
start_ts = min(candidates)
real_id = str(start_ts)
webhook_id = encode_webhook_id(real_id)

⚠️ Potential issue | 🟡 Minor

str(float) round-trip risk when matching back against Redis keys.

Keys are written with str(start_ts) at line 1569, where start_ts = time.time() (a float). Here the candidate is recovered via float(k) (line 1790), then min(...) → str(...) (line 1804). Python's str(float) uses the shortest repr that round-trips, so str(float(k)) == k is usually true but not guaranteed: trailing-zero representations ("1700000000.0" vs "1700000000"), or values that normalize differently across Python/Redis encodings, will desync, and decode_webhook_id will then fail to find a key that exists. Safer to keep the original string key from webhooks.keys() rather than re-stringifying the parsed float.

🛡️ Suggested fix
-        candidates = [
-            float(k) for k in webhooks.keys() if float(k) > since_ts
-        ]
+        candidate_keys = [k for k in webhooks.keys() if float(k) > since_ts]
 
-        if not candidates:
+        if not candidate_keys:
             return get_json_result(
                 data={
                     "webhook_id": None,
                     "events": [],
                     "next_since_ts": since_ts,
                     "finished": False,
                 }
             )
 
-        start_ts = min(candidates)
-        real_id = str(start_ts)
-        webhook_id = encode_webhook_id(real_id)
+        real_id = min(candidate_keys, key=float)
+        start_ts = float(real_id)
+        webhook_id = encode_webhook_id(real_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 1803 - 1805, The code
currently converts Redis string keys to floats then back to strings (start_ts =
min(candidates); real_id = str(start_ts)) which can mismatch the original Redis
key; instead preserve and use the original string key from webhooks.keys(): when
building candidates, keep tuples of (float_value, original_key) or a mapping
float->original_key, pick the tuple with the minimum float_value and set real_id
to the original_key (then call encode_webhook_id(real_id)); update references
around start_ts, real_id, and encode_webhook_id to use the preserved original
string key rather than str(float).
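
The hazard is easy to reproduce in isolation, and keeping the original string key, as the suggested fix does, sidesteps it entirely (a toy dict stands in for the Redis hash here):

```python
# Toy stand-in for the Redis hash: one key stored without a trailing ".0",
# as could happen if a producer ever writes an integer-formatted timestamp.
webhooks = {"1700000000": "run-a", "1700000000.5": "run-b"}
since_ts = 0.0

# Fragile: parse keys to float, then re-stringify the minimum.
candidates = [float(k) for k in webhooks if float(k) > since_ts]
fragile_id = str(min(candidates))   # "1700000000.0"
assert fragile_id not in webhooks   # desync: the stored key is "1700000000"

# Robust: order the original string keys by float value, keep the string.
candidate_keys = [k for k in webhooks if float(k) > since_ts]
real_id = min(candidate_keys, key=float)
assert real_id in webhooks          # real_id *is* a stored key
```

Because `real_id` is taken verbatim from `webhooks.keys()`, any later lookup (including `decode_webhook_id`) matches by construction.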

Comment on lines +1816 to +1826
real_id = decode_webhook_id(webhook_id, webhooks)

if not real_id:
return get_json_result(
data={
"webhook_id": webhook_id,
"events": [],
"next_since_ts": since_ts,
"finished": True,
}
)

⚠️ Potential issue | 🟡 Minor

finished=True for an unknown webhook_id conflates "expired/unknown" with "run completed".

When decode_webhook_id returns None (no matching start_ts in Redis — trace expired, wrong agent, wrong id), the handler currently returns finished: True. A polling client cannot distinguish this from a successful completion and will stop displaying progress with no error surfaced. Consider returning an explicit error code (e.g. RetCode.NOT_FOUND) or a dedicated flag so the UI can show "trace expired" rather than silently marking the run as done.

Also, per coding guidelines, log the unresolved webhook_id to aid post-hoc debugging.

As per coding guidelines, "Add logging for new flows".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/agent_api.py` around lines 1816 - 1826, When
decode_webhook_id(webhook_id, webhooks) returns None, do not return
finished=True; instead log the unresolved webhook_id (and any relevant context
like since_ts) and return a "not found" response so the client can surface an
expired/unknown trace. Concretely, in the handler surrounding decode_webhook_id,
add a processLogger (or appropriate logger) entry recording webhook_id and
webhooks, then call get_json_result with an error code (e.g., RetCode.NOT_FOUND)
or include a dedicated flag such as {"webhook_id": webhook_id, "events": [],
"next_since_ts": since_ts, "finished": False, "not_found": True} so the UI can
distinguish expired/unknown from completed runs. Ensure you reference
decode_webhook_id, get_json_result, and RetCode.NOT_FOUND (or the chosen flag)
when making the change.
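
One way to shape the payload so a polling client can tell the cases apart; the `not_found` field is the flag floated in the comment, not an existing part of the API:

```python
import logging

def trace_response(webhook_id, real_id, events, finished, since_ts):
    """Build the /webhook/logs payload, distinguishing 'trace unknown or
    expired' from 'run completed' instead of returning finished=True for both."""
    if real_id is None:
        # Log the unresolved id so expired/mismatched traces are debuggable.
        logging.warning("webhook trace not found: webhook_id=%s since_ts=%s",
                        webhook_id, since_ts)
        return {"webhook_id": webhook_id, "events": [],
                "next_since_ts": since_ts, "finished": False,
                "not_found": True}
    return {"webhook_id": webhook_id, "events": events,
            "next_since_ts": since_ts, "finished": finished,
            "not_found": False}
```

A client then stops polling only on `finished`, and surfaces a "trace expired" state on `not_found` rather than silently marking the run as done.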

@wangq8 wangq8 self-requested a review April 24, 2026 09:55
@wangq8 wangq8 merged commit 9ad752f into infiniflow:main Apr 24, 2026
2 checks passed
@wangq8 wangq8 changed the title Refa:migrate agent webhook routes to REST APIs Refa : migrate agent webhook routes to REST APIs Apr 24, 2026
xugangqiang pushed a commit to xugangqiang/ragflow that referenced this pull request Apr 24, 2026
### What problem does this PR solve?

migrate agent webhook routes to REST APIs

### Type of change
- [x] Refactoring

Labels

🐖api: The modified files are located under directory 'api/apps/sdk'
ci: Continue Integration
🌈 python: Pull requests that update Python code
☯️ refactor: Pull request that refactor/refine code
size:XXL: This PR changes 1000+ lines, ignoring generated files.
🧰 typescript: Pull requests that update Typescript code


2 participants