Refa: migrate agent webhook routes to REST APIs #14330
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Agent-scoped webhook endpoints moved into

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as HTTP Client
    participant API as Agent Webhook API
    participant Auth as Auth Validator
    participant Redis as Redis (rate-limit/logs)
    participant Canvas as Canvas Engine
    Client->>API: POST /api/v1/agents/{id}/webhook\n(payload & headers)
    API->>API: Load agent canvas & find begin component
    API->>API: Validate HTTP method & body size
    API->>Auth: Authenticate (none/token/basic/JWT)
    Auth-->>API: Auth result
    API->>Redis: Check token-bucket rate limit
    Redis-->>API: Rate-limit verdict
    API->>API: Validate IP allowlist
    API->>API: Parse request (query, headers, body, files)
    API->>Canvas: canvas.run(webhook_payload)
    Canvas-->>API: Execution events / result
    alt Immediate Mode
        API->>Redis: Log execution events
        API-->>Client: Return configured status/body (async)
    else Streaming Mode
        API-->>Client: Aggregate & return JSON result (sync)
    end
    Client->>API: GET /api/v1/agents/{id}/webhook/logs?since_ts=...&webhook_id=...
    API->>Redis: Retrieve incremental trace events
    Redis-->>API: Events
    API-->>Client: Incremental trace payload
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Actionable comments posted: 7
🧹 Nitpick comments (1)
api/apps/restful_apis/agent_api.py (1)
40-42: Add lightweight logs for the new webhook REST flow. Only background exceptions are logged. Add structured info/warning logs for route entry, validation rejections, auth/ownership denial, and trace polling failures, without logging payloads, tokens, or secrets. As per coding guidelines (`**/*.py`: Add logging for new flows).

Also applies to: 717-720
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/agent_api.py` around lines 40 - 42, Add structured, non-sensitive logging to the webhook REST flow in the webhook(agent_id: str) handler: emit an info log on route entry (include agent_id and a short route label, using request.path but never the payload), warn/info when validation rejects input (reference where validations occur), warn when auth/ownership is denied (reference the auth/ownership check code paths), and warn/error when trace polling or poll result handling fails (reference the trace polling logic). Use structured fields for agent_id/trace_id/status but do NOT log request bodies, tokens, headers, or secrets; keep logs at appropriate levels (info for entry, warning for validation/auth failures, error for polling exceptions). Ensure similar logging is added to the corresponding trace-polling code paths noted near lines 717-720.
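A minimal sketch of the kind of logging this prompt describes, using only the standard library; the logger name and helper functions are illustrative assumptions, not code from the PR:

```python
import logging

# Sketch of the structured, non-sensitive logging the nitpick asks for.
# Logger name and helper names are assumptions, not PR code.
logger = logging.getLogger("agent_webhook")

def log_route_entry(agent_id: str, route: str) -> None:
    # Structured fields only: agent_id and a short route label, never
    # payloads, tokens, headers, or secrets.
    logger.info("webhook route entry", extra={"agent_id": agent_id, "route": route})

def log_auth_denied(agent_id: str, reason: str) -> None:
    logger.warning("webhook auth denied", extra={"agent_id": agent_id, "deny_reason": reason})
```

The `extra` dict keeps the fields machine-parseable without interpolating request data into the message.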
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/apps/restful_apis/agent_api.py`:
- Around line 664-707: The synchronous path currently calls append_webhook_trace
(with agent_id, start_ts, ans and later with finished/error events) directly
which can raise Redis/JSON errors and override a valid response; update the
synchronous branch in the try and except blocks to mirror the background path by
wrapping each append_webhook_trace call in a try/except that swallows/logs
trace-specific errors so they don't propagate (i.e., around the
append_webhook_trace calls in the successful path and in the exception handler
use a guarded call that catches exceptions and does not re-raise).
- Around line 727-733: The code currently hardcodes the HMAC key as
WEBHOOK_ID_SECRET and derives webhook IDs with hmac.new(...); replace the
literal by reading the secret from configuration/secret storage (e.g.,
application config or environment variable) inside the function that generates
the signature (the block using WEBHOOK_ID_SECRET and hmac.new), fail fast with a
clear error if the secret is missing, and use that configured value in tests to
compute the expected ID rather than asserting against the literal; keep the same
base64.urlsafe_b64encode(...).decode(...).rstrip("=") output behavior so
existing callers are unaffected.
- Around line 299-359: The parse_webhook_request async function currently
catches all exceptions and replaces any parse error (including JSON decode
errors, "Too many uploaded files", form/file read failures, or content-type
mismatches) with an empty dict, which hides failures; change it so parsing
errors are propagated (or returned as a structured error) instead of being
swallowed: remove or narrow the broad except Exception block in
parse_webhook_request and let real exceptions from request.get_json(),
request.form, request.files, FileService.upload_info, canvas.get_files_async,
and json.loads bubble up (or re-raise them with context), and ensure callers of
parse_webhook_request catch these errors and convert content-type mismatches or
parse exceptions into proper 400 responses (wrap the parse_webhook_request call
and return a 400 with the error message for ValueError/content-type errors and
other parse errors, rather than proceeding with an empty body).
- Around line 111-133: The _validate_max_body_size function currently treats a
missing Content-Length as 0 which allows chunked or omitted-length uploads to
bypass limits; update _validate_max_body_size to reject requests with missing or
undefined request.content_length when security_cfg defines max_body_size (raise
a clear error like "Missing Content-Length for bounded request") or
alternatively implement/enforce a streaming read limit when the body is actually
consumed (add a capped stream reader used by your webhook/handler code so reads
abort if bytes exceed limit). Make the change inside _validate_max_body_size
(and ensure any body-parsing paths use the same capped reader) so the request
cannot bypass the max_body_size by omitting Content-Length or using chunked
transfer.
- Around line 613-614: The canvas update is using the wrong
identifier—UserCanvasService.update_by_id is called with cvs.user_id, which can
persist the DSL to the wrong record; change the call to update using the agent
identifier (e.g., cvs.agent_id) or the service method that updates by agent
(e.g., UserCanvasService.update_by_agent_id) so that the updated DSL (cvs.dsl)
is saved against the agent's canvas record rather than the tenant/user id.
- Around line 61-64: The current loop directly indexes nested DSL fields
(components[k]["obj"]["component_name"], ["params"]["mode"]) which can raise
KeyError/TypeError for malformed agent definitions; update the traversal around
components and cpn_obj to safely access keys (use dict.get and isinstance checks
or try/except) before reading "component_name" and "params", and if required
keys are missing or not the expected types return a controlled bad-request
response (HTTP 400) instead of letting a 500 be raised; specifically adjust the
code that sets cpn_obj, checks cpn_obj["component_name"].lower() and
cpn_obj["params"]["mode"], and set/return webhook_cfg only after validation.
In `@test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py`:
- Around line 311-317: The test currently replaces api.apps.login_required with
a no-op (_identity_decorator) which defeats the new auth protection for the
webhook_trace endpoint; replace that stub with an enforcing decorator that
raises or rejects when api_apps_pkg.current_user is missing/unauthenticated
(reference symbols: api_apps_pkg, login_required, current_user, webhook_trace)
and update the test to include an unauthenticated request case that asserts
access is denied (e.g., 401/403) as well as the existing authenticated case to
ensure the decorator is actually applied.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ed88ff7a-41f8-45fc-993e-66704a51b688
📒 Files selected for processing (6)
- api/apps/restful_apis/agent_api.py
- api/apps/sdk/agents.py
- test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py
- web/src/pages/agent/hooks/use-build-webhook-url.ts
- web/src/pages/agent/webhook-sheet/index.tsx
- web/src/utils/api.ts
💤 Files with no reviewable changes (1)
- api/apps/sdk/agents.py
```python
for k, _ in components.items():
    cpn_obj = components[k]["obj"]
    if cpn_obj["component_name"].lower() == "begin" and cpn_obj["params"]["mode"] == "Webhook":
        webhook_cfg = cpn_obj["params"]
```
Handle malformed DSL components without raising a 500.
Direct indexing into nested DSL keys can raise KeyError for a partial/corrupt agent definition. Return a controlled bad-request response instead.
🛡️ Proposed safer DSL traversal
```diff
-for k, _ in components.items():
-    cpn_obj = components[k]["obj"]
-    if cpn_obj["component_name"].lower() == "begin" and cpn_obj["params"]["mode"] == "Webhook":
-        webhook_cfg = cpn_obj["params"]
+for component in components.values():
+    cpn_obj = component.get("obj", {}) if isinstance(component, dict) else {}
+    params = cpn_obj.get("params", {})
+    if cpn_obj.get("component_name", "").lower() == "begin" and params.get("mode") == "Webhook":
+        webhook_cfg = params
+        break
```
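As a self-contained illustration of the guarded traversal described above, the lookup can be exercised standalone; `find_webhook_cfg` is a hypothetical helper name wrapping the suggested logic:

```python
def find_webhook_cfg(components):
    """Return the Begin component's params when it runs in Webhook mode, else None.

    Illustrative sketch: tolerates missing keys and wrong types instead of
    raising KeyError/TypeError, so a caller can return a controlled 400.
    """
    if not isinstance(components, dict):
        return None
    for component in components.values():
        cpn_obj = component.get("obj") if isinstance(component, dict) else None
        if not isinstance(cpn_obj, dict):
            continue
        params = cpn_obj.get("params")
        if not isinstance(params, dict):
            continue
        if str(cpn_obj.get("component_name", "")).lower() == "begin" and params.get("mode") == "Webhook":
            return params
    return None
```

Malformed entries (missing `obj`, missing `params`, `None` values) simply skip to the next component rather than raising.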
```python
async def _validate_max_body_size(security_cfg):
    """Check request size does not exceed max_body_size."""
    max_size = security_cfg.get("max_body_size")
    if not max_size:
        return

    # Convert "10MB" → bytes
    units = {"kb": 1024, "mb": 1024**2}
    size_str = max_size.lower()

    for suffix, factor in units.items():
        if size_str.endswith(suffix):
            limit = int(size_str.replace(suffix, "")) * factor
            break
    else:
        raise Exception("Invalid max_body_size format")

    MAX_LIMIT = 10 * 1024 * 1024  # 10MB
    if limit > MAX_LIMIT:
        raise Exception("max_body_size exceeds maximum allowed size (10MB)")

    content_length = request.content_length or 0
    if content_length > limit:
        raise Exception(f"Request body too large: {content_length} > {limit}")
```
Don’t trust Content-Length as the only body-size guard.
If a client omits Content-Length or uses chunked upload, line 131 treats the size as 0, then later reads the body/files. Reject missing length for bounded webhooks or enforce the cap while reading/parsing.
🛡️ Minimal defensive check
```diff
-content_length = request.content_length or 0
+content_length = request.content_length
+if content_length is None:
+    raise Exception("Missing Content-Length")
 if content_length > limit:
     raise Exception(f"Request body too large: {content_length} > {limit}")
```
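The stricter check reduces to two pure functions that can be tested without a web framework; the function names are assumptions, and in the real handler `content_length` would come from `request.content_length`:

```python
def parse_size_limit(max_size: str) -> int:
    """Convert a size string like '10MB' to bytes, mirroring the reviewed code."""
    units = {"kb": 1024, "mb": 1024**2}
    size_str = max_size.lower().strip()
    for suffix, factor in units.items():
        if size_str.endswith(suffix):
            return int(size_str[: -len(suffix)]) * factor
    raise ValueError("Invalid max_body_size format")

def check_body_size(content_length, limit: int) -> None:
    """Reject a missing Content-Length instead of treating it as zero."""
    if content_length is None:
        raise ValueError("Missing Content-Length for bounded request")
    if content_length > limit:
        raise ValueError(f"Request body too large: {content_length} > {limit}")
```

Note this only closes the header-spoofing gap for absent lengths; a capped stream reader is still needed to bound chunked uploads at read time.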
```python
async def parse_webhook_request(content_type):
    """Parse request based on content-type and return structured data."""

    # 1. Query
    query_data = {k: v for k, v in request.args.items()}

    # 2. Headers
    header_data = {k: v for k, v in request.headers.items()}

    # 3. Body
    ctype = request.headers.get("Content-Type", "").split(";")[0].strip()
    if ctype and ctype != content_type:
        raise ValueError(
            f"Invalid Content-Type: expect '{content_type}', got '{ctype}'"
        )

    body_data: dict = {}

    try:
        if ctype == "application/json":
            body_data = await request.get_json() or {}

        elif ctype == "multipart/form-data":
            nonlocal canvas
            form = await request.form
            files = await request.files

            body_data = {}

            for key, value in form.items():
                body_data[key] = value

            if len(files) > 10:
                raise Exception("Too many uploaded files")
            for key, file in files.items():
                desc = FileService.upload_info(
                    cvs.user_id,  # user
                    file,         # FileStorage
                    None          # url (None for webhook)
                )
                file_parsed = await canvas.get_files_async([desc])
                body_data[key] = file_parsed

        elif ctype == "application/x-www-form-urlencoded":
            form = await request.form
            body_data = dict(form)

        else:
            # text/plain / octet-stream / empty / unknown
            raw = await request.get_data()
            if raw:
                try:
                    body_data = json.loads(raw.decode("utf-8"))
                except Exception:
                    body_data = {}
            else:
                body_data = {}

    except Exception:
        body_data = {}
```
Preserve parse failures instead of converting them to {}.
The broad except swallows “Too many uploaded files”, upload failures, form read errors, and JSON parse errors, allowing the webhook to continue with an empty body. Also wrap the parse_webhook_request call so content-type mismatches return a 400 instead of escaping.
🛠️ Proposed parse-error handling shape
```diff
-    try:
-        if ctype == "application/json":
-            body_data = await request.get_json() or {}
+    if ctype == "application/json":
+        body_data = await request.get_json() or {}

-        elif ctype == "multipart/form-data":
-            nonlocal canvas
-            form = await request.form
-            files = await request.files
+    elif ctype == "multipart/form-data":
+        nonlocal canvas
+        form = await request.form
+        files = await request.files

-            body_data = {}
+        body_data = {}

-            for key, value in form.items():
-                body_data[key] = value
+        for key, value in form.items():
+            body_data[key] = value

-            if len(files) > 10:
-                raise Exception("Too many uploaded files")
-            for key, file in files.items():
-                desc = FileService.upload_info(
-                    cvs.user_id,  # user
-                    file,         # FileStorage
-                    None          # url (None for webhook)
-                )
-                file_parsed = await canvas.get_files_async([desc])
-                body_data[key] = file_parsed
+        if len(files) > 10:
+            raise Exception("Too many uploaded files")
+        for key, file in files.items():
+            desc = FileService.upload_info(cvs.user_id, file, None)
+            file_parsed = await canvas.get_files_async([desc])
+            body_data[key] = file_parsed

-        elif ctype == "application/x-www-form-urlencoded":
-            form = await request.form
-            body_data = dict(form)
+    elif ctype == "application/x-www-form-urlencoded":
+        form = await request.form
+        body_data = dict(form)

-        else:
-            # text/plain / octet-stream / empty / unknown
+    else:
+        # text/plain / octet-stream / empty / unknown
+        try:
             raw = await request.get_data()
             if raw:
                 try:
                     body_data = json.loads(raw.decode("utf-8"))
                 except Exception:
                     body_data = {}
             else:
                 body_data = {}
-
-    except Exception:
-        body_data = {}
+        except Exception:
+            body_data = {}
```

And at the call site:

```diff
-    parsed = await parse_webhook_request(webhook_cfg.get("content_types"))
+    try:
+        parsed = await parse_webhook_request(webhook_cfg.get("content_types"))
+    except Exception as e:
+        return get_data_error_result(code=RetCode.BAD_REQUEST, message=str(e)), RetCode.BAD_REQUEST
```

Also applies to: 523-532
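The caller-side pattern the comment suggests can be sketched standalone; `parse_json_body`, `handle_webhook_body`, and the result shape are illustrative assumptions, not the PR's actual helpers:

```python
import json

def parse_json_body(raw: bytes):
    """Raise a descriptive ValueError instead of silently returning {}."""
    try:
        return json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as e:
        raise ValueError(f"Invalid JSON body: {e}") from e

def handle_webhook_body(raw: bytes):
    """Convert parse failures into a 400-style result rather than an empty body."""
    try:
        body = parse_json_body(raw)
    except ValueError as e:
        return {"code": 400, "message": str(e), "success": False}
    return {"code": 200, "body": body, "success": True}
```

The key point is the division of labor: the parser raises with context, and only the route handler decides how a failure maps to an HTTP status.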
```python
        if is_test:
            append_webhook_trace(
                agent_id,
                start_ts,
                ans
            )
        if is_test:
            append_webhook_trace(
                agent_id,
                start_ts,
                {
                    "event": "finished",
                    "elapsed_time": time.time() - start_ts,
                    "success": True,
                }
            )
        final_content = "".join(contents)
        return {
            "message": final_content,
            "success": True,
            "code": status,
        }

    except Exception as e:
        if is_test:
            append_webhook_trace(
                agent_id,
                start_ts,
                {
                    "event": "error",
                    "message": str(e),
                    "error_type": type(e).__name__,
                }
            )
            append_webhook_trace(
                agent_id,
                start_ts,
                {
                    "event": "finished",
                    "elapsed_time": time.time() - start_ts,
                    "success": False,
                }
            )
        return {"code": 400, "message": str(e), "success": False}
```
Guard trace writes in the synchronous execution path.
In this branch, Redis/JSON errors from append_webhook_trace can replace an otherwise valid webhook response with an internal failure, including inside the exception handler. Mirror the background path’s guarded trace logging.
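The guarded trace write described above amounts to a small wrapper; the wrapper name is an assumption, and `append_webhook_trace` is stubbed here by plain callables:

```python
import logging

logger = logging.getLogger(__name__)

def safe_append_trace(append_fn, *args):
    """Call a trace writer but never let Redis/JSON errors replace the response."""
    try:
        append_fn(*args)
    except Exception:
        logger.warning("webhook trace write failed", exc_info=True)
```

In the synchronous branch, each direct `append_webhook_trace(...)` call would become `safe_append_trace(append_webhook_trace, ...)`, so a trace-store outage degrades logging instead of the webhook response.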
```python
WEBHOOK_ID_SECRET = "webhook_id_secret"
sig = hmac.new(
    WEBHOOK_ID_SECRET.encode("utf-8"),
    start_ts.encode("utf-8"),
    hashlib.sha256,
).digest()
return base64.urlsafe_b64encode(sig).decode("utf-8").rstrip("=")
```
Move the webhook-id signing secret out of source code.
A fixed HMAC key is shared by every deployment and cannot be rotated. Load it from configuration/secret storage, and have tests derive the expected ID from that configured value instead of the literal.
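A sketch of the environment-backed variant: the env-var name `WEBHOOK_ID_SECRET` and the fail-fast `RuntimeError` are assumptions, while the output format (urlsafe base64, padding stripped) matches the reviewed code so existing callers would be unaffected:

```python
import base64
import hashlib
import hmac
import os

def encode_webhook_id(start_ts: str) -> str:
    """Derive a webhook id with a secret loaded from the environment, not source."""
    secret = os.environ.get("WEBHOOK_ID_SECRET")
    if not secret:
        # Fail fast so a missing secret is caught at first use, not silently.
        raise RuntimeError("WEBHOOK_ID_SECRET is not configured")
    sig = hmac.new(secret.encode("utf-8"), start_ts.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(sig).decode("utf-8").rstrip("=")
```

Tests would set the variable themselves and compute the expected id from it, rather than asserting against a literal.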
```python
api_apps_pkg.current_user = SimpleNamespace(id="tenant-1")

def _identity_decorator(func):
    return func

api_apps_pkg.login_required = _identity_decorator
monkeypatch.setitem(sys.modules, "api.apps", api_apps_pkg)
```
Don’t stub login_required as a no-op for the protected logs endpoint.
webhook_trace is now auth-protected, but this identity decorator lets tests pass even if authentication is accidentally removed or bypassed. Use an enforcing stub and add an unauthenticated case.
🧪 Proposed test-stub adjustment
```diff
 api_apps_pkg.current_user = SimpleNamespace(id="tenant-1")

-def _identity_decorator(func):
-    return func
+def _login_required(func):
+    async def _wrapper(*args, **kwargs):
+        if not api_apps_pkg.current_user:
+            raise PermissionError("unauthorized")
+        return await func(*args, **kwargs)
+
+    return _wrapper

-api_apps_pkg.login_required = _identity_decorator
+api_apps_pkg.login_required = _login_required
```

Based on learnings: applies to `tests/**/*.py` (add/adjust tests for behavior changes).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
api_apps_pkg.current_user = SimpleNamespace(id="tenant-1")

def _login_required(func):
    async def _wrapper(*args, **kwargs):
        if not api_apps_pkg.current_user:
            raise PermissionError("unauthorized")
        return await func(*args, **kwargs)

    return _wrapper

api_apps_pkg.login_required = _login_required
monkeypatch.setitem(sys.modules, "api.apps", api_apps_pkg)
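Run standalone, the enforcing stub behaves like this; `SimpleNamespace` stands in for the patched `api.apps` module, and the route body is a trivial placeholder:

```python
import asyncio
from types import SimpleNamespace

# Stand-in for the patched api.apps package used by the test.
api_apps_pkg = SimpleNamespace(current_user=SimpleNamespace(id="tenant-1"))

def login_required(func):
    async def _wrapper(*args, **kwargs):
        if not api_apps_pkg.current_user:
            raise PermissionError("unauthorized")
        return await func(*args, **kwargs)
    return _wrapper

@login_required
async def webhook_trace():
    return {"code": 200}

authed = asyncio.run(webhook_trace())  # passes while current_user is set
api_apps_pkg.current_user = None
try:
    asyncio.run(webhook_trace())       # now rejected
    denied = False
except PermissionError:
    denied = True
```

This is why the enforcing stub matters: if the decorator were accidentally dropped from the endpoint, the unauthenticated case would start passing and the test would fail.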
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main   #14330      +/-   ##
==========================================
- Coverage   97.41%   95.30%   -2.11%
==========================================
  Files          10       10
  Lines         695      703       +8
  Branches      111      112       +1
==========================================
- Hits          677      670       -7
- Misses          7       16       +9
- Partials       11       17       +6
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 9
🧹 Nitpick comments (4)
api/apps/restful_apis/agent_api.py (4)
70-71: Remove duplicate import. `current_user, login_required` is already imported at line 35.

♻️ Proposed cleanup

```diff
-from api.apps import current_user, login_required
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/agent_api.py` around lines 70 - 71, Remove the duplicate import of current_user and login_required from the second import statement so there is only one import of these symbols; locate the redundant line that reads "from api.apps import current_user, login_required" (the duplicate near the top) and delete it, leaving the original import (the one at line ~35) intact to avoid repeated imports.
1062-1062: Fragile test-path detection via hardcoded URL prefix. `/api/v1/...` is a deployment-level prefix (blueprint `url_prefix`). If the prefix ever changes or the app is mounted under a different base path, `is_test` silently becomes `False` for all `/webhook/test` calls, which means test traces stop being persisted without any error.

Prefer splitting the two routes into distinct handlers (or using `request.url_rule` / an explicit path suffix check) instead of matching the absolute path string.

♻️ Example: rule-based detection

```diff
-is_test = request.path.startswith(f"/api/v1/agents/{agent_id}/webhook/test")
+rule = getattr(request.url_rule, "rule", "") or ""
+is_test = rule.endswith("/webhook/test")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/agent_api.py` at line 1062, The current fragile detection sets is_test using request.path.startswith(...) which depends on a deployment-level prefix; update the logic to detect the webhook-test route reliably by using request.url_rule (or checking the path suffix) or by splitting into separate route handlers so detection doesn't rely on a hardcoded "/api/v1" prefix; specifically, replace the request.path.startswith check for is_test in agent_api.py with a rule-based check using request.url_rule.rule or an endswith comparison against f"/agents/{agent_id}/webhook/test", or move the /webhook/test route into its own handler so you can set is_test deterministically without referencing request.path directly.
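The rule-based check reduces to a pure string predicate, which makes the behavior easy to pin down; the helper name is an assumption:

```python
def is_test_route(rule: str) -> bool:
    """Detect the test-webhook route by its rule suffix, not a deployment prefix."""
    return bool(rule) and rule.endswith("/webhook/test")
```

Because only the suffix is inspected, remounting the blueprint under a different base path does not change the verdict.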
1756-1760: Minor robustness gaps in `webhook_trace`.

- `decode_webhook_id` compares encoded IDs with `==`; since the endpoint is behind `@login_required` this is a low-risk finding, but `hmac.compare_digest` is a drop-in, safer replacement.
- `json.loads(raw)` at line 1788 is unprotected: a stale/corrupt Redis value returns 500. Wrap in `try/except` and treat as "no logs yet".
- Per coding guidelines, add a log line when the key is missing or the payload fails to parse so operators can diagnose.

Also applies to: 1788-1788
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/agent_api.py` around lines 1756 - 1760, Replace the equality check in decode_webhook_id with a constant-time comparison (use hmac.compare_digest(encode_webhook_id(ts), enc_id)) to avoid timing attacks; in webhook_trace where you call json.loads(raw) wrap the load in try/except (JSONDecodeError/ValueError) and treat a parse failure as "no logs yet" instead of raising 500; also add explicit logging when the Redis key is missing or when parsing fails so operators can diagnose (use the module logger, e.g., logger or current_app.logger) and keep existing behavior of returning None/"no logs yet" in those cases.
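Both fixes are small stdlib patterns; the helper names here are assumptions, not the PR's functions:

```python
import hmac
import json

def ids_match(expected: str, provided: str) -> bool:
    """Constant-time comparison of encoded webhook ids."""
    return hmac.compare_digest(expected, provided)

def load_trace_events(raw):
    """Treat a missing or corrupt Redis payload as 'no logs yet', not a 500."""
    if not raw:
        return []
    try:
        return json.loads(raw)
    except (ValueError, TypeError):
        return []
```

`compare_digest` avoids leaking the match position through timing, and the tolerant loader turns stale data into an empty result the polling client can retry.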
1281-1301: Use PyJWT's built-in `require` option for required-claim validation. PyJWT supports required-claim enforcement via `options={"require": [...]}`, which handles claim-presence validation and raises `MissingRequiredClaimError` for missing claims. The current hand-rolled check unnecessarily restricts reserved claims like `exp` (lines 1294-1297), preventing a common and valid use case where you want to enforce expiration checks alongside other claim requirements.

♻️ Suggested simplification

```diff
-raw_required_claims = jwt_cfg.get("required_claims", [])
-if isinstance(raw_required_claims, str):
-    required_claims = [raw_required_claims]
-elif isinstance(raw_required_claims, (list, tuple, set)):
-    required_claims = list(raw_required_claims)
-else:
-    required_claims = []
-
-required_claims = [
-    c for c in required_claims
-    if isinstance(c, str) and c.strip()
-]
-
-RESERVED_CLAIMS = {"exp", "sub", "aud", "iss", "nbf", "iat"}
-for claim in required_claims:
-    if claim in RESERVED_CLAIMS:
-        raise Exception(f"Reserved JWT claim cannot be required: {claim}")
-
-for claim in required_claims:
-    if claim not in decoded:
-        raise Exception(f"Missing JWT claim: {claim}")
+raw_required_claims = jwt_cfg.get("required_claims", []) or []
+if isinstance(raw_required_claims, str):
+    raw_required_claims = [raw_required_claims]
+required_claims = [c for c in raw_required_claims if isinstance(c, str) and c.strip()]
+options["require"] = required_claims
```

(Move the `options["require"]` assignment before the `jwt.decode` call and remove the manual validation loops.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/agent_api.py` around lines 1281 - 1301, The custom required-claim validation using required_claims and RESERVED_CLAIMS should be removed and replaced by using PyJWT's built-in require option: extract required_claims from jwt_cfg (as you already do), set options["require"] = required_claims before calling jwt.decode, and then remove the loops referencing RESERVED_CLAIMS and the manual presence check against decoded (the exception-raising lines that check for reserved claims and missing claims). Ensure jwt.decode is invoked so PyJWT can raise MissingRequiredClaimError when a required claim is absent.
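The normalization half of this suggestion is plain Python and can be sketched on its own; PyJWT then enforces presence via `options={"require": ...}` at decode time:

```python
def normalize_required_claims(raw):
    """Coerce a str/list/other config value into a clean list of claim names."""
    if isinstance(raw, str):
        raw = [raw]
    elif not isinstance(raw, (list, tuple, set)):
        raw = []
    return [c for c in raw if isinstance(c, str) and c.strip()]

# The result would be passed as jwt.decode(..., options={"require": claims}),
# letting PyJWT raise MissingRequiredClaimError for absent claims.
```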
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/apps/restful_apis/agent_api.py`:
- Around line 1162-1175: The whitelist check uses request.remote_addr
(client_ip) which ignores proxy headers, doesn't validate/parses the IP
(ipaddress.ip_address can raise ValueError) and omits logging; update the logic
in agent_api.py to: extract the real client IP by honoring trusted proxy headers
(check X-Forwarded-For first, fall back to X-Real-IP then request.remote_addr),
validate the extracted client_ip with ipaddress.ip_address inside a try/except
to handle invalid or None values (return a clear deny and log the parse error),
perform the CIDR or single-IP comparisons using ipaddress.ip_network/ip_address
as before, and add a structured log entry (including the denied client_ip and
matched whitelist) before raising the Exception when access is denied.
- Around line 1330-1334: The code in parse_webhook_request is comparing the
request Content-Type string (ctype) against a variable named content_type which
may actually be a list from webhook_cfg.get("content_types"), causing every
request to fail; change the logic inside parse_webhook_request (and the other
occurrence around the second check) to normalize the incoming expected types
into an iterable (e.g., if content_type is a str wrap it in [content_type], or
if None use []), then check membership (if content_types and ctype not in
content_types: raise ValueError(...)) and update the error message to show the
allowed list (e.g., "expect one of [...] got '...'" ), referencing
parse_webhook_request, the local variable ctype, and the
webhook_cfg.get("content_types") usage to locate the fixes.
- Around line 1683-1706: The canvas-derived status parsing should be hardened:
when extracting status from ans["data"] in the message loop (the status variable
set via status = int(ans["data"].get("status", status))), catch conversion
errors and validate/clamp the value into the HTTP range (e.g., ensure an int
between 100 and 599, defaulting to a safe code like 500 if invalid); replace
direct use of the raw status in Response(..., status=result["code"], ...) with
the clamped/validated status (use the same clamp routine where result["code"] is
computed). Also add logging on the synchronous failure path (the except/failure
block previously returning the 400 payload) to include agent_id and elapsed time
(and any error details) using the existing logger, and ensure
append_webhook_trace calls (append_webhook_trace, start_ts) are preserved. Use
function/variable names ans, status, append_webhook_trace, final_content,
agent_id, start_ts, and result["code"] to locate and update the spots.
- Around line 1091-1096: The current logic uses webhook_cfg.get("methods", [])
which makes allowed_methods empty and thus permits all HTTP verbs; update the
handling of allowed_methods in the webhook validation (reference symbols:
webhook_cfg, allowed_methods, request_method, get_data_error_result,
RetCode.BAD_REQUEST) to enforce a safe default or fail-fast: either default
allowed_methods to a restrictive set like ["POST"] when methods is
missing/empty, or treat a missing/empty methods value as invalid and return a
BAD_REQUEST error indicating methods must be specified; ensure the subsequent
check (if allowed_methods and request_method not in allowed_methods) uses the
new non-empty allowed_methods so unintended verbs are rejected.
- Around line 1138-1150: The size-string parsing for max_size (variables
size_str, units, limit, MAX_LIMIT) is fragile: change suffix removal to slice
(size_str[:-len(suffix)]) instead of replace, expand units to include b, gb and
accept bare integers, parse the numeric part as float then cast to int to allow
fractional sizes (e.g., int(float(...)*factor)), validate that limit is >0 and
<= MAX_LIMIT (10*1024*1024) and raise a clear ValueError on invalid formats or
out-of-range values, and avoid parsing on every request by moving this logic to
precompute/cache the parsed limit during DSL validation or per-canvas
initialization (store the parsed limit for reuse).
- Around line 1310-1317: The Canvas constructor is being given cvs.user_id
(canvas owner) instead of the tenant_id expected by Canvas, causing billing/KB
attribution errors; in webhook() locate the tenant_id from the request context
(for example from authentication data, request.user.tenant_id, a specific
header, or the webhook configuration available in the function) and pass that
tenant_id as the second argument to Canvas instead of cvs.user_id (keep
canvas_id=agent_id as before); update any variable names accordingly so
Canvas(dsl, tenant_id, agent_id, canvas_id=agent_id) is used and ensure proper
error handling if tenant_id cannot be resolved.
- Around line 1662-1663: The background_run() task is created without a strong
reference and can be garbage-collected; change the creation site where
asyncio.create_task(background_run()) is called so you keep a strong reference
(e.g., create a module-level set like BACKGROUND_TASKS or attach a set to the
app), assign the task to a variable (task =
asyncio.create_task(background_run())), add it to that set, and register a done
callback to remove it when finished (task.add_done_callback(lambda t:
BACKGROUND_TASKS.discard(t))) so the long-running background_run coroutine is
retained for its lifetime.
- Around line 1219-1237: In _validate_token_auth: ensure token config exists by
checking token_cfg.get("token_header") and token_cfg.get("token_value") and
raise a clear Exception like "Token auth not configured" if missing, then fetch
provided header and compare using hmac.compare_digest(provided, token_value)
instead of !=; log an authentication failure (without secrets) and raise on
mismatch. In _validate_basic_auth: use hmac.compare_digest for both
auth.username vs username and auth.password vs password, log failures (no secret
data) and raise on mismatch; ensure you import hmac and use these functions in
_validate_token_auth and _validate_basic_auth to mitigate timing attacks and add
the configuration presence check to prevent silent bypass.
---
Nitpick comments:
In `@api/apps/restful_apis/agent_api.py`:
- Around line 70-71: Remove the duplicate import of current_user and
login_required from the second import statement so there is only one import of
these symbols; locate the redundant line that reads "from api.apps import
current_user, login_required" (the duplicate near the top) and delete it,
leaving the original import (the one at line ~35) intact to avoid repeated
imports.
- Line 1062: The current fragile detection sets is_test using
request.path.startswith(...) which depends on a deployment-level prefix; update
the logic to detect the webhook-test route reliably by using request.url_rule
(or checking the path suffix) or by splitting into separate route handlers so
detection doesn't rely on a hardcoded "/api/v1" prefix; specifically, replace
the request.path.startswith check for is_test in agent_api.py with a rule-based
check using request.url_rule.rule or an endswith comparison against
f"/agents/{agent_id}/webhook/test", or move the /webhook/test route into its own
handler so you can set is_test deterministically without referencing
request.path directly.
- Around line 1756-1760: Replace the equality check in decode_webhook_id with a
constant-time comparison (use hmac.compare_digest(encode_webhook_id(ts),
enc_id)) to avoid timing attacks; in webhook_trace where you call
json.loads(raw) wrap the load in try/except (JSONDecodeError/ValueError) and
treat a parse failure as "no logs yet" instead of raising 500; also add explicit
logging when the Redis key is missing or when parsing fails so operators can
diagnose (use the module logger, e.g., logger or current_app.logger) and keep
existing behavior of returning None/"no logs yet" in those cases.
- Around line 1281-1301: The custom required-claim validation using
required_claims and RESERVED_CLAIMS should be removed and replaced by using
PyJWT's built-in require option: extract required_claims from jwt_cfg (as you
already do), set options["require"] = required_claims before calling jwt.decode,
and then remove the loops referencing RESERVED_CLAIMS and the manual presence
check against decoded (the exception-raising lines that check for reserved
claims and missing claims). Ensure jwt.decode is invoked so PyJWT can raise
MissingRequiredClaimError when a required claim is absent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2a303bf2-e11d-4819-8a1e-3fbeff0570b9
📒 Files selected for processing (3)
- api/apps/restful_apis/agent_api.py
- api/apps/sdk/agents.py
- web/src/utils/api.ts
💤 Files with no reviewable changes (1)
- api/apps/sdk/agents.py
🚧 Files skipped from review as they are similar to previous changes (1)
- web/src/utils/api.ts
allowed_methods = webhook_cfg.get("methods", [])
request_method = request.method.upper()
if allowed_methods and request_method not in allowed_methods:
    return get_data_error_result(
        code=RetCode.BAD_REQUEST,message=f"HTTP method '{request_method}' not allowed for this webhook."
    ),RetCode.BAD_REQUEST
Empty methods list silently allows every HTTP verb.
webhook_cfg.get("methods", []) defaults to [], and the check at line 1093 is if allowed_methods and request_method not in allowed_methods. When methods is absent or empty, the webhook accepts GET/DELETE/HEAD/etc. via the route declaration at line 1059, which exposes a larger surface than intended. Consider defaulting to a safe set (e.g. ["POST"]) or rejecting when methods is unspecified.
🛡️ Proposed default-deny
- allowed_methods = webhook_cfg.get("methods", [])
+ allowed_methods = webhook_cfg.get("methods") or ["POST"]
request_method = request.method.upper()
- if allowed_methods and request_method not in allowed_methods:
+ allowed_methods = [m.upper() for m in allowed_methods]
+ if request_method not in allowed_methods:
return get_data_error_result(
code=RetCode.BAD_REQUEST,message=f"HTTP method '{request_method}' not allowed for this webhook."
      ),RetCode.BAD_REQUEST
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/agent_api.py` around lines 1091 - 1096, The current
logic uses webhook_cfg.get("methods", []) which makes allowed_methods empty and
thus permits all HTTP verbs; update the handling of allowed_methods in the
webhook validation (reference symbols: webhook_cfg, allowed_methods,
request_method, get_data_error_result, RetCode.BAD_REQUEST) to enforce a safe
default or fail-fast: either default allowed_methods to a restrictive set like
["POST"] when methods is missing/empty, or treat a missing/empty methods value
as invalid and return a BAD_REQUEST error indicating methods must be specified;
ensure the subsequent check (if allowed_methods and request_method not in
allowed_methods) uses the new non-empty allowed_methods so unintended verbs are
rejected.
# Convert "10MB" → bytes
units = {"kb": 1024, "mb": 1024**2}
size_str = max_size.lower()

for suffix, factor in units.items():
    if size_str.endswith(suffix):
        limit = int(size_str.replace(suffix, "")) * factor
        break
else:
    raise Exception("Invalid max_body_size format")
MAX_LIMIT = 10 * 1024 * 1024  # 10MB
if limit > MAX_LIMIT:
    raise Exception("max_body_size exceeds maximum allowed size (10MB)")
Fragile max_body_size parsing.
A few correctness problems in the size-string parser:
- size_str.replace(suffix, "") replaces all occurrences, so a value like "mb10mb" would parse to 10. Use size_str[:-len(suffix)] (suffix-stripping) instead.
- Only kb/mb are supported; no gb/b/bare-integer fallback. Any other unit raises a generic "Invalid max_body_size format".
- int(...) rejects fractional sizes (e.g. "1.5mb"), and negative values aren't guarded: a negative/zero limit would effectively block all requests.
- Value is parsed on every request; consider caching per-canvas or precomputing during DSL validation.
🛡️ Proposed fix
- units = {"kb": 1024, "mb": 1024**2}
- size_str = max_size.lower()
-
- for suffix, factor in units.items():
- if size_str.endswith(suffix):
- limit = int(size_str.replace(suffix, "")) * factor
- break
- else:
- raise Exception("Invalid max_body_size format")
+ units = {"kb": 1024, "mb": 1024**2, "gb": 1024**3}
+ size_str = str(max_size).strip().lower()
+ for suffix, factor in units.items():
+ if size_str.endswith(suffix):
+ try:
+ limit = int(float(size_str[: -len(suffix)].strip()) * factor)
+ except ValueError:
+ raise Exception("Invalid max_body_size format")
+ break
+ else:
+ raise Exception("Invalid max_body_size format")
+ if limit <= 0:
+ raise Exception("max_body_size must be positive")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/agent_api.py` around lines 1138 - 1150, The size-string
parsing for max_size (variables size_str, units, limit, MAX_LIMIT) is fragile:
change suffix removal to slice (size_str[:-len(suffix)]) instead of replace,
expand units to include b, gb and accept bare integers, parse the numeric part
as float then cast to int to allow fractional sizes (e.g.,
int(float(...)*factor)), validate that limit is >0 and <= MAX_LIMIT
(10*1024*1024) and raise a clear ValueError on invalid formats or out-of-range
values, and avoid parsing on every request by moving this logic to
precompute/cache the parsed limit during DSL validation or per-canvas
initialization (store the parsed limit for reuse).
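A self-contained sketch of such a hardened parser; the function name, unit table, and defaults are assumptions for illustration, not the PR's code:

```python
def parse_max_body_size(max_size, max_limit=10 * 1024 * 1024):
    """Parse a size string like '10mb', '512kb', '1.5mb', or a bare byte count."""
    units = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}
    size_str = str(max_size).strip().lower()
    for suffix in ("gb", "mb", "kb", "b"):  # longest suffixes first
        if size_str.endswith(suffix):
            number = size_str[: -len(suffix)].strip()  # slice, not replace()
            break
    else:
        number, suffix = size_str, "b"  # bare integer means bytes
    try:
        limit = int(float(number) * units[suffix])  # float allows "1.5mb"
    except ValueError:
        raise ValueError(f"Invalid max_body_size format: {max_size!r}")
    if limit <= 0 or limit > max_limit:
        raise ValueError(f"max_body_size out of range (1..{max_limit}): {max_size!r}")
    return limit
```

Since the result depends only on the DSL value, it can be computed once at canvas initialization and reused across requests.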
client_ip = request.remote_addr

for rule in whitelist:
    if "/" in rule:
        # CIDR notation
        if ipaddress.ip_address(client_ip) in ipaddress.ip_network(rule, strict=False):
            return
    else:
        # Single IP
        if client_ip == rule:
            return

raise Exception(f"IP {client_ip} is not allowed by whitelist")
IP whitelist ignores proxy headers and crashes on invalid input.
- request.remote_addr will typically be the upstream proxy/load-balancer IP in a containerized deployment (behind nginx, ELB, etc.), so the DSL-configured allowlist silently locks out every real client while allowing the proxy. Honor a trusted X-Forwarded-For/X-Real-IP when the deployment is behind a trusted proxy, or document the requirement.
- ipaddress.ip_address(client_ip) raises ValueError when client_ip is None or non-numeric (e.g. IPv6-mapped forms); the resulting exception is converted to a 400 via the outer handler, but the message ("IP ... is not allowed by whitelist") becomes misleading. Validate the IP before lookup.
- No logging of the denied IP, which makes incident investigation harder. As per coding guidelines, "Add logging for new flows".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/agent_api.py` around lines 1162 - 1175, The whitelist
check uses request.remote_addr (client_ip) which ignores proxy headers, doesn't
validate/parses the IP (ipaddress.ip_address can raise ValueError) and omits
logging; update the logic in agent_api.py to: extract the real client IP by
honoring trusted proxy headers (check X-Forwarded-For first, fall back to
X-Real-IP then request.remote_addr), validate the extracted client_ip with
ipaddress.ip_address inside a try/except to handle invalid or None values
(return a clear deny and log the parse error), perform the CIDR or single-IP
comparisons using ipaddress.ip_network/ip_address as before, and add a
structured log entry (including the denied client_ip and matched whitelist)
before raising the Exception when access is denied.
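A minimal sketch of the suggested extraction-and-validation flow, framework-free so it can be unit-tested; the helper names and the trust_proxy flag are assumptions:

```python
import ipaddress

def resolve_client_ip(headers, remote_addr, trust_proxy=False):
    """Pick the client IP, optionally honoring proxy headers.

    Only trust forwarded headers when the deployment sits behind a
    trusted proxy, otherwise they are trivially spoofable.
    """
    if trust_proxy:
        forwarded = headers.get("X-Forwarded-For", "")
        if forwarded:
            return forwarded.split(",")[0].strip()  # left-most = original client
        real_ip = headers.get("X-Real-IP")
        if real_ip:
            return real_ip.strip()
    return remote_addr

def ip_allowed(client_ip, whitelist):
    """Return True if client_ip matches a CIDR or single-IP rule."""
    try:
        addr = ipaddress.ip_address(client_ip)  # raises on None/garbage
    except (ValueError, TypeError):
        return False  # caller can log and deny with a clear message
    for rule in whitelist:
        if "/" in rule:
            if addr in ipaddress.ip_network(rule, strict=False):
                return True
        elif client_ip == rule:
            return True
    return False
```

Returning a boolean (instead of raising) lets the caller log the denied IP and the matched whitelist before producing the error response.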
key = f"rl:tb:{agent_id}"
now = time.time()

try:
    res = REDIS_CONN.lua_token_bucket(
        keys=[key],
        args=[capacity, rate, now, cost],
        client=REDIS_CONN.REDIS,
    )

    allowed = int(res[0])
    if allowed != 1:
        raise Exception("Too many requests (rate limit exceeded)")

except Exception as e:
    raise Exception(f"Rate limit error: {e}")
There was a problem hiding this comment.
Rate-limit key is per-agent (not per-client), and the error handler masks the 429 signal.
Two concrete problems in this block:
- key = f"rl:tb:{agent_id}": the token bucket is shared by all clients of this webhook. A single noisy caller can starve everyone else; the configured limit is effectively a global quota, not a per-caller quota. Typically this should include the client identity (IP / API key / bearer subject).
- The except Exception as e: raise Exception(f"Rate limit error: {e}") block swallows the "Too many requests (rate limit exceeded)" it raises a few lines above and rewraps it. The caller (line 1309) converts any failure to RetCode.BAD_REQUEST with the rewrapped message, so:
  - legitimate rate-limit hits come back as 400 "Rate limit error: Too many requests..." instead of 429;
  - Redis/Lua failures are indistinguishable from rate-limit exceeded.
Per coding guidelines ("Add logging for new flows"), also log when a caller is throttled and when the Redis backend errors, separately.
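For illustration, the per-caller keying and refill arithmetic can be sketched with an in-process bucket; the PR itself uses a Redis Lua script, and all names here are hypothetical:

```python
import time

class TokenBucket:
    """Minimal in-process token bucket keyed per (agent, client)."""

    def __init__(self, capacity, rate):
        self.capacity = capacity   # max burst size
        self.rate = rate           # tokens refilled per second
        self.buckets = {}          # key -> (tokens, last_refill_ts)

    def allow(self, agent_id, client_id, cost=1, now=None):
        now = time.monotonic() if now is None else now
        key = f"rl:tb:{agent_id}:{client_id}"  # per-agent AND per-client
        tokens, last = self.buckets.get(key, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= cost:
            self.buckets[key] = (tokens - cost, now)
            return True
        self.buckets[key] = (tokens, now)
        return False
```

A boolean verdict keeps throttling (map to 429) cleanly separate from backend errors (map to 500 and log), which is the distinction the rewrapped exception currently erases.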
🛡️ Proposed fix
- key = f"rl:tb:{agent_id}"
+ client_ip = request.remote_addr or "unknown"
+ key = f"rl:tb:{agent_id}:{client_ip}"
now = time.time()
try:
res = REDIS_CONN.lua_token_bucket(
keys=[key],
args=[capacity, rate, now, cost],
client=REDIS_CONN.REDIS,
)
-
- allowed = int(res[0])
- if allowed != 1:
- raise Exception("Too many requests (rate limit exceeded)")
-
- except Exception as e:
- raise Exception(f"Rate limit error: {e}")
+ except Exception as e:
+ logging.exception("Rate limit backend error for agent %s", agent_id)
+ raise Exception(f"Rate limit error: {e}")
+
+ if int(res[0]) != 1:
+ logging.warning("Rate limit exceeded for agent=%s client=%s", agent_id, client_ip)
+ raise Exception("Too many requests (rate limit exceeded)")| def _validate_token_auth(security_cfg): | ||
| """Validate header-based token authentication.""" | ||
| token_cfg = security_cfg.get("token",{}) | ||
| header = token_cfg.get("token_header") | ||
| token_value = token_cfg.get("token_value") | ||
|
|
||
| provided = request.headers.get(header) | ||
| if provided != token_value: | ||
| raise Exception("Invalid token authentication") | ||
|
|
||
| def _validate_basic_auth(security_cfg): | ||
| """Validate HTTP Basic Auth credentials.""" | ||
| auth_cfg = security_cfg.get("basic_auth", {}) | ||
| username = auth_cfg.get("username") | ||
| password = auth_cfg.get("password") | ||
|
|
||
| auth = request.authorization | ||
| if not auth or auth.username != username or auth.password != password: | ||
| raise Exception("Invalid Basic Auth credentials") |
🧩 Analysis chain
🌐 Web query:
Python hmac.compare_digest fixed-time string comparison usage
💡 Result:
hmac.compare_digest is a fixed-time comparison function in Python's hmac module, designed to prevent timing attacks when verifying HMAC signatures. It compares two strings or bytes-like objects and returns True if they are equal, using a constant-time comparison to avoid leaking information through execution-time differences. Key details:
- Usage: import hmac; hmac.compare_digest(computed_digest, received_digest)
- Supports str (ASCII only, e.g., from HMAC.hexdigest()) or bytes-like objects (e.g., from HMAC.digest()).
- Added in Python 3.3; on 3.10+ it uses OpenSSL's CRYPTO_memcmp for enhanced security.
- Always use it instead of == when comparing secrets or HMAC outputs to externally supplied values: == short-circuits on the first mismatch, enabling timing attacks that guess the value byte-by-byte from response times.
Example for a binary digest:
    import hmac
    import hashlib
    key = b'secret'
    message = b'hello'
    computed = hmac.new(key, message, hashlib.sha256).digest()
    if hmac.compare_digest(computed, received):  # received comes from the caller
        print('Valid')
Note: if the two lengths differ, timing may still reveal length/type information (but not content). This is the official recommendation for all cryptographic comparisons.
Citations:
- 1: https://docs.python.org/3/library/hmac.html
- 2: https://docs.python.org/3/library/hmac
- 3: https://tedboy.github.io/python_stdlib/generated/generated/hmac.compare_digest.html
- 4: https://stackoverflow.com/questions/78339403/how-to-validate-hmac-digest-in-python
- 5: https://stackoverflow.com/questions/55261040/how-could-hmac-comparison-ever-not-be-constant-time-in-python
- 6: https://docs.securesauce.dev/rules/PY005
🏁 Script executed:
# Check if hmac is imported in the file
rg -n "^import hmac|^from hmac" api/apps/restful_apis/agent_api.py
# Get broader context around lines 1219-1237 to see imports and function signatures
head -50 api/apps/restful_apis/agent_api.py | tail -20
# Check for any existing imports section
rg -n "^import|^from" api/apps/restful_apis/agent_api.py | head -20

Repository: infiniflow/ragflow
Length of output: 1565
🏁 Script executed:
# Check the actual code at lines 1219-1237 and surrounding context
sed -n '1210,1245p' api/apps/restful_apis/agent_api.py
# Also check if there's any logging in nearby functions
sed -n '1200,1250p' api/apps/restful_apis/agent_api.py | grep -n "logger\|logging"
🏁 Script executed:
# Find where these validation functions are called
rg -n "_validate_token_auth|_validate_basic_auth|_validate_jwt_auth" api/apps/restful_apis/agent_api.py
# Check if there's logging for auth failures elsewhere in the file
rg -n "logger.*auth|logging.*auth" api/apps/restful_apis/agent_api.py | head -10
# Check the broader context of how these functions are used
sed -n '1150,1220p' api/apps/restful_apis/agent_api.py
Fix timing-attack vulnerability and silent authentication bypass in token validation.
- Lines 1225 and 1235 use != for secret comparison, vulnerable to timing side-channels. Use hmac.compare_digest(...) for the token value, password, and username comparisons to prevent attackers from guessing credentials byte-by-byte through response-time measurement.
- Lines 1221–1226: If token_cfg is empty or missing (e.g., misconfigured DSL), both header and token_value are None. Since request.headers.get(None) returns None, the condition None != None evaluates to False, silently authorizing any request with no credentials. Add a check: if not header or not token_value: raise Exception("Token auth not configured") before attempting the header lookup.
- Per coding guidelines, add logging for authentication failures (without logging secret material).
🛡️ Proposed fix
def _validate_token_auth(security_cfg):
"""Validate header-based token authentication."""
token_cfg = security_cfg.get("token",{})
header = token_cfg.get("token_header")
token_value = token_cfg.get("token_value")
+ if not header or not token_value:
+ raise Exception("Token auth not configured")
- provided = request.headers.get(header)
- if provided != token_value:
+ provided = request.headers.get(header, "")
+ if not hmac.compare_digest(str(provided), str(token_value)):
+ logger.warning("Invalid token authentication attempt")
raise Exception("Invalid token authentication")
def _validate_basic_auth(security_cfg):
"""Validate HTTP Basic Auth credentials."""
auth_cfg = security_cfg.get("basic_auth", {})
username = auth_cfg.get("username")
password = auth_cfg.get("password")
auth = request.authorization
- if not auth or auth.username != username or auth.password != password:
+ if (
+ not auth
+ or not username
+ or not password
+ or not hmac.compare_digest(str(auth.username), str(username))
+ or not hmac.compare_digest(str(auth.password), str(password))
+ ):
+ logger.warning("Invalid basic auth attempt")
      raise Exception("Invalid Basic Auth credentials")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/agent_api.py` around lines 1219 - 1237, In
_validate_token_auth: ensure token config exists by checking
token_cfg.get("token_header") and token_cfg.get("token_value") and raise a clear
Exception like "Token auth not configured" if missing, then fetch provided
header and compare using hmac.compare_digest(provided, token_value) instead of
!=; log an authentication failure (without secrets) and raise on mismatch. In
_validate_basic_auth: use hmac.compare_digest for both auth.username vs username
and auth.password vs password, log failures (no secret data) and raise on
mismatch; ensure you import hmac and use these functions in _validate_token_auth
and _validate_basic_auth to mitigate timing attacks and add the configuration
presence check to prevent silent bypass.
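A minimal helper showing the constant-time pattern; the None/empty guard is an assumption added to cover the misconfiguration case described above:

```python
import hmac

def constant_time_token_check(provided, expected):
    """Compare secrets without short-circuiting on the first mismatched byte."""
    # A missing header or an unconfigured secret must fail closed, never
    # reach compare_digest as None/"" and accidentally match.
    if not provided or not expected:
        return False
    return hmac.compare_digest(str(provided), str(expected))
```

The same helper covers both the token header and the Basic Auth username/password comparisons.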
if not isinstance(cvs.dsl, str):
    dsl = json.dumps(cvs.dsl, ensure_ascii=False)
try:
    canvas = Canvas(dsl, cvs.user_id, agent_id, canvas_id=agent_id)
except Exception as e:
    resp=get_data_error_result(code=RetCode.BAD_REQUEST,message=str(e))
    resp.status_code = RetCode.BAD_REQUEST
    return resp
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect Canvas.__init__ to confirm the expected semantics of each positional arg.
ast-grep --pattern $'class Canvas:
$$$
def __init__($$$):
$$$
'
rg -nP --type=py -C1 '\bCanvas\s*\(' -g '!**/test/**' | head -60
🏁 Script executed:
# Get Canvas.__init__ signature
rg -nA15 'class Canvas' agent/canvas.py | head -40
🏁 Script executed:
# Get full context around lines 1310-1317 in agent_api.py
sed -n '1280,1330p' api/apps/restful_apis/agent_api.py | cat -n
🏁 Script executed:
sed -n '1200,1315p' api/apps/restful_apis/agent_api.py | head -120
🏁 Script executed:
sed -n '1050,1210p' api/apps/restful_apis/agent_api.py | tail -80
🏁 Script executed:
sed -n '900,1050p' api/apps/restful_apis/agent_api.py | grep -n "^@\|^async def\|^def" | tail -10
🏁 Script executed:
rg -nB50 'canvas = Canvas\(dsl, cvs.user_id' api/apps/restful_apis/agent_api.py | head -80
🏁 Script executed:
sed -n '900,1050p' api/apps/restful_apis/agent_api.py | head -100
🏁 Script executed:
sed -n '1100,1150p' api/apps/restful_apis/agent_api.py | head -60
🏁 Script executed:
rg -nB100 'canvas = Canvas\(dsl, cvs.user_id' api/apps/restful_apis/agent_api.py | grep -E "^[0-9]+-(@|async def|def )" | tail -5
🏁 Script executed:
# Get the function definition by searching backwards from line 1313
sed -n '1,1313p' api/apps/restful_apis/agent_api.py | tac | grep -m1 "^@\|^async def\|^def " | head -1
🏁 Script executed:
# Better approach - find the function containing line 1313
awk 'NR<=1313 && /^(@|async def|def )/ {func=$0; line=NR} END {print line": "func}' api/apps/restful_apis/agent_api.py
🏁 Script executed:
rg -nA20 'async def webhook\(agent_id: str\)' api/apps/restful_apis/agent_api.py | head -30
🏁 Script executed:
sed -n '1061,1320p' api/apps/restful_apis/agent_api.py | grep -E "tenant_id|cvs\." | head -20
🏁 Script executed:
# Check if cvs (UserCanvas) object has tenant_id or similar field
rg -n "class UserCanvas" api/db/models/ -A20 | head -40
🏁 Script executed:
# Also check the UserCanvasService to see what fields are available
rg -n "class UserCanvasService" api/db/services/ -A30 | head -40
🏁 Script executed:
# Find the UserCanvas model definition
fd -e py "\.py$" api/db/ | xargs rg -l "class UserCanvas" | head -5
🏁 Script executed:
# Look for model definitions
rg -n "class UserCanvas" api/ -A15 | head -40
Pass tenant_id instead of cvs.user_id to Canvas().
The Canvas constructor expects tenant_id as its second positional argument. All other call sites in the codebase pass tenant_id (e.g., lines 187, 446, 466, 683, 981). Here you pass cvs.user_id, which is the canvas owner, not the tenant executing the webhook. This causes model calls, billing, and knowledge-base lookups to be attributed to the wrong tenant.
The webhook() function signature does not provide a tenant_id parameter. You must identify the tenant from the request context (e.g., from authentication, headers, or the webhook configuration) before instantiating Canvas.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/agent_api.py` around lines 1310 - 1317, The Canvas
constructor is being given cvs.user_id (canvas owner) instead of the tenant_id
expected by Canvas, causing billing/KB attribution errors; in webhook() locate
the tenant_id from the request context (for example from authentication data,
request.user.tenant_id, a specific header, or the webhook configuration
available in the function) and pass that tenant_id as the second argument to
Canvas instead of cvs.user_id (keep canvas_id=agent_id as before); update any
variable names accordingly so Canvas(dsl, tenant_id, agent_id,
canvas_id=agent_id) is used and ensure proper error handling if tenant_id cannot
be resolved.
ctype = request.headers.get("Content-Type", "").split(";")[0].strip()
if ctype and ctype != content_type:
    raise ValueError(
        f"Invalid Content-Type: expect '{content_type}', got '{ctype}'"
    )
content_types vs content_type: list-vs-string mismatch.
webhook_cfg.get("content_types") (plural — matching what DSL stores) is passed straight into parse_webhook_request(content_type) and compared with ctype != content_type at line 1331. If the DSL holds a list of allowed content types (as the plural field name strongly implies), this comparison always fails and every real request is rejected with Invalid Content-Type: expect '[...]'. Normalize to a list and test membership.
🛡️ Proposed fix
- async def parse_webhook_request(content_type):
+ async def parse_webhook_request(content_types):
"""Parse request based on content-type and return structured data."""
# 1. Query
query_data = {k: v for k, v in request.args.items()}
# 2. Headers
header_data = {k: v for k, v in request.headers.items()}
# 3. Body
ctype = request.headers.get("Content-Type", "").split(";")[0].strip()
- if ctype and ctype != content_type:
- raise ValueError(
- f"Invalid Content-Type: expect '{content_type}', got '{ctype}'"
- )
+ allowed = content_types or []
+ if isinstance(allowed, str):
+ allowed = [allowed]
+ if allowed and ctype and ctype not in allowed:
+ raise ValueError(
+ f"Invalid Content-Type: expect one of {allowed}, got '{ctype}'"
+         )

Also applies to: 1544-1544
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/agent_api.py` around lines 1330 - 1334, The code in
parse_webhook_request is comparing the request Content-Type string (ctype)
against a variable named content_type which may actually be a list from
webhook_cfg.get("content_types"), causing every request to fail; change the
logic inside parse_webhook_request (and the other occurrence around the second
check) to normalize the incoming expected types into an iterable (e.g., if
content_type is a str wrap it in [content_type], or if None use []), then check
membership (if content_types and ctype not in content_types: raise
ValueError(...)) and update the error message to show the allowed list (e.g.,
"expect one of [...] got '...'" ), referencing parse_webhook_request, the local
variable ctype, and the webhook_cfg.get("content_types") usage to locate the
fixes.
```python
asyncio.create_task(background_run())
return resp
```
🧩 Analysis chain
🌐 Web query:
asyncio create_task strong reference weak reference garbage collection
💡 Result:
In Python's asyncio, when you call asyncio.create_task(coro), it returns an asyncio.Task object that schedules the coroutine for execution. The event loop maintains only weak references to tasks (using WeakSet internally before Python 3.14, and per-thread linked lists in 3.14+ that still don't hold strong refs). Without a strong reference elsewhere, the Task can be garbage collected at any time, even mid-execution if not actively running or awaiting. This design prevents memory leaks from "fire-and-forget" tasks that complete and hold results/exceptions indefinitely. Temporary strong refs exist while the task is scheduled (_ready queue) or sleeping (timers), but these are released when the task yields without external refs, allowing GC during cycles. To prevent premature GC, keep strong refs, e.g., in a set with done callbacks for cleanup:

```python
background_tasks = set()
task = asyncio.create_task(some_coro)
background_tasks.add(task)
task.add_done_callback(background_tasks.discard)
```

Official docs emphasize: "Save a reference to the result of this function, to avoid a task disappearing mid-execution." This behavior is unchanged as of Python 3.14 (today's date 2026-04-24, post-3.14 release).
Citations:
- 1: https://docs.python.org/3/library/asyncio-task.html
- 2: https://docs.python.org/3/library/asyncio-task.html?highlight=create_task
- 3: https://github.com/python/cpython/blob/main/InternalDocs/asyncio.md
- 4: https://stackoverflow.com/questions/75425080/whats-the-benefit-of-asyncio-using-weakrefs-to-keep-track-of-tasks
- 5: https://stackoverflow.com/questions/71938799/python-asyncio-create-task-really-need-to-keep-a-reference/71956673
- 6: Potential Bug: `asyncio.create_task` lacks strong references in `telebot.AsyncTeleBot._process_polling` (eternnoir/pyTelegramBotAPI#2572)
- 7: https://docs.astral.sh/ruff/rules/asyncio-dangling-task/
🏁 Script executed:

```shell
# First, let me check the file and the specific lines mentioned
cat -n api/apps/restful_apis/agent_api.py | sed -n '1655,1675p'
```

Repository: infiniflow/ragflow
Length of output: 946

🏁 Script executed:

```shell
# Search for background_run function definition
rg -n "def background_run" api/apps/restful_apis/agent_api.py
```

Repository: infiniflow/ragflow
Length of output: 103

🏁 Script executed:

```shell
# Search for any existing task tracking mechanisms (module-level sets)
rg -n "_.*tasks.*set\|_background_tasks\|_webhook.*tasks" api/apps/restful_apis/agent_api.py
```

Repository: infiniflow/ragflow
Length of output: 44

🏁 Script executed:

```shell
# Check for Python version constraints in the project
cat pyproject.toml 2>/dev/null | grep -A 5 "python"
```

Repository: infiniflow/ragflow
Length of output: 2777

🏁 Script executed:

```shell
# Check if there's a ruff configuration
cat -n ruff.toml 2>/dev/null || cat -n pyproject.toml 2>/dev/null | grep -A 10 "\[tool.ruff\]"
```

Repository: infiniflow/ragflow
Length of output: 337

🏁 Script executed:

```shell
# Get the function context - look at more lines around 1662
cat -n api/apps/restful_apis/agent_api.py | sed -n '1610,1670p'
```

Repository: infiniflow/ragflow
Length of output: 2660

🏁 Script executed:

```shell
# Check what function/method contains line 1662
cat -n api/apps/restful_apis/agent_api.py | sed -n '1500,1665p' | grep -E "^\s+[0-9]+\s+(def |async def )" | tail -5
```

Repository: infiniflow/ragflow
Length of output: 309

🏁 Script executed:

```shell
# Search for ruff rule asyncio-dangling-task to confirm it matches ASYNC/ASYNC1
web_search
ruff asyncio-dangling-task ASYNC rule
```

Repository: infiniflow/ragflow
Length of output: 236

🏁 Script executed:

```shell
# Find the parent function definition - look higher up
cat -n api/apps/restful_apis/agent_api.py | sed -n '1450,1520p' | grep -E "^[[:space:]]*[0-9]+\s+(def |async def |@)"
```

Repository: infiniflow/ragflow
Length of output: 102

🏁 Script executed:

```shell
# Check if there's any existing task management in the entire file
rg -n "add_done_callback|_tasks|task_group\|gather\|shield" api/apps/restful_apis/agent_api.py | head -20
```

Repository: infiniflow/ragflow
Length of output: 44
🌐 Web query:
ruff asyncio-dangling-task ASYNC101 rule documentation
💡 Result:
The query refers to the Ruff linter rule for detecting dangling asyncio tasks, but the rule code is RUF006 (asyncio-dangling-task), not ASYNC101. ASYNC101 was a rule from the deprecated flake8-async linter for blocking calls in async functions (e.g., open, time.sleep), which Ruff has reimplemented as separate rules like ASYNC230, but not directly as ASYNC101. Ruff's asyncio-dangling-task (RUF006), added in v0.0.247, checks for asyncio.create_task and asyncio.ensure_future calls where the returned Task object is not stored in a variable, collection, or otherwise strongly referenced. The event loop holds only a weak reference, so unreferenced tasks can be garbage collected prematurely, leading to inconsistent execution. Example violation:

```python
import asyncio

asyncio.create_task(some_coro)  # Dangling task
```

Fixed:

```python
background_tasks = set()
task = asyncio.create_task(some_coro)
background_tasks.add(task)
task.add_done_callback(background_tasks.discard)
```

Configuration: Enable with `[ruff] select = ["RUF"]` in pyproject.toml. It also detects unused assignments to tasks.
Citations:
- 1: https://docs.astral.sh/ruff/rules/asyncio-dangling-task/
- 2: https://beta.ruff.rs/docs/rules/asyncio-dangling-task/
- 3: https://github.com/astral-sh/ruff/blob/main/crates%2Fruff_linter%2Fsrc%2Frules%2Fruff%2Frules%2Fasyncio_dangling_task.rs
- 4: https://docs.astral.sh/ruff/rules/
🏁 Script executed:

```shell
# Check the full ruff configuration
cat -n pyproject.toml | sed -n '204,220p'
```

Repository: infiniflow/ragflow
Length of output: 489

🏁 Script executed:

```shell
# Look for any RUF selection in the config
rg -n "RUF|ruf" pyproject.toml
```

Repository: infiniflow/ragflow
Length of output: 99
Retain strong reference to long-running background task.
The background_run() coroutine performs extended operations (async loop over canvas.run(), database updates). Without holding a strong reference to the task, Python's event loop (which uses only weak references) can garbage-collect it mid-execution, silently terminating the canvas run. This violates asyncio best practices documented in the official Python reference.
Implement the recommended pattern:
Fix

```diff
+_webhook_background_tasks: set[asyncio.Task] = set()
+
     async def background_run():
         # ... implementation ...

-    asyncio.create_task(background_run())
+    task = asyncio.create_task(background_run())
+    _webhook_background_tasks.add(task)
+    task.add_done_callback(_webhook_background_tasks.discard)
     return resp
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/agent_api.py` around lines 1662 - 1663, The
background_run() task is created without a strong reference and can be
garbage-collected; change the creation site where
asyncio.create_task(background_run()) is called so you keep a strong reference
(e.g., create a module-level set like BACKGROUND_TASKS or attach a set to the
app), assign the task to a variable (task =
asyncio.create_task(background_run())), add it to that set, and register a done
callback to remove it when finished (task.add_done_callback(lambda t:
BACKGROUND_TASKS.discard(t))) so the long-running background_run coroutine is
retained for its lifetime.
```python
    if ans["event"] == "message_end":
        status = int(ans["data"].get("status", status))
    if is_test:
        append_webhook_trace(
            agent_id,
            start_ts,
            ans
        )
if is_test:
    append_webhook_trace(
        agent_id,
        start_ts,
        {
            "event": "finished",
            "elapsed_time": time.time() - start_ts,
            "success": True,
        }
    )
final_content = "".join(contents)
return {
    "message": final_content,
    "success": True,
    "code": status,
}
```
Guard the HTTP status derived from canvas output.
- `status = int(ans["data"].get("status", status))` (line 1684) raises `ValueError` if a component emits a non-numeric `status`, which is caught by the surrounding `except Exception` and turned into a 400 "invalid literal for int()" response — unhelpful for operators. Validate/clamp to a sensible HTTP range.
- `Response(..., status=result["code"], ...)` (line 1733) uses whatever integer the canvas produced, so a `message_end` payload can set `status=999` or `0`. Constrain to known HTTP codes.
- Per coding guidelines, "Add logging for new flows": also log the synchronous failure path (lines 1708-1728) with the agent id and elapsed time.
🛡️ Minimal hardening

```diff
-                    if ans["event"] == "message_end":
-                        status = int(ans["data"].get("status", status))
+                    if ans["event"] == "message_end":
+                        try:
+                            candidate = int(ans["data"].get("status", status))
+                        except (TypeError, ValueError):
+                            candidate = status
+                        if 200 <= candidate <= 599:
+                            status = candidate
```

Also applies to: 1730-1735
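The clamp can be pulled out as a standalone helper; a hedged sketch (the name `clamp_http_status` and the 200-599 window follow the suggested hardening, not the PR's code):

```python
def clamp_http_status(raw, fallback=200):
    """Coerce a canvas-supplied status into a usable HTTP code.

    Non-numeric values and codes outside 200-599 fall back, so a
    component emitting status=999 or "oops" cannot break the response.
    """
    try:
        candidate = int(raw)
    except (TypeError, ValueError):
        return fallback
    return candidate if 200 <= candidate <= 599 else fallback
```

The same helper can guard both the `message_end` parsing and the final `Response(..., status=...)` construction.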
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/agent_api.py` around lines 1683 - 1706, The
canvas-derived status parsing should be hardened: when extracting status from
ans["data"] in the message loop (the status variable set via status =
int(ans["data"].get("status", status))), catch conversion errors and
validate/clamp the value into the HTTP range (e.g., ensure an int between 100
and 599, defaulting to a safe code like 500 if invalid); replace direct use of
the raw status in Response(..., status=result["code"], ...) with the
clamped/validated status (use the same clamp routine where result["code"] is
computed). Also add logging on the synchronous failure path (the except/failure
block previously returning the 400 payload) to include agent_id and elapsed time
(and any error details) using the existing logger, and ensure
append_webhook_trace calls (append_webhook_trace, start_ts) are preserved. Use
function/variable names ans, status, append_webhook_trace, final_content,
agent_id, start_ts, and result["code"] to locate and update the spots.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/apps/restful_apis/agent_api.py`:
- Around line 1816-1826: When decode_webhook_id(webhook_id, webhooks) returns
None, do not return finished=True; instead log the unresolved webhook_id (and
any relevant context like since_ts) and return a "not found" response so the
client can surface an expired/unknown trace. Concretely, in the handler
surrounding decode_webhook_id, add a processLogger (or appropriate logger) entry
recording webhook_id and webhooks, then call get_json_result with an error code
(e.g., RetCode.NOT_FOUND) or include a dedicated flag such as {"webhook_id":
webhook_id, "events": [], "next_since_ts": since_ts, "finished": False,
"not_found": True} so the UI can distinguish expired/unknown from completed
runs. Ensure you reference decode_webhook_id, get_json_result, and
RetCode.NOT_FOUND (or the chosen flag) when making the change.
- Line 1059: The is_test boolean is fragile because it uses the raw path; change
detection to use the matched route rule or endpoint instead: replace the
startswith check that assigns is_test (in the handler where request is
available) with a check against request.url_rule.rule.endswith("/webhook/test")
or request.endpoint.endswith("webhook_test") depending on your route naming, so
the flag is derived from the matched route (request.url_rule or
request.endpoint) rather than request.path.
- Around line 1803-1805: The code currently converts Redis string keys to floats
then back to strings (start_ts = min(candidates); real_id = str(start_ts)) which
can mismatch the original Redis key; instead preserve and use the original
string key from webhooks.keys(): when building candidates, keep tuples of
(float_value, original_key) or a mapping float->original_key, pick the tuple
with the minimum float_value and set real_id to the original_key (then call
encode_webhook_id(real_id)); update references around start_ts, real_id, and
encode_webhook_id to use the preserved original string key rather than
str(float).
- Around line 1631-1632: The code sets cvs.dsl then calls
UserCanvasService.update_by_id with the owner's id (cvs.user_id), which updates
the wrong record; change the call to use the canvas/agent primary key (e.g.,
cvs.id) instead: after cvs.dsl = json.loads(str(canvas)) call
UserCanvasService.update_by_id(cvs.id, cvs.to_dict()) so the updated DSL is
persisted against the correct canvas/agent record (ensure the primary key used
matches the model's canvas id field).
- Around line 1739-1742: Replace the strict owner-only check that compares
cvs.user_id to current_user.id with the standard access gate used elsewhere:
call UserCanvasService.accessible(agent_id, tenant_id) using cvs.agent_id as the
agent_id and the current user's tenant identifier (e.g. current_user.tenant_id)
as the tenant_id; if that call returns False, return
get_data_error_result(message="Canvas not found.") so teammates and
joined-tenant users are authorized the same way as other handlers like
list_agent_sessions/get_agent/get_agent_logs.
- Around line 1487-1540: Guard against None before calling .startswith on schema
type variables: in the conversion logic where expected_type is used (the block
starting with expected_type.startswith("array")), change the check to first
ensure expected_type is truthy (e.g., if expected_type and
expected_type.startswith("array")) so it won’t AttributeError when type is
missing; similarly, in validate_type (function validate_type) replace
t.startswith("array") with a guarded check (e.g., if t and
t.startswith("array")) and keep the rest of the array-handling logic unchanged.
Ensure both places reference the same pattern used in default_for_type to avoid
NPEs while preserving current behavior for unknown types.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c310d2ac-dd44-4040-8fa9-223fa682ca82
📒 Files selected for processing (1)
api/apps/restful_apis/agent_api.py
```python
@manager.route("/agents/<agent_id>/webhook", methods=["POST", "GET", "PUT", "PATCH", "DELETE", "HEAD"])  # noqa: F821
@manager.route("/agents/<agent_id>/webhook/test",methods=["POST", "GET", "PUT", "PATCH", "DELETE", "HEAD"],)  # noqa: F821
async def webhook(agent_id: str):
    is_test = request.path.startswith(f"/api/v1/agents/{agent_id}/webhook/test")
```
Fragile is_test detection via hardcoded URL prefix.
request.path.startswith(f"/api/v1/agents/{agent_id}/webhook/test") couples the handler to a specific URL prefix. If the app is mounted under a different prefix (reverse-proxy rewrite, test client, blueprint prefix change), is_test silently becomes False and trace logging is disabled without any error. Prefer deriving this from the matched endpoint/rule rather than the raw path.
🛡️ Suggested fix

```diff
-    is_test = request.path.startswith(f"/api/v1/agents/{agent_id}/webhook/test")
+    # Endpoint matching is independent of URL prefix / reverse-proxy rewrites.
+    is_test = request.path.rstrip("/").endswith("/webhook/test")
```

Or, if Quart exposes the matched rule, switch to `request.url_rule.rule.endswith("/webhook/test")` for a more robust match.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/agent_api.py` at line 1059, The is_test boolean is
fragile because it uses the raw path; change detection to use the matched route
rule or endpoint instead: replace the startswith check that assigns is_test (in
the handler where request is available) with a check against
request.url_rule.rule.endswith("/webhook/test") or
request.endpoint.endswith("webhook_test") depending on your route naming, so the
flag is derived from the matched route (request.url_rule or request.endpoint)
rather than request.path.
```python
    if expected_type.startswith("array"):
        try:
            parsed = json.loads(v)
            if isinstance(parsed, list):
                return parsed
            else:
                raise Exception("JSON is not an array")
        except Exception:
            raise Exception(f"Cannot convert '{value}' to array")

    # String (accept original)
    if expected_type == "string":
        return value

    # File
    if expected_type == "file":
        return value
    # Default: do nothing
    return value


def validate_type(value, t):
    """Validate value type against schema type t."""
    if t == "file":
        return isinstance(value, list)

    if t == "string":
        return isinstance(value, str)

    if t == "number":
        return isinstance(value, (int, float))

    if t == "boolean":
        return isinstance(value, bool)

    if t == "object":
        return isinstance(value, dict)

    # array<string> / array<number> / array<object>
    if t.startswith("array"):
        if not isinstance(value, list):
            return False

        if "<" in t and ">" in t:
            inner = t[t.find("<") + 1 : t.find(">")]

            # Check each element type
            for item in value:
                if not validate_type(item, inner):
                    return False

            return True

        return True
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Confirm no existing tests depend on validate_type returning True for unknown types.
rg -nP '\bvalidate_type\s*\(' -C3
```

Repository: infiniflow/ragflow
Length of output: 1602

🏁 Script executed:

```shell
# Read the function definitions and surrounding context
sed -n '1430,1550p' api/apps/restful_apis/agent_api.py | cat -n
```

Repository: infiniflow/ragflow
Length of output: 4685

🏁 Script executed:

```shell
# Check logging imports and usage in the file
head -50 api/apps/restful_apis/agent_api.py | grep -i "import"
```

Repository: infiniflow/ragflow
Length of output: 1068

🏁 Script executed:

```shell
# Search for test files related to webhook or validate_type
find . -type f -name "*test*.py" -o -name "*_test.py" | xargs grep -l "validate_type\|webhook" 2>/dev/null | head -20
```

Repository: infiniflow/ragflow
Length of output: 135

🏁 Script executed:

```shell
# Check Python version requirement in project
find . -type f \( -name "setup.py" -o -name "pyproject.toml" -o -name "setup.cfg" -o -name ".python-version" \) -exec cat {} \; 2>/dev/null | head -30
```

Repository: infiniflow/ragflow
Length of output: 879

🏁 Script executed:

```shell
cat test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py
```

Repository: infiniflow/ragflow
Length of output: 49601

🏁 Script executed:

```shell
# Check main project Python requirement
cat pyproject.toml | head -100
```

Repository: infiniflow/ragflow
Length of output: 3648

🏁 Script executed:

```shell
# Check current logging usage patterns in agent_api.py
rg "logging\." api/apps/restful_apis/agent_api.py -A2 -B2 | head -50
```

Repository: infiniflow/ragflow
Length of output: 1183

🏁 Script executed:

```shell
# Verify the exact test expectation for unknown type handling
sed -n '940,980p' test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py
```

Repository: infiniflow/ragflow
Length of output: 1593

🏁 Script executed:

```shell
# Search for any other places where validate_type might be called with None
rg -n "validate_type\(" api/apps/restful_apis/agent_api.py -B3 -A3
```

Repository: infiniflow/ragflow
Length of output: 867

🏁 Script executed:

```shell
# Find the exact lines in test that show unknown type is expected to pass
grep -n "mystery" test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py -B3 -A3
```

Repository: infiniflow/ragflow
Length of output: 560
Guard expected_type and t against None before calling .startswith().
Line 1487 (expected_type.startswith("array")) and line 1526 (t.startswith("array")) will raise AttributeError if the schema omits the type property or sets it to None. The default_for_type function already guards with if t and … (line 1430), so these sites should apply the same pattern.
Suggested fix for NPE

```diff
 def auto_cast_value(value, expected_type):
     """Convert string values into schema type when possible."""
     # Non-string values already good
     if not isinstance(value, str):
         return value
     v = value.strip()
+
+    if not expected_type:
+        return value

     # Boolean
     if expected_type == "boolean":
```

```diff
 def validate_type(value, t):
     """Validate value type against schema type t."""
+    if not t:
+        return True
     if t == "file":
```

Regarding unknown schema types: The test suite (test_webhook_schema_extract_cast_defaults_and_validation_errors) expects unknown types like "mystery" to be silently accepted. Changing validate_type to return False for unrecognized types would be a breaking change. If stricter validation is desired, update both the implementation and its tests together.
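With the guard in place, the pinned-down behavior can be demonstrated standalone; a trimmed sketch of `validate_type` (simplified from the file, and the `if not t` guard is the addition under discussion):

```python
def validate_type(value, t):
    """Validate value against schema type t; missing/unknown types pass."""
    if not t:  # guard: schema omitted "type" or set it to None
        return True
    if t == "string":
        return isinstance(value, str)
    if t == "number":
        return isinstance(value, (int, float))
    if t == "boolean":
        return isinstance(value, bool)
    if t == "object":
        return isinstance(value, dict)
    if t.startswith("array"):
        if not isinstance(value, list):
            return False
        if "<" in t and ">" in t:
            inner = t[t.find("<") + 1 : t.find(">")]
            return all(validate_type(item, inner) for item in value)
        return True
    return True  # unknown types (e.g. "mystery") stay accepted
```

Note the final `return True` preserves the lenient behavior the existing tests rely on.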
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/agent_api.py` around lines 1487 - 1540, Guard against
None before calling .startswith on schema type variables: in the conversion
logic where expected_type is used (the block starting with
expected_type.startswith("array")), change the check to first ensure
expected_type is truthy (e.g., if expected_type and
expected_type.startswith("array")) so it won’t AttributeError when type is
missing; similarly, in validate_type (function validate_type) replace
t.startswith("array") with a guarded check (e.g., if t and
t.startswith("array")) and keep the rest of the array-handling logic unchanged.
Ensure both places reference the same pattern used in default_for_type to avoid
NPEs while preserving current behavior for unknown types.
```python
cvs.dsl = json.loads(str(canvas))
UserCanvasService.update_by_id(cvs.user_id, cvs.to_dict())
```
Critical: canvas still persisted against cvs.user_id — past "Addressed" comment did not land.
UserCanvasService.update_by_id(cvs.user_id, cvs.to_dict()) uses the owner's user id as the primary key for update_by_id, not the canvas/agent id. A successful background run either updates the wrong record (if some canvas happens to share that id) or silently no-ops, and the updated DSL is never saved against this agent. The previous review comment is marked as addressed in commits 75a5548→4e203a4, but the current code still has cvs.user_id here — the fix appears to have been reverted or lost during rebase.
🐛 Proposed fix

```diff
-            cvs.dsl = json.loads(str(canvas))
-            UserCanvasService.update_by_id(cvs.user_id, cvs.to_dict())
+            cvs.dsl = json.loads(str(canvas))
+            UserCanvasService.update_by_id(agent_id, cvs.to_dict())
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/agent_api.py` around lines 1631 - 1632, The code sets
cvs.dsl then calls UserCanvasService.update_by_id with the owner's id
(cvs.user_id), which updates the wrong record; change the call to use the
canvas/agent primary key (e.g., cvs.id) instead: after cvs.dsl =
json.loads(str(canvas)) call UserCanvasService.update_by_id(cvs.id,
cvs.to_dict()) so the updated DSL is persisted against the correct canvas/agent
record (ensure the primary key used matches the model's canvas id field).
```python
if not exists or str(cvs.user_id) != str(current_user.id):
    return get_data_error_result(
        message="Canvas not found.",
    )
```
Access check is stricter than every other endpoint in this module.
All other handlers in this file gate access with UserCanvasService.accessible(agent_id, tenant_id), which covers joined-tenant / team-member access (see list_agent_sessions, get_agent, get_agent_logs, etc.). Here the check is str(cvs.user_id) != str(current_user.id), which only admits the original creator — any teammate who can view traces for every other endpoint gets a misleading "Canvas not found." here. Also, current_user.id is a user id while cvs.user_id is the canvas tenant_id on this model, so even the creator may fail this check under tenant switching.
🛡️ Suggested fix

```diff
-    exists, cvs = UserCanvasService.get_by_id(agent_id)
-    if not exists or str(cvs.user_id) != str(current_user.id):
-        return get_data_error_result(
-            message="Canvas not found.",
-        )
+    exists, cvs = UserCanvasService.get_by_id(agent_id)
+    if not exists:
+        return get_data_error_result(message="Canvas not found.")
+    if not UserCanvasService.accessible(agent_id, current_user.id):
+        return get_json_result(
+            data=False,
+            message="Only owner of canvas authorized for this operation.",
+            code=RetCode.OPERATING_ERROR,
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/agent_api.py` around lines 1739 - 1742, Replace the
strict owner-only check that compares cvs.user_id to current_user.id with the
standard access gate used elsewhere: call UserCanvasService.accessible(agent_id,
tenant_id) using cvs.agent_id as the agent_id and the current user's tenant
identifier (e.g. current_user.tenant_id) as the tenant_id; if that call returns
False, return get_data_error_result(message="Canvas not found.") so teammates
and joined-tenant users are authorized the same way as other handlers like
list_agent_sessions/get_agent/get_agent_logs.
```python
start_ts = min(candidates)
real_id = str(start_ts)
webhook_id = encode_webhook_id(real_id)
```
str(float) round-trip risk when matching back against Redis keys.
Keys are written with str(start_ts) at line 1569, where start_ts = time.time() (a float). Here the candidate is recovered via float(k) (line 1790) → min(...) → str(...) (line 1804). Python's str(float) uses the shortest repr that round-trips, so str(float(k)) == k is usually true, but not guaranteed — trailing-zero representations ("1700000000.0" vs "1700000000"), or values that normalize differently across Python/Redis encodings, will desync, and decode_webhook_id will then fail to find a key that exists. Safer to keep the original string key from webhooks.keys() rather than re-stringifying the parsed float.
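The hazard is easy to reproduce with a plain Python round-trip; the key values below are made up for illustration:

```python
# A key written with extra trailing digits does not survive float()/str():
key = "1700000000.123450"        # hypothetical Redis hash key
recovered = str(float(key))      # shortest repr drops the trailing zero
assert recovered != key          # a lookup by `recovered` would miss the key

# Keeping the original string and only ordering by float avoids the issue.
keys = ["1700000000.500000", "1700000000.123450"]
real_id = min(keys, key=float)   # original key string, preserved verbatim
```

Ordering by `float` while returning the untouched string is exactly what the suggested fix below this comment does.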
🛡️ Suggested fix

```diff
-        candidates = [
-            float(k) for k in webhooks.keys() if float(k) > since_ts
-        ]
+        candidate_keys = [k for k in webhooks.keys() if float(k) > since_ts]
-        if not candidates:
+        if not candidate_keys:
             return get_json_result(
                 data={
                     "webhook_id": None,
                     "events": [],
                     "next_since_ts": since_ts,
                     "finished": False,
                 }
             )
-        start_ts = min(candidates)
-        real_id = str(start_ts)
-        webhook_id = encode_webhook_id(real_id)
+        real_id = min(candidate_keys, key=float)
+        start_ts = float(real_id)
+        webhook_id = encode_webhook_id(real_id)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/agent_api.py` around lines 1803 - 1805, The code
currently converts Redis string keys to floats then back to strings (start_ts =
min(candidates); real_id = str(start_ts)) which can mismatch the original Redis
key; instead preserve and use the original string key from webhooks.keys(): when
building candidates, keep tuples of (float_value, original_key) or a mapping
float->original_key, pick the tuple with the minimum float_value and set real_id
to the original_key (then call encode_webhook_id(real_id)); update references
around start_ts, real_id, and encode_webhook_id to use the preserved original
string key rather than str(float).
```python
real_id = decode_webhook_id(webhook_id, webhooks)

if not real_id:
    return get_json_result(
        data={
            "webhook_id": webhook_id,
            "events": [],
            "next_since_ts": since_ts,
            "finished": True,
        }
    )
```
finished=True for an unknown webhook_id conflates "expired/unknown" with "run completed".
When decode_webhook_id returns None (no matching start_ts in Redis — trace expired, wrong agent, wrong id), the handler currently returns finished: True. A polling client cannot distinguish this from a successful completion and will stop displaying progress with no error surfaced. Consider returning an explicit error code (e.g. RetCode.NOT_FOUND) or a dedicated flag so the UI can show "trace expired" rather than silently marking the run as done.
Also, per coding guidelines, log the unresolved webhook_id to aid post-hoc debugging.
As per coding guidelines, "Add logging for new flows".
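One way to keep the polling contract unambiguous is the dedicated flag suggested above; a sketch of the response body (the helper name is ours, and the field set mirrors the existing response shape):

```python
def logs_trace_missing_response(webhook_id, since_ts):
    """Response body for an expired/unknown trace id.

    `finished` stays False and `not_found` is the new, explicit signal,
    so a polling UI can show "trace expired" instead of "run completed".
    """
    return {
        "webhook_id": webhook_id,
        "events": [],
        "next_since_ts": since_ts,
        "finished": False,
        "not_found": True,
    }
```

A client can then branch on `not_found` before it ever inspects `finished`.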
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/agent_api.py` around lines 1816 - 1826, When
decode_webhook_id(webhook_id, webhooks) returns None, do not return
finished=True; instead log the unresolved webhook_id (and any relevant context
like since_ts) and return a "not found" response so the client can surface an
expired/unknown trace. Concretely, in the handler surrounding decode_webhook_id,
add a processLogger (or appropriate logger) entry recording webhook_id and
webhooks, then call get_json_result with an error code (e.g., RetCode.NOT_FOUND)
or include a dedicated flag such as {"webhook_id": webhook_id, "events": [],
"next_since_ts": since_ts, "finished": False, "not_found": True} so the UI can
distinguish expired/unknown from completed runs. Ensure you reference
decode_webhook_id, get_json_result, and RetCode.NOT_FOUND (or the chosen flag)
when making the change.
What problem does this PR solve?
migrate agent webhook routes to REST APIs
Type of change
- [x] Refactoring