feat: Add SDK and cURL examples for chunk management, chat assistant, and retrieval (#4310) #14208

Open
bhongong wants to merge 3 commits into infiniflow:main from bhongong:feature/chunk-chat-examples

@bhongong

Closes #4310

What problem does this PR solve?

Issue #4310 requests practical examples for the RAGFlow SDK and HTTP API to help developers get started faster. The existing example/sdk/ folder only contains dataset_example.py. This PR fills the remaining gaps by adding examples for three key API areas not yet covered in main or by other open PRs (#13904, #13284):

  • Chunk management — add, list, update, delete, and retrieve chunks within a dataset
  • Chat assistant — create a chat assistant, open a session, send messages (streaming and non-streaming), and clean up
  • Retrieval — perform semantic retrieval across one or multiple datasets

Files added

Python SDK (example/sdk/)

  • chunk_example.py — CRUD + retrieve chunks via ragflow_sdk
  • chat_assistant_example.py — full chat assistant lifecycle with streaming support
  • retrieval_example.py — single-dataset and multi-dataset retrieval

HTTP / cURL (example/http/)

  • chunk_example.sh — cURL equivalents for all chunk operations
  • chat_assistant_example.sh — cURL for chat assistant CRUD and session messaging
  • retrieval_example.sh — cURL for retrieval endpoint

Type of change

  • Documentation Update
  • New Feature (non-breaking change which adds functionality)

Notes

@dosubot (bot) added the size:L label (This PR changes 100-499 lines, ignoring generated files.) on Apr 18, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Apr 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 40bdd8f0-637e-490a-9b4d-87515df41b08

📥 Commits

Reviewing files that changed from the base of the PR and between ee51436 and 5629937.

📒 Files selected for processing (1)
  • example/sdk/chat_assistant_example.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • example/sdk/chat_assistant_example.py

📝 Walkthrough

Adds six new runnable example scripts: three Bash HTTP/curl examples and three Python SDK examples demonstrating dataset, document, chunk, retrieval, and chat-assistant workflows (including streaming, polling, and cleanup) against the RAGFlow API.

Changes

| Cohort / File(s) | Summary |
|---|---|
| HTTP Examples: example/http/chat_assistant_example.sh, example/http/chunk_example.sh, example/http/retrieval_example.sh | New executable Bash examples using curl and jq to exercise HTTP endpoints: create/list/delete resources, upload documents (multipart), manage chunks, perform retrievals, and demonstrate streaming completions. |
| SDK Examples: example/sdk/chat_assistant_example.py, example/sdk/chunk_example.py, example/sdk/retrieval_example.py | New Python examples using ragflow_sdk.RAGFlow to create datasets/chat assistants, upload and poll document parsing, add/update/delete chunks, perform semantic and keyword retrievals, show streaming and non-streaming chat flows, and clean up resources. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

size:L

Poem

🐇 I hopped through scripts with curl and py,
I made datasets, chunks, and chats reply,
I polled and streamed beneath the moonlight,
Cleaned up my traces and vanished from sight —
A rabbit’s demo, tidy and spry.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding SDK and cURL examples for three key API areas (chunk management, chat assistant, retrieval). |
| Description check | ✅ Passed | The description follows the template with clear sections: problem statement references Issue #4310, files are explicitly listed, change type is checked, and comprehensive notes address implementation details and context. |
| Linked Issues check | ✅ Passed | The PR directly addresses Issue #4310 by providing Python SDK examples (chunk_example.py, chat_assistant_example.py, retrieval_example.py) and HTTP/cURL examples (chunk_example.sh, chat_assistant_example.sh, retrieval_example.sh) covering chunk management, chat assistant, and retrieval as requested. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: six new example files directly addressing Issue #4310 objectives. No unrelated modifications to existing code, APIs, or dependencies detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@dosubot (bot) added the 🌈 python (Pull requests that update Python code) and 📖 documentation (Improvements or additions to documentation) labels on Apr 18, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 9

🧹 Nitpick comments (4)
example/http/chunk_example.sh (1)

24-59: Make the flow self-contained by capturing created chunk_id.

Right now update/delete depend on manual CHUNK_ID edits, which weakens the “typical workflow” example.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@example/http/chunk_example.sh` around lines 24 - 59, The script should
capture the created chunk's ID from the POST response and assign it to CHUNK_ID
so the subsequent UPDATE and DELETE steps are self-contained; modify the "Add a
chunk to a document" POST step to save the response (e.g., into a variable or
temp file), extract the chunk_id (using jq or a safe shell parse) and export or
set CHUNK_ID for later commands, and add a basic check that chunk_id was
extracted before proceeding to the "Update a chunk" (PUT to
/documents/${DOC_ID}/chunks/${CHUNK_ID}) and "Delete chunks" (DELETE payload
using ${CHUNK_ID}) steps.
example/sdk/retrieval_example.py (1)

29-91: Please use logging for this new Python flow.

Replace print-based progress/output with module-level logging.

As per coding guidelines: **/*.py: Add logging for new flows.
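
A minimal sketch of the print-to-logging change described above, using only the standard library. The `run_example` helper and its `step` argument are hypothetical stand-ins for the SDK calls in the example script, and the log format is an assumption rather than a repository convention.

```python
import logging

# Module-level logger; the format string here is an assumption.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)


def run_example(step):
    """Run one example step and return a process exit code.

    `step` is a hypothetical zero-argument callable standing in for the
    SDK calls (create dataset, upload, retrieve, clean up).
    """
    logger.info("Starting example step")
    try:
        result = step()
    except Exception:
        # logger.exception logs at ERROR level and appends the stack trace.
        logger.exception("Example step failed")
        return 1
    logger.info("Example step done: %s", result)
    return 0
```

A script could end with `sys.exit(run_example(main))`, which keeps the existing exit-code behaviour while the traceback goes to the log.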

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@example/sdk/retrieval_example.py` around lines 29 - 91, Replace all print
statements in retrieval_example.py with module-level logging: add an import
logging and configure a logger (e.g., logger = logging.getLogger(__name__) and
basicConfig) at top of the file, then change every print(...) to
logger.info(...) and the exception block to logger.exception(...) (use
logger.error for non-exception error messages if needed); ensure messages about
progress (Creating dataset, Uploading and parsing document, Document parsed and
ready for retrieval, Performing Retrieval, Cleaning up, Retrieval example done)
use logger.info and the final sys.exit calls remain, but use logger.exception
for the caught Exception e to include stack trace. Use the same variable names
(rag, dataset, docs, chunks) so replacements are localized to this flow.
example/sdk/chat_assistant_example.py (1)

28-92: Use logging for this new flow instead of print statements.

Please migrate these operational messages to logging for consistency and troubleshooting.

As per coding guidelines: **/*.py: Add logging for new flows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@example/sdk/chat_assistant_example.py` around lines 28 - 92, Replace all
print-based operational messages in this flow with logging: import and configure
the standard logging module at the top, create a logger (e.g., logger =
logging.getLogger(__name__)), and change prints around RAGFlow instantiation,
rag.create_dataset, rag.create_chat, assistant.create_session, session.ask (both
non-stream and stream blocks), assistant.list_sessions, cleanup calls
(assistant.delete_sessions, rag.delete_chats, rag.delete_datasets) to
appropriate logger methods (logger.info for normal flow, logger.debug for
incremental/streamed parts if desired). In the exception handler, replace print
with logger.exception or logger.error with the caught Exception e to preserve
stack trace and error details. Ensure messages still include contextual data
like assistant.id, session.id, dataset.id and that streaming output uses
logger.debug or logger.info consistently.
example/sdk/chunk_example.py (1)

29-85: Use structured logging instead of prints for this new flow.

Switching to logging makes these examples consistent with repo guidance and easier to troubleshoot.

As per coding guidelines: **/*.py: Add logging for new flows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@example/sdk/chunk_example.py` around lines 29 - 85, Replace all print calls
in this flow with structured logging: import logging, configure a module logger
(e.g., logger = logging.getLogger(__name__) and basicConfig/level) and use
logger.info/debug for progress messages (those around RAGFlow(),
create_dataset(), upload_documents(), async_parse_documents(), list_documents(),
add_chunk(), list_chunks(), update(), delete_chunks(), delete_datasets()) and
logger.exception or logger.error in the except block to capture stack traces;
ensure messages include contextual identifiers (dataset.id, doc.id, chunk.id)
where available.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@example/http/chat_assistant_example.sh`:
- Around line 18-21: The script exposes a hardcoded credential (API_KEY) and
other placeholders (HOST_ADDRESS, CHAT_ID, SESSION_ID); update the script to
read these values from environment variables instead of literals by replacing
the API_KEY assignment with a read from an env var (e.g., ${API_KEY:-}) and
likewise source HOST_ADDRESS, CHAT_ID and SESSION_ID from their respective
environment variables or fail/print a helpful message if missing; ensure
references to API_KEY, HOST_ADDRESS, CHAT_ID, and SESSION_ID in the script use
the env variables so no token literals remain in the repository.
- Around line 23-83: The script currently assumes CHAT_ID and SESSION_ID exist
but doesn't extract them from create responses; update the create-chat POST to
capture the returned chat id (e.g., from the JSON key "id" or "chat_id") and
export it as CHAT_ID, then capture the session id from the create-session
response (e.g., JSON "id" or "session_id") and export it as SESSION_ID before
running subsequent completion, list, and delete calls; use a JSON parser like jq
to parse the curl responses and set the environment variables (referencing the
POST to /api/v1/chats and POST to /api/v1/chats/${CHAT_ID}/sessions to locate
where to add the extraction).

In `@example/http/chunk_example.sh`:
- Around line 18-23: Replace the hardcoded API_KEY value in
example/http/chunk_example.sh with an environment-driven token: stop embedding
the credential-like string assigned to the API_KEY variable and instead read it
from an env var (e.g., RAGFLOW_API_KEY) and make the script use that variable
when making requests; ensure the script exits with a clear error if the env var
is missing so callers must set RAGFLOW_API_KEY before running, and update any
references to API_KEY in the script to use the env-driven variable (leave
HOST_ADDRESS, DATASET_ID, DOC_ID, CHUNK_ID as placeholders).

In `@example/http/retrieval_example.sh`:
- Around line 18-20: The script currently hardcodes an API key in the API_KEY
variable (and also defines HOST_ADDRESS and DATASET_ID); replace the hardcoded
secret with an environment-backed placeholder so no real token is embedded.
Update API_KEY to read from an env var (e.g., RAGFLOW_API_KEY) with a clear
placeholder/default value and ensure the example comment instructs users to set
RAGFLOW_API_KEY instead of committing real keys; keep HOST_ADDRESS and
DATASET_ID as illustrative placeholders only.

In `@example/sdk/chat_assistant_example.py`:
- Around line 25-27: Remove the hardcoded API_KEY constant and instead read the
API key from the environment (use RAGFLOW_API_KEY) with a safe
placeholder/fallback; update the code where API_KEY is referenced (the API_KEY
variable and any use of HOST_ADDRESS if needed) to use the environment-sourced
value and ensure the example documents that users should set RAGFLOW_API_KEY
rather than embedding secrets in chat_assistant_example.py.
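
One way to read the key from the environment and fail fast, sketched with the standard library only. `load_api_key` is a hypothetical helper name, and the `env` parameter exists only so the function can be exercised without touching the real environment.

```python
import os


def load_api_key(env=None, name="RAGFLOW_API_KEY"):
    """Return the API key from the environment.

    Raises RuntimeError when the variable is unset or blank, so no
    credential-like default is ever needed in the example file.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set {name} before running this example")
    return key
```

The example would then start with `API_KEY = load_api_key()` in place of the literal.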

In `@example/sdk/chunk_example.py`:
- Around line 47-55: The polling loop in chunk_example.py (the while True that
calls dataset.list_documents(id=doc.id) and inspects doc_status.run and
doc_status.progress) can hang forever; add a timeout mechanism (e.g.,
max_wait_seconds and a start_time/timestamp check) and a fail-fast path that
breaks/raises when the timeout elapses or when doc_status.run indicates a
terminal failure state, and ensure the loop returns or raises a clear error
instead of looping indefinitely; update the loop to check elapsed time each
iteration, log or raise a TimeoutError with context including doc.id and last
doc_status.progress when timed out, and keep the existing success check for
doc_status.run == "1" && progress >= 1.0.
- Around line 26-27: Replace the hardcoded credential in chunk_example.py by
reading API_KEY from an environment variable or a clear placeholder;
specifically replace the literal API_KEY value assigned to the API_KEY symbol
(and keep HOST_ADDRESS configurable) with something like os.getenv("API_KEY") or
"YOUR_API_KEY_HERE" and ensure the module imports os if using environment
variables so examples do not commit real credentials.

In `@example/sdk/retrieval_example.py`:
- Around line 44-49: The infinite loop that polls
dataset.list_documents(id=doc.id) using doc_status.run and doc_status.progress
must be bounded with a timeout and failure branch; modify the polling in
retrieval_example.py (the while True loop that inspects doc_status) to record
start time (or use a max_wait_seconds constant), break successfully if
conditions met, and raise or return an explicit error/report if the timeout
elapses (also consider increasing sleep interval or exponential backoff). Ensure
the logic references the existing symbols dataset.list_documents, doc.id,
doc_status.run and doc_status.progress so callers can detect and handle a
timed-out parse instead of waiting forever.
- Around line 26-27: The example hardcodes credentials: replace the API_KEY
constant by reading from the environment (use RAGFLOW_API_KEY) or a clearly
marked placeholder and avoid committing real keys; update the
retrieval_example.py symbol API_KEY to obtain its value via
os.getenv('RAGFLOW_API_KEY') (or fall back to a "REPLACE_ME" placeholder) and
add a short validation step that raises or prints a clear message if the key is
missing so example users know to set RAGFLOW_API_KEY before running.
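
The bounded polling requested for both SDK examples could look roughly like the sketch below. `fetch_status` is a hypothetical callable standing in for the `dataset.list_documents(id=doc.id)` lookup and is assumed to return an object with `run` and `progress` attributes. The default success check is the one quoted in the review; `is_failed` is a placeholder because the failure value of `run` is not stated here.

```python
import time


def wait_for_parse(fetch_status, doc_id, max_wait_seconds=300.0, poll_interval=1.0,
                   is_done=lambda s: s.run == "1" and s.progress >= 1.0,
                   is_failed=lambda s: False,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until the document is parsed, or raise once the deadline passes.

    `clock` and `sleep` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + max_wait_seconds
    while True:
        status = fetch_status(doc_id)
        if is_done(status):
            return status
        if is_failed(status):
            # Fail fast on a terminal failure state (predicate is an assumption).
            raise RuntimeError(f"Parsing failed for document {doc_id} (run={status.run!r})")
        if clock() >= deadline:
            raise TimeoutError(
                f"Document {doc_id} not parsed after {max_wait_seconds}s "
                f"(last progress: {status.progress})"
            )
        sleep(poll_interval)
```

A monotonic clock keeps the deadline immune to wall-clock adjustments. A backoff schedule could replace the fixed `poll_interval` if the server is slow.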

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eabb4e71-5212-44c3-a263-895fdf3b3405

📥 Commits

Reviewing files that changed from the base of the PR and between 6712b50 and 70c6718.

📒 Files selected for processing (6)
  • example/http/chat_assistant_example.sh
  • example/http/chunk_example.sh
  • example/http/retrieval_example.sh
  • example/sdk/chat_assistant_example.py
  • example/sdk/chunk_example.py
  • example/sdk/retrieval_example.py

Comment thread example/http/chat_assistant_example.sh Outdated
Comment thread example/http/chat_assistant_example.sh Outdated
Comment thread example/http/chunk_example.sh Outdated
Comment on lines +18 to +23
HOST_ADDRESS="http://localhost:9380"
API_KEY="ragflow-IzZmY1MGVhYTBhMjExZWZiYTdjMDI0Mm"
DATASET_ID="your_dataset_id"
DOC_ID="your_document_id"
CHUNK_ID="your_chunk_id"

⚠️ Potential issue | 🟠 Major

Credential-like API key should not be hardcoded.

Please switch to env-driven configuration for the token in this example script.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@example/http/chunk_example.sh` around lines 18 - 23, Replace the hardcoded
API_KEY value in example/http/chunk_example.sh with an environment-driven token:
stop embedding the credential-like string assigned to the API_KEY variable and
instead read it from an env var (e.g., RAGFLOW_API_KEY) and make the script use
that variable when making requests; ensure the script exits with a clear error
if the env var is missing so callers must set RAGFLOW_API_KEY before running,
and update any references to API_KEY in the script to use the env-driven
variable (leave HOST_ADDRESS, DATASET_ID, DOC_ID, CHUNK_ID as placeholders).

Comment thread example/http/retrieval_example.sh Outdated
Comment thread example/sdk/chat_assistant_example.py Outdated
Comment thread example/sdk/chunk_example.py Outdated
Comment thread example/sdk/chunk_example.py Outdated
Comment thread example/sdk/retrieval_example.py Outdated
Comment thread example/sdk/retrieval_example.py Outdated
@dosubot (bot) added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) and removed the size:L label (This PR changes 100-499 lines, ignoring generated files.) on Apr 18, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
example/http/chat_assistant_example.sh (1)

20-20: ⚠️ Potential issue | 🟠 Major

Remove token-like fallback from API_KEY.

Line 20 still embeds a credential-like default value. Use env-only input and fail fast when missing.

🔐 Suggested fix
-API_KEY="${RAGFLOW_API_KEY:-ragflow-IzZmY1MGVhYTBhMjExZWZiYTdjMDI0Mm}"
+: "${RAGFLOW_API_KEY:?Set RAGFLOW_API_KEY before running this script}"
+API_KEY="${RAGFLOW_API_KEY}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@example/http/chat_assistant_example.sh` at line 20, The script sets API_KEY
with a hardcoded token-like fallback
(API_KEY="${RAGFLOW_API_KEY:-ragflow-IzZmY1MGVhYTBhMjExZWZiYTdjMDI0Mm}"); remove
the default fallback so API_KEY is sourced only from the environment (e.g.,
API_KEY="${RAGFLOW_API_KEY}" or equivalent), and add a fail-fast check
immediately after (check API_KEY is non-empty, print a clear error like "Missing
RAGFLOW_API_KEY" to stderr and exit non-zero) so the script never embeds or
silently uses a credential-like default.
example/http/retrieval_example.sh (1)

20-20: ⚠️ Potential issue | 🟠 Major

Use env-only API_KEY; remove token-like default.

Line 20 should not include a credential-shaped fallback literal in a committed example.

🔐 Suggested fix
-API_KEY="${RAGFLOW_API_KEY:-ragflow-IzZmY1MGVhYTBhMjExZWZiYTdjMDI0Mm}"
+: "${RAGFLOW_API_KEY:?Set RAGFLOW_API_KEY before running this script}"
+API_KEY="${RAGFLOW_API_KEY}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@example/http/retrieval_example.sh` at line 20, The example sets API_KEY with
a token-like fallback which must be removed; update the API_KEY assignment
(variable name API_KEY) to use only the environment variable (no hard-coded
default) and, if desired, add a runtime check that fails fast when API_KEY is
empty (e.g., emit an error and exit) so no credential-shaped literal remains in
the committed example.
🧹 Nitpick comments (2)
example/http/chat_assistant_example.sh (1)

30-100: Add fail-fast and timeout options to every curl call.

From line 30 onward, all curl requests use silent mode (-s) without error handling or time bounds. Add --show-error, --fail-with-body, and timeout flags (--connect-timeout, --max-time) so that requests cannot hang indefinitely and HTTP errors are not masked.

♻️ Suggested pattern
+# Harden curl defaults for examples
+CURL_OPTS=(--silent --show-error --fail-with-body --connect-timeout 5 --max-time 30)
@@
-CHAT_RESPONSE=$(curl -s --request POST \
+CHAT_RESPONSE=$(curl "${CURL_OPTS[@]}" --request POST \
@@
-SESSION_RESPONSE=$(curl -s --request POST \
+SESSION_RESPONSE=$(curl "${CURL_OPTS[@]}" --request POST \
@@
-curl -s --request POST \
+curl "${CURL_OPTS[@]}" --request POST \
@@
-curl -N -s --request POST \
+curl -N "${CURL_OPTS[@]}" --request POST \
@@
-curl -s --request GET \
+curl "${CURL_OPTS[@]}" --request GET \
@@
-curl -s --request DELETE \
+curl "${CURL_OPTS[@]}" --request DELETE \
@@
-curl -s --request DELETE \
+curl "${CURL_OPTS[@]}" --request DELETE \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@example/http/chat_assistant_example.sh` around lines 30 - 100, The curl
invocations used for CHAT_RESPONSE, SESSION_RESPONSE, the non-streaming and
streaming completions, session list/delete, and final chat delete lack fail-fast
and timeout flags; update every curl call in this file (locations around
CHAT_RESPONSE, SESSION_RESPONSE, and the subsequent curl commands) to include
--show-error, --fail-with-body and sensible timeouts such as --connect-timeout 5
and --max-time 60 (or appropriate values for streaming endpoints), ensuring all
requests fail loudly and don’t hang.
example/http/retrieval_example.sh (1)

30-72: Add timeout and error-handling flags to curl calls.

Current calls use only -s (silent mode) without fail flags or timeouts. This means HTTP errors (4xx, 5xx) won't cause failure, and requests may hang indefinitely.

♻️ Suggested pattern
+CURL_OPTS=(--silent --show-error --fail-with-body --connect-timeout 5 --max-time 30)
@@
-DATASET_ID=$(curl -s --request POST \
+DATASET_ID=$(curl "${CURL_OPTS[@]}" --request POST \
@@
-curl -s --request POST \
+curl "${CURL_OPTS[@]}" --request POST \
@@
-curl -s --request POST \
+curl "${CURL_OPTS[@]}" --request POST \
@@
-curl -s --request DELETE \
+curl "${CURL_OPTS[@]}" --request DELETE \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@example/http/retrieval_example.sh` around lines 30 - 72, The curl calls
(including the dataset creation that assigns DATASET_ID and the POSTs to
/api/v1/retrieval and DELETE) should include error-handling and timeouts: add
flags like --fail -S --max-time 10 --connect-timeout 5 (optionally --retry 3
--retry-delay 2) to each curl invocation so HTTP errors surface and requests
don’t hang, and after creating DATASET_ID validate it’s non-empty (exit with an
error message if jq returns empty) before proceeding; update the curl
invocations that build DATASET_ID, the retrieval requests, and the cleanup
DELETE to use these flags and add a simple check for DATASET_ID presence.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@example/http/chat_assistant_example.sh`:
- Around line 38-50: The script extracts CHAT_ID and SESSION_ID unsafely and
without validation (see CHAT_RESPONSE -> CHAT_ID and SESSION_RESPONSE ->
SESSION_ID); update the extraction to quote variable expansions and validate
results: use jq with strict/error checking (e.g., jq -e or checking '.data.id'
!= null), assign into quoted variables like CHAT_ID="$(...)" and
SESSION_ID="$(...)", then test for empty values (if [ -z "$CHAT_ID" ] or
similar) and exit with a clear error message before making subsequent calls to
${HOST_ADDRESS}/api/v1/chats/${CHAT_ID}/sessions; also apply the same quoting to
other expansions like ${API_KEY} and ${HOST_ADDRESS} to avoid word-splitting.

In `@example/http/retrieval_example.sh`:
- Around line 30-35: The inline parsing of the dataset creation response into
DATASET_ID (using curl and jq) isn't validated, so failures produce an
empty/invalid ID and downstream steps run silently; after the POST that sets
DATASET_ID in retrieval_example.sh, check the curl/jq result and the HTTP status
(or use curl --fail) and if DATASET_ID is empty/null or the status is not 2xx,
print an error to stderr and exit with non‑zero status, ensuring subsequent
retrieval/cleanup steps do not run with an invalid DATASET_ID.
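
For readers following the SDK side, the same guard can be sketched in Python: parse the response body and refuse to continue when `data.id` is absent. `extract_id` is a hypothetical helper, and the `{"data": {"id": ...}}` shape is taken from the `.data.id` jq filter used in the scripts.

```python
import json


def extract_id(response_text):
    """Return `data.id` from a JSON response body, or raise ValueError."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Response is not valid JSON: {exc}") from exc
    # Guard each level so an error payload without "data" is rejected cleanly.
    data = payload.get("data") if isinstance(payload, dict) else None
    resource_id = data.get("id") if isinstance(data, dict) else None
    if not resource_id:
        raise ValueError(f"Response has no data.id: {response_text!r}")
    return resource_id
```

Raising before any follow-up request means retrieval and cleanup never run with an empty or null ID.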

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 186c293c-6383-4d39-aa63-be92942f8c4f

📥 Commits

Reviewing files that changed from the base of the PR and between 70c6718 and ee51436.

📒 Files selected for processing (5)
  • example/http/chat_assistant_example.sh
  • example/http/chunk_example.sh
  • example/http/retrieval_example.sh
  • example/sdk/chunk_example.py
  • example/sdk/retrieval_example.py
✅ Files skipped from review due to trivial changes (1)
  • example/sdk/chunk_example.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • example/http/chunk_example.sh
  • example/sdk/retrieval_example.py

Comment on lines +38 to +50
CHAT_ID=$(echo $CHAT_RESPONSE | jq -r '.data.id')
echo "Chat Assistant ID: ${CHAT_ID}"

# 2. Create a session for the assistant
echo -e "\n-- Create a session"
SESSION_RESPONSE=$(curl -s --request POST \
--url "${HOST_ADDRESS}/api/v1/chats/${CHAT_ID}/sessions" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${API_KEY}" \
--data '{
"name": "New Session"
}')
SESSION_ID=$(echo $SESSION_RESPONSE | jq -r '.data.id')
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify unquoted expansions and missing empty/null guards for parsed IDs.
rg -n 'CHAT_ID=\$\(echo \$CHAT_RESPONSE|SESSION_ID=\$\(echo \$SESSION_RESPONSE' example/http/chat_assistant_example.sh
rg -n '\.data\.id' example/http/chat_assistant_example.sh

Repository: infiniflow/ragflow

Length of output: 285


🏁 Script executed:

cat -n example/http/chat_assistant_example.sh

Repository: infiniflow/ragflow

Length of output: 4162


Quote response variables and add validation to prevent cascading failures with invalid IDs.

Lines 38 and 50 extract IDs through unquoted variable expansions and never validate the result, so an error or malformed API response goes unnoticed. If .data.id is missing or null, jq -r prints the literal string null (or nothing at all when the response body is empty), and the script continues with an invalid ID, so subsequent API calls fail (e.g., the session URL becomes /api/v1/chats/null/sessions or /api/v1/chats//sessions).
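To see why the guard matters, the following minimal sketch compares plain extraction with the `// empty` form on a response that has no `.data.id`. The payload is a made-up placeholder, not a documented RAGFlow error body.

```shell
# Hypothetical error payload; the real RAGFlow error shape may differ.
ERROR_RESPONSE='{"code": 102, "message": "example error"}'

# Plain extraction: jq -r prints the literal string "null" for a missing field.
RAW_ID="$(printf '%s' "$ERROR_RESPONSE" | jq -r '.data.id')"

# With the alternative operator, a null or missing field produces no output.
SAFE_ID="$(printf '%s' "$ERROR_RESPONSE" | jq -r '.data.id // empty')"

printf 'raw=[%s] safe=[%s]\n' "$RAW_ID" "$SAFE_ID"
```

Only the second form gives a value that a `-z` test can catch; the first would pass a non-empty check while still being unusable as an ID.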

🛠️ Suggested fix
-CHAT_ID=$(echo $CHAT_RESPONSE | jq -r '.data.id')
+CHAT_ID="$(jq -r '.data.id // empty' <<<"$CHAT_RESPONSE")"
+if [[ -z "$CHAT_ID" ]]; then
+  echo "Failed to create chat assistant." >&2
+  echo "$CHAT_RESPONSE" | jq .
+  exit 1
+fi
 echo "Chat Assistant ID: ${CHAT_ID}"
 
 # 2. Create a session for the assistant
 echo -e "\n-- Create a session"
 SESSION_RESPONSE=$(curl -s --request POST \
      --url "${HOST_ADDRESS}/api/v1/chats/${CHAT_ID}/sessions" \
      --header 'Content-Type: application/json' \
      --header "Authorization: Bearer ${API_KEY}" \
      --data '{
       "name": "New Session"
       }')
-SESSION_ID=$(echo $SESSION_RESPONSE | jq -r '.data.id')
+SESSION_ID="$(jq -r '.data.id // empty' <<<"$SESSION_RESPONSE")"
+if [[ -z "$SESSION_ID" ]]; then
+  echo "Failed to create session." >&2
+  echo "$SESSION_RESPONSE" | jq .
+  exit 1
+fi
 echo "Session ID: ${SESSION_ID}"
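The suggested fix repeats the same guard for CHAT_ID and SESSION_ID. One possible way to avoid the duplication is a small helper; `require_id` is a hypothetical name introduced here for illustration and is not part of the PR.

```shell
# require_id LABEL JSON
# Prints .data.id from JSON on stdout. If the field is missing or null,
# reports the failure and the raw response on stderr and returns 1.
# (Hypothetical helper, not part of the reviewed scripts.)
require_id() {
  label="$1"
  response="$2"
  id="$(printf '%s' "$response" | jq -r '.data.id // empty' 2>/dev/null)"
  if [ -z "$id" ]; then
    echo "Failed to create ${label}." >&2
    printf '%s\n' "$response" >&2
    return 1
  fi
  printf '%s\n' "$id"
}

# Usage, with CHAT_RESPONSE captured as in the example script:
#   CHAT_ID="$(require_id "chat assistant" "$CHAT_RESPONSE")" || exit 1
```

The helper writes diagnostics to stderr so that command substitution captures only the ID.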
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 38-38: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 50-50: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@example/http/chat_assistant_example.sh` around lines 38 - 50, The script
extracts CHAT_ID and SESSION_ID unsafely and without validation (see
CHAT_RESPONSE -> CHAT_ID and SESSION_RESPONSE -> SESSION_ID); update the
extraction to quote variable expansions and validate results: use jq with
strict/error checking (e.g., jq -e or checking '.data.id' != null), assign into
quoted variables like CHAT_ID="$(...)" and SESSION_ID="$(...)", then test for
empty values (if [ -z "$CHAT_ID" ] or similar) and exit with a clear error
message before making subsequent calls to
${HOST_ADDRESS}/api/v1/chats/${CHAT_ID}/sessions; also apply the same quoting to
other expansions like ${API_KEY} and ${HOST_ADDRESS} to avoid word-splitting.
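As an alternative to `// empty`, the `jq -e` option mentioned in the prompt makes jq exit with a non-zero status when the last output is null or false, so the check can sit directly in an `if`. A minimal sketch, using a made-up response payload:

```shell
# Hypothetical success payload; field values are placeholders.
RESPONSE='{"code": 0, "data": {"id": "abc123"}}'

# -e sets a non-zero exit status when .data.id is null or false;
# -r prints the ID without surrounding JSON quotes.
if ID="$(printf '%s' "$RESPONSE" | jq -e -r '.data.id')"; then
  echo "ID: ${ID}"
else
  echo "No usable .data.id in response." >&2
fi
```

With `-e`, the variable may still hold the text `null` on failure, so the exit status, not the variable's content, is what should be tested.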

Comment on lines +30 to +35
DATASET_ID=$(curl -s --request POST \
--url "${HOST_ADDRESS}/api/v1/datasets" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${API_KEY}" \
--data '{"name": "retrieval_shell_example"}' | jq -r '.data.id')
echo "Dataset ID: ${DATASET_ID}"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify dataset ID extraction is inline and unguarded.
rg -n 'DATASET_ID=\$\(|\.data\.id' example/http/retrieval_example.sh

Repository: infiniflow/ragflow

Length of output: 175


🏁 Script executed:

cat example/http/retrieval_example.sh

Repository: infiniflow/ragflow

Length of output: 2587


Validate DATASET_ID immediately after creation.

The dataset creation response is parsed inline without validation. If the API request fails or returns an error response, jq -r '.data.id' yields an empty string or the literal string null, and the script goes on to run the retrieval and cleanup operations with an invalid ID, so the failures go unnoticed.

Suggested fix
-DATASET_ID=$(curl -s --request POST \
+DATASET_RESPONSE=$(curl -s --request POST \
      --url "${HOST_ADDRESS}/api/v1/datasets" \
      --header 'Content-Type: application/json' \
      --header "Authorization: Bearer ${API_KEY}" \
-     --data '{"name": "retrieval_shell_example"}' | jq -r '.data.id')
+     --data '{"name": "retrieval_shell_example"}')
+DATASET_ID="$(jq -r '.data.id // empty' <<<"$DATASET_RESPONSE")"
+if [[ -z "$DATASET_ID" ]]; then
+  echo "Failed to create dataset." >&2
+  echo "$DATASET_RESPONSE" | jq .
+  exit 1
+fi
 echo "Dataset ID: ${DATASET_ID}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@example/http/retrieval_example.sh` around lines 30 - 35, The inline parsing
of the dataset creation response into DATASET_ID (using curl and jq) isn't
validated, so failures produce an empty/invalid ID and downstream steps run
silently; after the POST that sets DATASET_ID in retrieval_example.sh, check the
curl/jq result and the HTTP status (or use curl --fail) and if DATASET_ID is
empty/null or the status is not 2xx, print an error to stderr and exit with
non‑zero status, ensuring subsequent retrieval/cleanup steps do not run with an
invalid DATASET_ID.
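The HTTP status check described in the prompt could look roughly like the sketch below. `is_2xx` and `create_dataset` are hypothetical names, and HOST_ADDRESS and API_KEY are assumed to be set as in the example scripts; the request itself is copied from the snippet under review.

```shell
# is_2xx STATUS: succeed only for three-digit HTTP status codes 200-299.
is_2xx() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# create_dataset: hypothetical wrapper around the dataset creation request.
# Prints the response body on success; on a non-2xx status (curl reports 000
# when no HTTP response was received) prints diagnostics to stderr, returns 1.
create_dataset() {
  body_file="$(mktemp)" || return 1
  status="$(curl -s -o "$body_file" -w '%{http_code}' --request POST \
    --url "${HOST_ADDRESS}/api/v1/datasets" \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer ${API_KEY}" \
    --data '{"name": "retrieval_shell_example"}')"
  if ! is_2xx "$status"; then
    echo "Dataset creation failed with HTTP status ${status}." >&2
    cat "$body_file" >&2
    rm -f "$body_file"
    return 1
  fi
  cat "$body_file"
  rm -f "$body_file"
}

# Usage:
#   DATASET_RESPONSE="$(create_dataset)" || exit 1
```

A 2xx status does not by itself guarantee that `.data.id` is present in the body, so the empty-ID check from the suggested fix is still worth keeping after this call.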

@JinHai-CN JinHai-CN added the ci Continue Integration label Apr 22, 2026

Labels

  • ci: Continue Integration
  • 📖 documentation: Improvements or additions to documentation
  • 🌈 python: Pull requests that update Python code
  • size:XL: This PR changes 500-999 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[Feature Request]: Examples on RAGFlow API

2 participants