feat: Add SDK and cURL examples for chunk management, chat assistant, and retrieval (#4310) #14208
bhongong wants to merge 3 commits into infiniflow:main from
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 Walkthrough
Adds six new runnable example scripts: three Bash HTTP/curl examples and three Python SDK examples demonstrating dataset, document, chunk, retrieval, and chat-assistant workflows (including streaming, polling, and cleanup) against the RAGFlow API.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 9
🧹 Nitpick comments (4)
example/http/chunk_example.sh (1)

24-59: Make the flow self-contained by capturing the created chunk_id. Right now update/delete depend on manual CHUNK_ID edits, which weakens the “typical workflow” example.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@example/http/chunk_example.sh` around lines 24 - 59, The script should capture the created chunk's ID from the POST response and assign it to CHUNK_ID so the subsequent UPDATE and DELETE steps are self-contained; modify the "Add a chunk to a document" POST step to save the response (e.g., into a variable or temp file), extract the chunk_id (using jq or a safe shell parse) and export or set CHUNK_ID for later commands, and add a basic check that chunk_id was extracted before proceeding to the "Update a chunk" (PUT to /documents/${DOC_ID}/chunks/${CHUNK_ID}) and "Delete chunks" (DELETE payload using ${CHUNK_ID}) steps.

example/sdk/retrieval_example.py (1)

29-91: Please use logging for this new Python flow. Replace print-based progress/output with module-level logging. As per coding guidelines: **/*.py: Add logging for new flows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@example/sdk/retrieval_example.py` around lines 29 - 91, Replace all print statements in retrieval_example.py with module-level logging: add an import logging and configure a logger (e.g., logger = logging.getLogger(__name__) and basicConfig) at top of the file, then change every print(...) to logger.info(...) and the exception block to logger.exception(...) (use logger.error for non-exception error messages if needed); ensure messages about progress (Creating dataset, Uploading and parsing document, Document parsed and ready for retrieval, Performing Retrieval, Cleaning up, Retrieval example done) use logger.info and the final sys.exit calls remain, but use logger.exception for the caught Exception e to include stack trace. Use the same variable names (rag, dataset, docs, chunks) so replacements are localized to this flow.

example/sdk/chat_assistant_example.py (1)

28-92: Use logging for this new flow instead of print statements. Please migrate these operational messages to logging for consistency and troubleshooting. As per coding guidelines: **/*.py: Add logging for new flows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@example/sdk/chat_assistant_example.py` around lines 28 - 92, Replace all print-based operational messages in this flow with logging: import and configure the standard logging module at the top, create a logger (e.g., logger = logging.getLogger(__name__)), and change prints around RAGFlow instantiation, rag.create_dataset, rag.create_chat, assistant.create_session, session.ask (both non-stream and stream blocks), assistant.list_sessions, cleanup calls (assistant.delete_sessions, rag.delete_chats, rag.delete_datasets) to appropriate logger methods (logger.info for normal flow, logger.debug for incremental/streamed parts if desired). In the exception handler, replace print with logger.exception or logger.error with the caught Exception e to preserve stack trace and error details. Ensure messages still include contextual data like assistant.id, session.id, dataset.id and that streaming output uses logger.debug or logger.info consistently.

example/sdk/chunk_example.py (1)

29-85: Use structured logging instead of prints for this new flow. Switching to logging makes these examples consistent with repo guidance and easier to troubleshoot. As per coding guidelines: **/*.py: Add logging for new flows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@example/sdk/chunk_example.py` around lines 29 - 85, Replace all print calls in this flow with structured logging: import logging, configure a module logger (e.g., logger = logging.getLogger(__name__) and basicConfig/level) and use logger.info/debug for progress messages (those around RAGFlow(), create_dataset(), upload_documents(), async_parse_documents(), list_documents(), add_chunk(), list_chunks(), update(), delete_chunks(), delete_datasets()) and logger.exception or logger.error in the except block to capture stack traces; ensure messages include contextual identifiers (dataset.id, doc.id, chunk.id) where available.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@example/http/chat_assistant_example.sh`:
- Around line 18-21: The script exposes a hardcoded credential (API_KEY) and
other placeholders (HOST_ADDRESS, CHAT_ID, SESSION_ID); update the script to
read these values from environment variables instead of literals by replacing
the API_KEY assignment with a read from an env var (e.g., ${API_KEY:-}) and
likewise source HOST_ADDRESS, CHAT_ID and SESSION_ID from their respective
environment variables or fail/print a helpful message if missing; ensure
references to API_KEY, HOST_ADDRESS, CHAT_ID, and SESSION_ID in the script use
the env variables so no token literals remain in the repository.
- Around line 23-83: The script currently assumes CHAT_ID and SESSION_ID exist
but doesn't extract them from create responses; update the create-chat POST to
capture the returned chat id (e.g., from the JSON key "id" or "chat_id") and
export it as CHAT_ID, then capture the session id from the create-session
response (e.g., JSON "id" or "session_id") and export it as SESSION_ID before
running subsequent completion, list, and delete calls; use a JSON parser like jq
to parse the curl responses and set the environment variables (referencing the
POST to /api/v1/chats and POST to /api/v1/chats/${CHAT_ID}/sessions to locate
where to add the extraction).
In `@example/http/chunk_example.sh`:
- Around line 18-23: Replace the hardcoded API_KEY value in
example/http/chunk_example.sh with an environment-driven token: stop embedding
the credential-like string assigned to the API_KEY variable and instead read it
from an env var (e.g., RAGFLOW_API_KEY) and make the script use that variable
when making requests; ensure the script exits with a clear error if the env var
is missing so callers must set RAGFLOW_API_KEY before running, and update any
references to API_KEY in the script to use the env-driven variable (leave
HOST_ADDRESS, DATASET_ID, DOC_ID, CHUNK_ID as placeholders).
In `@example/http/retrieval_example.sh`:
- Around line 18-20: The script currently hardcodes an API key in the API_KEY
variable (and also defines HOST_ADDRESS and DATASET_ID); replace the hardcoded
secret with an environment-backed placeholder so no real token is embedded.
Update API_KEY to read from an env var (e.g., RAGFLOW_API_KEY) with a clear
placeholder/default value and ensure the example comment instructs users to set
RAGFLOW_API_KEY instead of committing real keys; keep HOST_ADDRESS and
DATASET_ID as illustrative placeholders only.
In `@example/sdk/chat_assistant_example.py`:
- Around line 25-27: Remove the hardcoded API_KEY constant and instead read the
API key from the environment (use RAGFLOW_API_KEY) with a safe
placeholder/fallback; update the code where API_KEY is referenced (the API_KEY
variable and any use of HOST_ADDRESS if needed) to use the environment-sourced
value and ensure the example documents that users should set RAGFLOW_API_KEY
rather than embedding secrets in chat_assistant_example.py.
In `@example/sdk/chunk_example.py`:
- Around line 47-55: The polling loop in chunk_example.py (the while True that
calls dataset.list_documents(id=doc.id) and inspects doc_status.run and
doc_status.progress) can hang forever; add a timeout mechanism (e.g.,
max_wait_seconds and a start_time/timestamp check) and a fail-fast path that
breaks/raises when the timeout elapses or when doc_status.run indicates a
terminal failure state, and ensure the loop returns or raises a clear error
instead of looping indefinitely; update the loop to check elapsed time each
iteration, log or raise a TimeoutError with context including doc.id and last
doc_status.progress when timed out, and keep the existing success check for
doc_status.run == "1" && progress >= 1.0.
- Around line 26-27: Replace the hardcoded credential in chunk_example.py by
reading API_KEY from an environment variable or a clear placeholder;
specifically replace the literal API_KEY value assigned to the API_KEY symbol
(and keep HOST_ADDRESS configurable) with something like os.getenv("API_KEY") or
"YOUR_API_KEY_HERE" and ensure the module imports os if using environment
variables so examples do not commit real credentials.
In `@example/sdk/retrieval_example.py`:
- Around line 44-49: The infinite loop that polls
dataset.list_documents(id=doc.id) using doc_status.run and doc_status.progress
must be bounded with a timeout and failure branch; modify the polling in
retrieval_example.py (the while True loop that inspects doc_status) to record
start time (or use a max_wait_seconds constant), break successfully if
conditions met, and raise or return an explicit error/report if the timeout
elapses (also consider increasing sleep interval or exponential backoff). Ensure
the logic references the existing symbols dataset.list_documents, doc.id,
doc_status.run and doc_status.progress so callers can detect and handle a
timed-out parse instead of waiting forever.
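A bounded version of the polling that the two comments above target might look like the sketch below. `list_documents` is passed in as a stand-in for `dataset.list_documents`, and the `run`/`progress` success condition follows the comments; the timeout defaults are illustrative.

```python
import time

def wait_for_parse(list_documents, doc_id, max_wait_seconds=300, poll_interval=2.0):
    """Poll until the document is parsed, failing fast with context on timeout."""
    start = time.monotonic()
    last_progress = None
    while True:
        doc_status = list_documents(id=doc_id)[0]
        last_progress = doc_status.progress
        # Success: run == "1" and progress has reached 100%.
        if doc_status.run == "1" and doc_status.progress >= 1.0:
            return doc_status
        if time.monotonic() - start > max_wait_seconds:
            raise TimeoutError(
                f"Document {doc_id} not parsed after {max_wait_seconds}s "
                f"(last progress: {last_progress})"
            )
        time.sleep(poll_interval)
```

Callers can catch `TimeoutError` and report the document ID and last observed progress instead of hanging forever; exponential backoff could replace the fixed `poll_interval` if desired.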
- Around line 26-27: The example hardcodes credentials: replace the API_KEY
constant by reading from the environment (use RAGFLOW_API_KEY) or a clearly
marked placeholder and avoid committing real keys; update the
retrieval_example.py symbol API_KEY to obtain its value via
os.getenv('RAGFLOW_API_KEY') (or fall back to a "REPLACE_ME" placeholder) and
add a short validation step that raises or prints a clear message if the key is
missing so example users know to set RAGFLOW_API_KEY before running.
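For the Python examples, the env-sourced key with a fail-fast check that these comments ask for could be sketched like this; the placeholder string and error wording are illustrative, and `RAGFLOW_API_KEY` is the variable name the comments propose.

```python
import os
import sys

def load_api_key() -> str:
    """Read the API key from the environment instead of a committed literal."""
    api_key = os.getenv("RAGFLOW_API_KEY", "")
    if not api_key or api_key == "REPLACE_ME":
        # Fail fast with a clear message so users know to export the key first.
        sys.stderr.write("Set RAGFLOW_API_KEY before running this example.\n")
        sys.exit(1)
    return api_key
```

This keeps credential-shaped literals out of the repository while still giving example users an actionable error when the variable is unset.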
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: eabb4e71-5212-44c3-a263-895fdf3b3405
📒 Files selected for processing (6)
- example/http/chat_assistant_example.sh
- example/http/chunk_example.sh
- example/http/retrieval_example.sh
- example/sdk/chat_assistant_example.py
- example/sdk/chunk_example.py
- example/sdk/retrieval_example.py
```bash
HOST_ADDRESS="http://localhost:9380"
API_KEY="ragflow-IzZmY1MGVhYTBhMjExZWZiYTdjMDI0Mm"
DATASET_ID="your_dataset_id"
DOC_ID="your_document_id"
CHUNK_ID="your_chunk_id"
```
Credential-like API key should not be hardcoded.
Please switch to env-driven configuration for the token in this example script.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@example/http/chunk_example.sh` around lines 18 - 23, Replace the hardcoded
API_KEY value in example/http/chunk_example.sh with an environment-driven token:
stop embedding the credential-like string assigned to the API_KEY variable and
instead read it from an env var (e.g., RAGFLOW_API_KEY) and make the script use
that variable when making requests; ensure the script exits with a clear error
if the env var is missing so callers must set RAGFLOW_API_KEY before running,
and update any references to API_KEY in the script to use the env-driven
variable (leave HOST_ADDRESS, DATASET_ID, DOC_ID, CHUNK_ID as placeholders).
Actionable comments posted: 2
♻️ Duplicate comments (2)
example/http/chat_assistant_example.sh (1)

20-20: ⚠️ Potential issue | 🟠 Major
Remove token-like fallback from API_KEY. Line 20 still embeds a credential-like default value. Use env-only input and fail fast when missing.

🔐 Suggested fix

```diff
-API_KEY="${RAGFLOW_API_KEY:-ragflow-IzZmY1MGVhYTBhMjExZWZiYTdjMDI0Mm}"
+: "${RAGFLOW_API_KEY:?Set RAGFLOW_API_KEY before running this script}"
+API_KEY="${RAGFLOW_API_KEY}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@example/http/chat_assistant_example.sh` at line 20, The script sets API_KEY with a hardcoded token-like fallback (API_KEY="${RAGFLOW_API_KEY:-ragflow-IzZmY1MGVhYTBhMjExZWZiYTdjMDI0Mm}"); remove the default fallback so API_KEY is sourced only from the environment (e.g., API_KEY="${RAGFLOW_API_KEY}" or equivalent), and add a fail-fast check immediately after (check API_KEY is non-empty, print a clear error like "Missing RAGFLOW_API_KEY" to stderr and exit non-zero) so the script never embeds or silently uses a credential-like default.

example/http/retrieval_example.sh (1)

20-20: ⚠️ Potential issue | 🟠 Major
Use env-only API_KEY; remove token-like default. Line 20 should not include a credential-shaped fallback literal in a committed example.

🔐 Suggested fix

```diff
-API_KEY="${RAGFLOW_API_KEY:-ragflow-IzZmY1MGVhYTBhMjExZWZiYTdjMDI0Mm}"
+: "${RAGFLOW_API_KEY:?Set RAGFLOW_API_KEY before running this script}"
+API_KEY="${RAGFLOW_API_KEY}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@example/http/retrieval_example.sh` at line 20, The example sets API_KEY with a token-like fallback which must be removed; update the API_KEY assignment (variable name API_KEY) to use only the environment variable (no hard-coded default) and, if desired, add a runtime check that fails fast when API_KEY is empty (e.g., emit an error and exit) so no credential-shaped literal remains in the committed example.
🧹 Nitpick comments (2)

example/http/chat_assistant_example.sh (1)

30-100: Add fail-fast and timeout options to every curl call. From line 30 onward, all curl requests use silent mode (-s) without error handling or time bounds. Add --show-error, --fail-with-body, and timeout flags (--connect-timeout, --max-time) so requests cannot hang indefinitely and HTTP errors are surfaced instead of masked.

♻️ Suggested pattern

```diff
+# Harden curl defaults for examples
+CURL_OPTS=(--silent --show-error --fail-with-body --connect-timeout 5 --max-time 30)
@@
-CHAT_RESPONSE=$(curl -s --request POST \
+CHAT_RESPONSE=$(curl "${CURL_OPTS[@]}" --request POST \
@@
-SESSION_RESPONSE=$(curl -s --request POST \
+SESSION_RESPONSE=$(curl "${CURL_OPTS[@]}" --request POST \
@@
-curl -s --request POST \
+curl "${CURL_OPTS[@]}" --request POST \
@@
-curl -N -s --request POST \
+curl -N "${CURL_OPTS[@]}" --request POST \
@@
-curl -s --request GET \
+curl "${CURL_OPTS[@]}" --request GET \
@@
-curl -s --request DELETE \
+curl "${CURL_OPTS[@]}" --request DELETE \
@@
-curl -s --request DELETE \
+curl "${CURL_OPTS[@]}" --request DELETE \
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@example/http/chat_assistant_example.sh` around lines 30 - 100, The curl invocations used for CHAT_RESPONSE, SESSION_RESPONSE, the non-streaming and streaming completions, session list/delete, and final chat delete lack fail-fast and timeout flags; update every curl call in this file (locations around CHAT_RESPONSE, SESSION_RESPONSE, and the subsequent curl commands) to include --show-error, --fail-with-body and sensible timeouts such as --connect-timeout 5 and --max-time 60 (or appropriate values for streaming endpoints), ensuring all requests fail loudly and don’t hang.

example/http/retrieval_example.sh (1)

30-72: Add timeout and error-handling flags to curl calls. Current calls use only -s (silent mode) without fail flags or timeouts. This means HTTP errors (4xx, 5xx) won't cause failure, and requests may hang indefinitely.

♻️ Suggested pattern

```diff
+CURL_OPTS=(--silent --show-error --fail-with-body --connect-timeout 5 --max-time 30)
@@
-DATASET_ID=$(curl -s --request POST \
+DATASET_ID=$(curl "${CURL_OPTS[@]}" --request POST \
@@
-curl -s --request POST \
+curl "${CURL_OPTS[@]}" --request POST \
@@
-curl -s --request POST \
+curl "${CURL_OPTS[@]}" --request POST \
@@
-curl -s --request DELETE \
+curl "${CURL_OPTS[@]}" --request DELETE \
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@example/http/retrieval_example.sh` around lines 30 - 72, The curl calls (including the dataset creation that assigns DATASET_ID and the POSTs to /api/v1/retrieval and DELETE) should include error-handling and timeouts: add flags like --fail -S --max-time 10 --connect-timeout 5 (optionally --retry 3 --retry-delay 2) to each curl invocation so HTTP errors surface and requests don’t hang, and after creating DATASET_ID validate it’s non-empty (exit with an error message if jq returns empty) before proceeding; update the curl invocations that build DATASET_ID, the retrieval requests, and the cleanup DELETE to use these flags and add a simple check for DATASET_ID presence.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 186c293c-6383-4d39-aa63-be92942f8c4f
📒 Files selected for processing (5)
- example/http/chat_assistant_example.sh
- example/http/chunk_example.sh
- example/http/retrieval_example.sh
- example/sdk/chunk_example.py
- example/sdk/retrieval_example.py
✅ Files skipped from review due to trivial changes (1)
- example/sdk/chunk_example.py
🚧 Files skipped from review as they are similar to previous changes (2)
- example/http/chunk_example.sh
- example/sdk/retrieval_example.py
```bash
CHAT_ID=$(echo $CHAT_RESPONSE | jq -r '.data.id')
echo "Chat Assistant ID: ${CHAT_ID}"

# 2. Create a session for the assistant
echo -e "\n-- Create a session"
SESSION_RESPONSE=$(curl -s --request POST \
  --url "${HOST_ADDRESS}/api/v1/chats/${CHAT_ID}/sessions" \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${API_KEY}" \
  --data '{
    "name": "New Session"
  }')
SESSION_ID=$(echo $SESSION_RESPONSE | jq -r '.data.id')
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Verify unquoted expansions and missing empty/null guards for parsed IDs.
rg -n 'CHAT_ID=\$\(echo \$CHAT_RESPONSE|SESSION_ID=\$\(echo \$SESSION_RESPONSE' example/http/chat_assistant_example.sh
rg -n '\.data\.id' example/http/chat_assistant_example.sh
```

Repository: infiniflow/ragflow
Length of output: 285

🏁 Script executed:

```bash
cat -n example/http/chat_assistant_example.sh
```

Repository: infiniflow/ragflow
Length of output: 4162
Quote response variables and add validation to prevent cascading failures with invalid IDs.
Lines 38 and 50 extract IDs via unquoted variable expansion, which violates shell best practices and lacks validation if the API returns an error or invalid response. If .data.id is missing or null, the script will continue with invalid IDs and cause subsequent API calls to fail (e.g., ${CHAT_ID}/sessions becomes /sessions with an empty ID).
🛠️ Suggested fix

```diff
-CHAT_ID=$(echo $CHAT_RESPONSE | jq -r '.data.id')
+CHAT_ID="$(jq -r '.data.id // empty' <<<"$CHAT_RESPONSE")"
+if [[ -z "$CHAT_ID" ]]; then
+  echo "Failed to create chat assistant." >&2
+  echo "$CHAT_RESPONSE" | jq .
+  exit 1
+fi
 echo "Chat Assistant ID: ${CHAT_ID}"

 # 2. Create a session for the assistant
 echo -e "\n-- Create a session"
 SESSION_RESPONSE=$(curl -s --request POST \
   --url "${HOST_ADDRESS}/api/v1/chats/${CHAT_ID}/sessions" \
   --header 'Content-Type: application/json' \
   --header "Authorization: Bearer ${API_KEY}" \
   --data '{
     "name": "New Session"
   }')
-SESSION_ID=$(echo $SESSION_RESPONSE | jq -r '.data.id')
+SESSION_ID="$(jq -r '.data.id // empty' <<<"$SESSION_RESPONSE")"
+if [[ -z "$SESSION_ID" ]]; then
+  echo "Failed to create session." >&2
+  echo "$SESSION_RESPONSE" | jq .
+  exit 1
+fi
 echo "Session ID: ${SESSION_ID}"
```
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 38-38: Double quote to prevent globbing and word splitting. (SC2086)
[info] 50-50: Double quote to prevent globbing and word splitting. (SC2086)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@example/http/chat_assistant_example.sh` around lines 38 - 50, The script
extracts CHAT_ID and SESSION_ID unsafely and without validation (see
CHAT_RESPONSE -> CHAT_ID and SESSION_RESPONSE -> SESSION_ID); update the
extraction to quote variable expansions and validate results: use jq with
strict/error checking (e.g., jq -e or checking '.data.id' != null), assign into
quoted variables like CHAT_ID="$(...)" and SESSION_ID="$(...)", then test for
empty values (if [ -z "$CHAT_ID" ] or similar) and exit with a clear error
message before making subsequent calls to
${HOST_ADDRESS}/api/v1/chats/${CHAT_ID}/sessions; also apply the same quoting to
other expansions like ${API_KEY} and ${HOST_ADDRESS} to avoid word-splitting.
```bash
DATASET_ID=$(curl -s --request POST \
  --url "${HOST_ADDRESS}/api/v1/datasets" \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${API_KEY}" \
  --data '{"name": "retrieval_shell_example"}' | jq -r '.data.id')
echo "Dataset ID: ${DATASET_ID}"
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Verify dataset ID extraction is inline and unguarded.
rg -n 'DATASET_ID=\$\(|\.data\.id' example/http/retrieval_example.sh
```

Repository: infiniflow/ragflow
Length of output: 175

🏁 Script executed:

```bash
cat example/http/retrieval_example.sh
```

Repository: infiniflow/ragflow
Length of output: 2587
Validate DATASET_ID immediately after creation.
The dataset creation response is parsed inline without validation. If the API request fails or returns an error response, jq -r '.data.id' yields empty/null, and the script continues to execute retrieval and cleanup operations with an invalid ID, causing silent failures.
Suggested fix

```diff
-DATASET_ID=$(curl -s --request POST \
+DATASET_RESPONSE=$(curl -s --request POST \
   --url "${HOST_ADDRESS}/api/v1/datasets" \
   --header 'Content-Type: application/json' \
   --header "Authorization: Bearer ${API_KEY}" \
-  --data '{"name": "retrieval_shell_example"}' | jq -r '.data.id')
+  --data '{"name": "retrieval_shell_example"}')
+DATASET_ID="$(jq -r '.data.id // empty' <<<"$DATASET_RESPONSE")"
+if [[ -z "$DATASET_ID" ]]; then
+  echo "Failed to create dataset." >&2
+  echo "$DATASET_RESPONSE" | jq .
+  exit 1
+fi
 echo "Dataset ID: ${DATASET_ID}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@example/http/retrieval_example.sh` around lines 30 - 35, The inline parsing
of the dataset creation response into DATASET_ID (using curl and jq) isn't
validated, so failures produce an empty/invalid ID and downstream steps run
silently; after the POST that sets DATASET_ID in retrieval_example.sh, check the
curl/jq result and the HTTP status (or use curl --fail) and if DATASET_ID is
empty/null or the status is not 2xx, print an error to stderr and exit with
non‑zero status, ensuring subsequent retrieval/cleanup steps do not run with an
invalid DATASET_ID.
Closes #4310
What problem does this PR solve?
Issue #4310 requests practical examples for the RAGFlow SDK and HTTP API to help developers get started faster. The existing example/sdk/ folder only contains dataset_example.py. This PR fills the remaining gaps by adding examples for three key API areas not yet covered in main or by other open PRs (#13904, #13284):

Files added

Python SDK (example/sdk/)
- chunk_example.py: CRUD + retrieve chunks via ragflow_sdk
- chat_assistant_example.py: full chat assistant lifecycle with streaming support
- retrieval_example.py: single-dataset and multi-dataset retrieval

HTTP / cURL (example/http/)
- chunk_example.sh: cURL equivalents for all chunk operations
- chat_assistant_example.sh: cURL for chat assistant CRUD and session messaging
- retrieval_example.sh: cURL for the retrieval endpoint

Type of change

Notes
- Follows the conventions of dataset_example.py (Apache 2.0 header, HOST_ADDRESS/API_KEY constants, try/except with sys.exit).
- Checked against the ragflow_sdk/modules/*.py source code.
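As a rough illustration of those conventions (not a copy of dataset_example.py), an example-script skeleton could look like this; the RAGFlow calls are elided as comments because they need a running server, and the constant values are placeholders.

```python
import sys

HOST_ADDRESS = "http://localhost:9380"
API_KEY = "YOUR_API_KEY_HERE"  # placeholder; examples should read this from an env var

def main():
    try:
        # rag = RAGFlow(api_key=API_KEY, base_url=HOST_ADDRESS)
        # ... create dataset, exercise the API, clean up ...
        print("Example done")
    except Exception as e:
        # Convention from dataset_example.py: report the error and exit non-zero.
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)

if __name__ == "__main__":
    main()
```

The try/except with sys.exit keeps failures visible to shell callers while the happy path prints progress and exits cleanly.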