Add Gemini batch processing support #926
Implements batch_submit, batch_poll, batch_status, batch_retrieve, and batch_result_turn methods for ProviderGoogleGemini, enabling batch_chat() and batch_chat_structured() for Google Gemini models.

Key implementation details:
- JSONL body preparation converts camelCase to snake_case (required by Gemini's batch parser) while preserving user-defined schema property names
- Handles multiple Gemini response formats (plain, wrapped, error/status)
- Uses existing google_upload_* functions for file upload

Includes unit tests for all helper functions and integration tests that skip gracefully when credentials are not available.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Gate batch support by base_url (generativelanguage.googleapis.com) so Vertex endpoints don't advertise unsupported batch capability
- Keep polling when BATCH_STATE_SUCCEEDED but responsesFile not yet available, preventing premature retrieval errors
- Add key field parsing to gemini_json_fallback() for better error recovery
- Add pre-recorded fixture (state-capitals-gemini.json) for deterministic offline tests
- Update batch_chat() docs and NEWS.md to include Gemini support

Co-Authored-By: Claude Opus 4.6 <[email protected]>
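The "keep polling" rule in this commit can be sketched roughly as follows. This is an illustration only, not the code from the PR: `batch_is_done()` is a hypothetical name, the shape of `status` (a list with `state` and `response$responsesFile`) is an assumption, and so is the set of terminal state names other than `BATCH_STATE_SUCCEEDED`.

```r
# Hypothetical sketch: decide whether polling can stop.
# Assumes `status` is a parsed batch status: list(state = ..., response = list(responsesFile = ...)).
batch_is_done <- function(status) {
  state <- status$state

  # A succeeded batch is only "done" once the output file is actually listed;
  # otherwise retrieval would fail, so polling should continue.
  if (identical(state, "BATCH_STATE_SUCCEEDED")) {
    return(!is.null(status$response$responsesFile))
  }

  # Assumed terminal failure states; anything else (pending, running) keeps polling.
  terminal <- c(
    "BATCH_STATE_FAILED",
    "BATCH_STATE_CANCELLED",
    "BATCH_STATE_EXPIRED"
  )
  isTRUE(state %in% terminal)
}

# Succeeded, but no output file yet: keep polling.
batch_is_done(list(state = "BATCH_STATE_SUCCEEDED"))
```

Wrapping the last check in `isTRUE()` means a status with no `state` field is treated as "not done" rather than returning a zero-length logical.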
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
hadley left a comment
Thanks for working on this! This is a great first step but I have lots of small questions/requests 😄
    if (request_count <= 0L) {
      return(list(list(status_code = code, body = NULL)))
    }
    return(replicate(
Should use rep() not replicate() here
I think you could eliminate the branch above with return(rep(list(status_code = code, body = NULL), min(0, request_count)))
But it might be safer to just error if requestCount is < 0?
Used rep now. But I left the rest as is; min(0, request_count) seems like it would always return 0. We could do max(0, request_count) or error if requestCount is less than 0; let me know what you prefer!
Done now, using max(0L, request_count)
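For readers following the thread, a minimal sketch of the `rep()` plus `max(0L, request_count)` form is below. `error_results()` is a hypothetical wrapper written for illustration; the actual function in the PR is not shown in this conversation. Note that with this form a non-positive count yields an empty list, whereas the branch quoted above returned a single element.

```r
# Hypothetical helper: one error record per request in the batch.
error_results <- function(code, request_count) {
  # max() clamps negative counts to zero; min(0, n) would never be positive.
  n <- max(0L, request_count)

  # The inner list() is the record; the outer list() makes rep() repeat
  # whole records instead of splicing their fields together.
  rep(list(list(status_code = code, body = NULL)), n)
}

results <- error_results(500L, 3L)
length(results)
```

Because `list(body = NULL)` keeps the `body` element, each record still has both fields.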
      gc_pre$responseSchema %||% gc_pre$response_schema
    }

    body <- gemini_to_snake_case(body)
Are you sure this is necessary? Google APIs often seem to take both snake and camel case.
Claude tested with camelCase and the batch JSONL parser silently ignored the fields — it seems to require protobuf-style snake_case names, unlike the REST API which accepts both. The batch API docs also use snake_case in all their JSONL examples. So this seems to be necessary.
Thanks for checking! Can you please add a brief summary as a comment?
Added a comment. Wrote another full test script to double-check the camelCase/snake_case problem, and it actually errors with HTTP 400 when it encounters camelCase (not silent), so have changed the other comment earlier in the file as well.
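To make the conversion discussed in this thread concrete, here is a rough sketch of a recursive camelCase to snake_case rename that leaves user-defined schema property names alone. It is not the PR's `gemini_to_snake_case()`; `camel_to_snake()` and `to_snake_names()` are hypothetical names, and treating every list element called `properties` as a schema property map is a simplifying assumption.

```r
# Hypothetical: "responseSchema" -> "response_schema".
camel_to_snake <- function(x) {
  tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", x))
}

# Hypothetical: rename list names recursively, but keep the names of the
# direct children of any `properties` element (assumed to be user-defined).
to_snake_names <- function(x, keep_names = FALSE) {
  if (!is.list(x)) {
    return(x)
  }
  nms <- names(x)
  for (i in seq_along(x)) {
    is_props <- !is.null(nms) && identical(nms[[i]], "properties")
    # x[i] <- list(...) avoids dropping elements whose value is NULL.
    x[i] <- list(to_snake_names(x[[i]], keep_names = is_props))
  }
  if (!is.null(nms) && !keep_names) {
    names(x) <- camel_to_snake(nms)
  }
  x
}

body <- list(
  generationConfig = list(
    responseSchema = list(
      type = "object",
      properties = list(firstName = list(type = "string"))
    )
  )
)
out <- to_snake_names(body)
names(out$generation_config$response_schema$properties)
```

In this sketch the API field names become `generation_config` and `response_schema`, while the user's `firstName` property is untouched.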
    path_output <- withr::local_tempfile(fileext = ".jsonl")
    gemini_download_file(provider, responses_file, path_output)

    parsed <- read_ndjson(path_output, fallback = gemini_json_fallback)
Why the fallback here? Claude might have copied from OpenAI which seems to be flaky. Do you have evidence that gemini is similarly problematic?
Removed the fallback. I tested with gemini-2.5-flash and gemini-3-flash-preview both in batch_chat_text and batch_chat_structured and the JSONL is always well-formed, so it seems fine to remove. (I did run into an issue in a different place with the batch output from anthropic/claude, but I don't have a reprex; it concerned tool output in the json. So that's perhaps where this pattern might come from).
Thank you @hadley I'll work on this and get back to you this week. I've made a couple of replies but I will submit a revised PR shortly.
- has_batch_support unconditionally TRUE
- Fix doc link, add .internal = TRUE to error
- Rename is_terminal to is_done, replicate() to rep()
- Simplify responsesFile lookup to single path (response$responsesFile)
- Remove gemini_json_fallback (JSONL always well-formed)
- Remove @noRd annotations
- Move gemini_upload/download_file to provider-google-upload.R
- Fix test fixture to match actual API response structure

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Have now submitted a revised PR - hope I have done it correctly, I think it addresses all the comments. Thank you @hadley.
    # Batch file helpers -----------------------------------------------------------

    gemini_upload_file <- function(
I think you could reduce the duplication here by making this google_upload_file(), then google_upload() could call google_upload_file() then create the ContentUploaded object.
Done now - extracted google_upload_file()
hadley left a comment
Getting very close now. Just a few last questions.
- Add comment explaining why snake_case conversion is needed (batch JSONL requires protobuf field names; camelCase causes HTTP 400)
- Simplify error branch in batch_retrieve using max(0L, request_count)
- Extract google_upload_file() to reduce duplication between google_upload() and gemini_upload_file()

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Kia ora @hadley - have addressed the final comments. Thank you for reviewing this! Also found another link that was problematic (https://ai.google.dev/gemini-api/docs/batch) and fixed it (to https://ai.google.dev/gemini-api/docs/batch-api).
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Hi @hadley - I've been doing some testing with a large-scale set of prompts using gemini batch and found a bug (after Codex did some extensive testing with my guidance). This concerned the ordering of the results - I was getting the batch results in the wrong order, despite the presence of the correct keys. Here's the report from Codex:
Findings
Focused verification passed with test-batch-chat.R and test-provider-google-batch.R. I also submitted two fresh 4-prompt Gemini 2.5 Flash batches during this session, but both were still pending after several minutes, so there isn’t a new completed live batch from this session yet.
Is it ok if I submit an amended PR to deal with the ordering issue?
Thanks for the investigation. A PR would definitely be appreciated!
Two bugs fixed:
1. `gemini_extract_index()` fell back to the line-number default before checking the `key` field, so `batch_retrieve()` returned results in file order instead of the intended chat order.
2. `batch_chat()`, `batch_chat_text()`, and `batch_chat_structured()` fell through into result handling when `wait = FALSE` and the batch wasn't complete, causing "Expected N, got 0" errors.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
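The ordering fix in item 1 amounts to preferring the `key` field over the line position. A minimal sketch of that idea follows; it is not the PR's `gemini_extract_index()`. The function names are hypothetical, and the `"chat-<n>"` key format is an assumption made purely for illustration, since the real key format is not shown in this conversation.

```r
# Hypothetical: recover the original request index for one result line.
# Assumes keys look like "chat-1", "chat-2", ...; falls back to `default`
# (the line number) only when no usable key is present.
extract_index <- function(result, default) {
  key <- result$key
  if (is.character(key) && length(key) == 1 && grepl("^chat-[0-9]+$", key)) {
    return(as.integer(sub("^chat-", "", key)))
  }
  default
}

# Hypothetical: put results back into request order rather than file order.
reorder_results <- function(results) {
  idx <- vapply(
    seq_along(results),
    function(i) extract_index(results[[i]], default = i),
    integer(1)
  )
  results[order(idx)]
}

# Results as they might appear in the output file, out of order.
results <- list(
  list(key = "chat-3", text = "C"),
  list(key = "chat-1", text = "A"),
  list(key = "chat-2", text = "B")
)
ordered <- reorder_results(results)
vapply(ordered, function(r) r$text, character(1))
```

Checking the key first matters because the output file is not guaranteed to list responses in submission order; the line number is only a last resort.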
Thanks @hadley I've submitted a revised PR with the two bug fixes. Let me know what you think!
Summary
- Batch processing support for `chat_google_gemini()` via the Gemini Batch API
- `ProviderGoogleGemini`: `has_batch_support`, `batch_submit`, `batch_poll`, `batch_status`, `batch_retrieve`, `batch_result_turn`
- Keep polling when `BATCH_STATE_SUCCEEDED` but output file isn't available yet
- Update `batch_chat()` documentation to include Gemini

Closes #914
Test plan
- Tested with `gemini-2.5-flash` and `gemini-3-flash-preview`, using `batch_chat` and `batch_chat_structured` (4 scenarios, all pass)
- Pre-recorded fixture `tests/testthat/batch/state-capitals-gemini.json` returns correct state capitals
- `devtools::test()`: 788 pass, 0 fail
- `devtools::check()`: 0 errors, 0 warnings

🤖 Generated with Claude Code