GSoC 2026 Proposal | Multimodal AI & Agent API Eval Framework | Soumyaraj Bag#1601
Conversation
animator left a comment
@soumyarajbag the proposal looks good. Please go ahead and submit it in the GSoC portal.
@soumyarajbag You can now work on a proof of concept (PoC) showcasing your coding ability and problem-solving skills by making a minimal implementation of the technology you proposed in the project. Please find the guidelines below:
and explore if an AI evaluation UI can be built using it, to make it easy for end users to run evals from inside AI agents.
Surely @animator - I previously implemented a PoC structure and am refining it further as per the guidelines. Thank you. I will update you here in the thread.
PR Description
Hello mentors @animator @ashitaprasad @DenserMeerkat @synapsecode , I am Soumyaraj Bag, a 4th-year B.Tech student (CSE, RCCIIT) and Full-Stack Engineer. I am submitting my formal proposal for the Multimodal AI and Agent API Eval Framework.
This proposal outlines a 350-hour "Large" project focused on building a production-grade, full-stack evaluation framework as a companion service to API Dash. It leverages my hands-on experience at Trumio (AI Interviewer Agent, FastAPI microservices) and smallcase (production backend engineering), along with a fully deployed proof-of-concept at apidash-eval-poc.vercel.app. The PoC covers multimodal evaluation for Text, Image, and Audio modalities across providers (OpenAI, Anthropic, Google), a React/TypeScript dashboard with real-time SSE streaming, and an async FastAPI backend backed by MongoDB Atlas.
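As a rough illustration of the real-time streaming piece, here is a minimal, hypothetical sketch of how eval progress events might be framed for SSE on the backend. The stage names and payload shape are assumptions for illustration, not the PoC's actual schema; in FastAPI, such an async generator would typically be wrapped in a `StreamingResponse` with media type `text/event-stream`.

```python
import asyncio
import json

async def eval_progress(run_id):
    # Hypothetical stages; the real service would emit per-provider results.
    for stage in ("queued", "running", "scored", "done"):
        payload = json.dumps({"run_id": run_id, "stage": stage})
        # SSE wire format: each event is a "data: <payload>" line
        # terminated by a blank line.
        yield f"data: {payload}\n\n"
        await asyncio.sleep(0)  # yield control to the event loop

async def collect(run_id):
    # Drain the stream the way a dashboard client would consume it.
    return [event async for event in eval_progress(run_id)]

events = asyncio.run(collect("demo"))
print(events[0])
```

The blank-line terminator is what lets an `EventSource` client on the dashboard side split the stream into discrete messages.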
My existing APIDash contributions include:
- `packages/genai`: system prompt placement + SSE streaming fix + new provider unit tests
- `ResponseEvaluator` utility added to `better_networking`, with 25 test cases
- `genai` package blocking headless/pure-Dart use

Related Issues
Checklist
Added/updated tests?
We encourage you to add relevant test cases.
OS on which you have developed and tested the feature?