Built with ❤️ at the Gemini 3 Hackathon in Paris 🇫🇷
Croissant Toolkit is an intelligent suite of tools designed to work seamlessly with the MLCommons Croissant metadata format for machine learning datasets. Powered by the incredible capabilities of Google Gemini 3, this toolkit simplifies, validates, and enhances ML dataset preparation, discovery, and exploration.
- Wizard Data Integrator: A single command to rule them all. Orchestrates transcription, translation, and NLP to generate fully-enriched Croissant metadata.
- Intelligent Metadata Generation: Automatically generate and enrich Croissant .jsonld metadata from raw dataset files using Gemini 3's advanced multimodal reasoning.
- Croissant Expert Logic: Deep integration with the MLCommons Croissant specification for 100% compliant JSON-LD serialization.
- Automated Browser Navigation: Seamlessly launch Google Chrome and perform Google searches directly from the toolkit.
- YouTube Video Discovery: Search and extract structured video data (titles, descriptions, URLs).
- Automated Transcription: Fetch and store full text transcripts from YouTube videos.
- Intelligent Translation: Automatically recognize source languages and translate video scripts or dataset documents precisely into English using Gemini 3.
- Multilingual NLP: Detect people, organizations, dates, AI models, and currency. Preserves original non-English names in metadata.
- Communication Officer: Securely deliver generated datasets and reports to stakeholders via email.
- Obsidian Expert: Automatically transform Croissant metadata into rich Markdown notes.
- Neo4j Expert: Ingest Croissant datasets into a Neo4j Graph Database for relational discovery and semantic search.
- Walker Expert: Extract and explore internal links from a page when deep research is required.
- Photograph: Automatically capture high-quality snapshots of web pages and record screen activity during process execution.
- Semantic Dataset Search: Search through local and remote datasets using natural language queries.
- Format Validation: Ensure your metadata files are 100% compliant with the MLCommons Croissant specification.
- Dataset Q&A: Ask questions directly about your datasets, getting instant insights from descriptions, structures, and schemas.
- Fact Checker: High-fidelity AI analysis of sensitive claims, legal conflicts, and innovation impacts with visual passage highlighting and video evidence.
- Claims Detection: Automated extraction of verifiable factual statements with MD5-based unique IDs and Schema.org semantic mapping.
- CDIF Maker: Produce structured CDIF-dedicated semantic inventories from natural language for precise variable mapping.
- RO-Crate Expert: Package Dataverse research objects into FAIR-compliant RO-Crate metadata with OOYDID provenance.
- Presentation Expert: Automatically transform datasets and research insights into high-impact Markdown-based slide decks.
- Creator (Cinematic Video): Generate premium, high-tech MP4/AVI videos and animated intros from toolkit data.
- ROHub: Deposit research objects and add rich semantic annotations (triples) to the RO-Hub portal following FAIR2Adapt profiles.
- The Visual Systems Architect: Translate complex technical requirements and data flows into clear, intuitive Mermaid.js diagrams.
- TRIZ Expert: Inventive problem-solving framework to resolve technical contradictions and optimize solutions (ODRL Protected).
- UNF Skill: Universal Numeric Fingerprint (UNF) generator for creating system-independent hashes for data strings and tables.
The Navigator skill acts as an intelligent bridge between the toolkit and the web. It allows the agent to:
- Search: Execute high-intent searches on Google.
- Browse: Open the specific search results in Google Chrome for the user.
- Analyze: Extract and structure metadata (titles, snippets, keywords) from the web to feed into the Croissant reasoning engine.
The Youtuber skill expands the toolkit's reach to video content. It allows the agent to:
- Discovery: Find relevant video tutorials, explanations, and demonstrations on YouTube.
- Structured Extraction: Parse YouTube's internal data to get clean titles, URLs, and video descriptions.
- Multi-Modal Context: Use video descriptions to provide richer context for dataset understanding.
The Transcriber skill converts video content into a machine-readable format. It:
- Downloads: Fetches accurate closed-caption data from YouTube.
- Stores: Organizes transcripts in ./data/transcripts/ for downstream processing.
- Synthesizes: Enables Gemini 3 to "understand" video content by reasoning over the full spoken text.
The Wizard is the master orchestrator of the toolkit. It provides a single entry point for complex data tasks:
- Automation: Chaining transcription, translation, and NLP enrichment.
- Multilingual Support: Captures entities in both English and their original language (e.g., Ukrainian, Russian, French) using JSON-LD tags.
- Synthesis: Combines factual measurements and AI-driven fact-checking into a single investigative report.
- End-to-End: Goes from a raw link to a finalized Croissant metadata file with automated verification and secure vaulting.
The Translator skill ensures the toolkit is truly global. It:
- Detection: Automatically identifies the source language of any text or video script.
- Precision: Translates content precisely into English using Gemini 1.5 Flash.
- Persistence: Saves translated versions alongside originals for easy integration.
The NLP Expert skill extracts structured knowledge. It:
- Recognition: Identifies persons, organizations, locations, and dates.
- Semantic Mapping: Converts detected entities into Schema.org JSON-LD.
- Contextual Enrichment: Provides deeper understanding of dataset provenance and coverage.
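The semantic mapping step above can be pictured with a minimal sketch. The label names (`PERSON`, `ORG`, `LOC`), the `SCHEMA_TYPES` table, and the `entities_to_jsonld` helper are illustrative assumptions, not the toolkit's actual implementation; dates are skipped here because Schema.org models them as property values rather than standalone entities.

```python
import json

# Hypothetical mapping from entity labels to Schema.org types
# (an assumption for illustration, not the toolkit's actual table).
SCHEMA_TYPES = {
    "PERSON": "Person",
    "ORG": "Organization",
    "LOC": "Place",
}


def entities_to_jsonld(entities):
    """Convert (text, label) pairs into a Schema.org JSON-LD graph."""
    graph = []
    for text, label in entities:
        schema_type = SCHEMA_TYPES.get(label)
        if schema_type is None:
            continue  # skip labels with no direct Schema.org entity type
        graph.append({"@type": schema_type, "name": text})
    return {"@context": "https://schema.org", "@graph": graph}


doc = entities_to_jsonld([("Sergei Bodrov", "PERSON"), ("Moscow", "LOC")])
# ensure_ascii=False keeps original non-English names readable in the output
print(json.dumps(doc, indent=2, ensure_ascii=False))
```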
The Croissant Expert skill is the brains behind the metadata formatting. It:
- Spec Compliance: Reads and interprets the official MLCommons Croissant specification.
- Serialization: Transforms dataset high-level metadata into standardized JSON-LD.
- Organization: Automatically stores output files in ./data/croissant/.
- Extensible Design: Support for FileObject, FileSet, and complex RecordSet mappings.
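As a rough illustration of the serialization step, the sketch below builds a small Croissant-style JSON-LD document with a single `FileObject`. The `build_croissant` helper is hypothetical, the `@context` is heavily abbreviated compared with the one in the Croissant specification, and required details such as file checksums are omitted, so the output should not be expected to pass validation as-is.

```python
import json


def build_croissant(name, description, file_url, encoding_format="text/csv"):
    """Return a minimal Croissant-style JSON-LD dict (illustrative only)."""
    return {
        # Abbreviated context; the specification defines a much fuller one.
        "@context": {
            "@vocab": "https://schema.org/",
            "sc": "https://schema.org/",
            "cr": "http://mlcommons.org/croissant/",
            "dct": "http://purl.org/dc/terms/",
            "conformsTo": "dct:conformsTo",
        },
        "@type": "sc:Dataset",
        "conformsTo": "http://mlcommons.org/croissant/1.0",
        "name": name,
        "description": description,
        "distribution": [
            {
                "@type": "cr:FileObject",
                "@id": "data-file",
                "name": "data-file",
                "contentUrl": file_url,
                "encodingFormat": encoding_format,
                # A checksum (e.g. sha256) is left out of this sketch.
            }
        ],
    }


meta = build_croissant("toy", "A toy dataset", "https://example.com/toy.csv")
print(json.dumps(meta, indent=2))
```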
The Communication Officer skill handles the delivery of results via email. It:
- Secure Delivery: Sends emails using SMTP with TLS.
- Smart Attachments: Automatically attaches generated Croissant JSON-LD files.
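A minimal sketch of SMTP delivery with TLS and a JSON-LD attachment, using only the Python standard library. The `build_message` and `send` helpers, the `smtp.gmail.com` host, and port 587 are assumptions for illustration; the skill's real script may be organized differently.

```python
import os
import smtplib
import ssl
from email.message import EmailMessage
from pathlib import Path


def build_message(sender, recipient, subject, body, attachment=None):
    """Assemble an email, optionally attaching a Croissant JSON-LD file."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    if attachment is not None:
        path = Path(attachment)
        msg.add_attachment(
            path.read_bytes(),
            maintype="application",
            subtype="ld+json",
            filename=path.name,
        )
    return msg


def send(msg, host="smtp.gmail.com", port=587):
    """Send over SMTP, upgrading the connection with STARTTLS first.

    Host and port are assumed defaults; credentials come from the
    SMTP_USER and SMTP_PASS environment variables.
    """
    context = ssl.create_default_context()
    with smtplib.SMTP(host, port) as server:
        server.starttls(context=context)
        server.login(os.environ["SMTP_USER"], os.environ["SMTP_PASS"])
        server.send_message(msg)
```

Building the message separately from sending it keeps the attachment logic testable without a live mail server.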
The Telegram Expert skill provides instant messaging notifications. It:
- Instant Alerts: Sends status updates and summaries to a Telegram chat or channel.
- File Delivery: Uploads generated .jsonld files directly to Telegram.
- Bot API: Uses the standard Telegram Bot API for reliable communication.
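For orientation, a text notification through the Telegram Bot API boils down to a POST to the `sendMessage` method. The helper names below are hypothetical; the sketch reuses the `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` environment variables and leaves out file uploads.

```python
import json
import os
import urllib.parse
import urllib.request

API_ROOT = "https://api.telegram.org"


def build_send_message_request(token, chat_id, text):
    """Prepare (but do not send) a sendMessage request."""
    url = f"{API_ROOT}/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def send_message(text):
    """Send a text message using credentials from the environment."""
    request = build_send_message_request(
        os.environ["TELEGRAM_BOT_TOKEN"],
        os.environ["TELEGRAM_CHAT_ID"],
        text,
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)  # the Bot API replies with a JSON object
```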
The Neo4j Expert translates Croissant files into a Knowledge Graph. It:
- Ingestion: Standardizes JSON-LD into Graph nodes (Dataset, Creator, Location).
- Semantic Querying: Allows natural language queries that are translated into Cypher via Gemini 3.
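The ingestion idea can be sketched as a function that turns a Croissant document into parameterized Cypher statements. The `croissant_to_cypher` helper, the `CREATED` relationship name, and the property names are assumptions; only the `Dataset` and `Creator` node labels come from the description above.

```python
def croissant_to_cypher(doc):
    """Return (query, params) pairs that upsert a dataset and its creators."""
    statements = [
        (
            "MERGE (d:Dataset {name: $name}) SET d.description = $description",
            {"name": doc["name"], "description": doc.get("description", "")},
        )
    ]
    creators = doc.get("creator", [])
    if isinstance(creators, dict):  # JSON-LD allows a single object or a list
        creators = [creators]
    for creator in creators:
        statements.append(
            (
                "MATCH (d:Dataset {name: $dataset}) "
                "MERGE (c:Creator {name: $creator}) "
                "MERGE (c)-[:CREATED]->(d)",
                {"dataset": doc["name"], "creator": creator["name"]},
            )
        )
    return statements


example = {"name": "toy", "description": "A toy dataset", "creator": {"name": "Ada"}}
for query, params in croissant_to_cypher(example):
    print(query, params)
```

Using `MERGE` with query parameters keeps repeated ingestion idempotent and avoids building Cypher by string concatenation.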
The Walker skill performs deep web exploration. It:
- Deep Crawl: Extracts all internal links from a specified URL.
- Autonomous Navigation: Can be triggered to visit all discovered pages if initial information is insufficient.
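Internal link extraction can be illustrated with the standard library alone. This is a sketch under the assumption that "internal" means "same host as the starting URL"; the `LinkCollector` class and `internal_links` function are hypothetical names, and fetching the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def internal_links(base_url, html):
    """Return unique absolute links that stay on the host of base_url."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    found = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href).split("#", 1)[0]  # drop fragments
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found


page = (
    '<a href="/docs">Docs</a> '
    '<a href="https://other.org/x">External</a> '
    '<a href="about.html#team">About</a>'
)
links = internal_links("https://example.com/index.html", page)
print(links)  # ['https://example.com/docs', 'https://example.com/about.html']
```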
The ODRL Expert provides a decentralized security layer. It:
- Identity: Manages your master identity via DIDs (Decentralized Identifiers).
- Rights Management: Enforces ODRL policies to restrict or permit skill usage via the CODATA OAC Profile.
- Skill Vault: Encrypts skill source code using your private key to ensure physical security.
- Branding: Automatically applies CODATA/Croissant branding and provenance metadata to generated artifacts.
The Fact Checker performs deep investigative analysis on a document or URL. It:
- Sensitive Discovery: Detects high-stakes claims related to terminations, legal conflicts, and corporate sensitive data.
- Innovation Summary: Synthesizes the cognitive impact and innovative potential of the analyzed content.
- Visual Evidence: Generates highlight-annotated video recordings of its reasoning process.
The Claims Detection skill transforms documents into structured factual datasets. It:
- Granular Extraction: Isolates core factual claims and their original context sentences.
- Verifiable IDs: Assigns unique MD5 hashes to every claim for reproducible identification.
- Standardized Export: Generates data/claims.json with full DID attribution and truth probability scoring.
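A minimal sketch of MD5-based claim identifiers. The whitespace and case normalization, the `claim_id` and `make_claim_record` helpers, and the record field names are assumptions chosen for illustration; the actual layout of data/claims.json is not shown here.

```python
import hashlib
import json


def claim_id(claim_text):
    """Derive a reproducible ID from the claim text.

    Collapsing whitespace and lowercasing first (an assumed normalization)
    makes trivially different spellings of a claim map to the same ID.
    """
    normalized = " ".join(claim_text.split()).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def make_claim_record(claim_text, context):
    """Bundle a claim with its ID and the sentence it came from."""
    return {"id": claim_id(claim_text), "claim": claim_text, "context": context}


record = make_claim_record(
    "Croissant is a metadata format for ML datasets.",
    "According to MLCommons, Croissant is a metadata format for ML datasets.",
)
print(json.dumps(record, indent=2))
```

MD5 serves here as a stable fingerprint for identification, not as a security mechanism.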
The CDIF Maker skill connects to semantic services for variable mapping. It:
- Discovery: Resolves natural language into structured technical metadata.
- Inventory: Produces standardized variable definitions and symbols.
The RO-Crate Expert skill bridges repositories with the RO-Crate standard. It:
- Integration: Seamlessly pulls data and metadata from Dataverse DOIs.
- Packaging: Creates FAIR-compliant ZIP archives with embedded DID documents.
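The packaging step centers on a `ro-crate-metadata.json` file. The sketch below builds a bare-bones RO-Crate 1.1 metadata document; the `build_ro_crate_metadata` helper and its parameters are hypothetical, the date and license values are placeholders, and neither the Dataverse download nor the embedded DID documents are shown.

```python
import json


def build_ro_crate_metadata(name, description, identifier, date_published, license_url):
    """Return a minimal RO-Crate 1.1 metadata document (illustrative only)."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: points at the root data entity.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root data entity describing the packaged dataset.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "identifier": identifier,
                "datePublished": date_published,
                "license": {"@id": license_url},
            },
        ],
    }


crate = build_ro_crate_metadata(
    name="Example dataset",
    description="Placeholder description for illustration.",
    identifier="doi:10.70122/FK2/TTSEXH",
    date_published="2025-01-01",  # placeholder date
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",  # placeholder
)
print(json.dumps(crate, indent=2))
```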
The Creator skill produces cinematic visual assets. It:
- Media Rendering: Generates MP4 videos from text, slides, and screenshots.
- AI Aesthetics: Applies premium futuristic styling to all toolkit demos.
The Presentation Expert handles storytelling. It:
- Narrative Design: Drafts technical pitch decks using Gemini 3.
- Standard Serialization: Outputs compliant Marp/Markdown slides.
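To make the Marp output concrete, here is a small sketch that renders a slide deck as Markdown with Marp front matter. The `to_marp` helper and its input shape are assumptions; the Gemini-driven narrative drafting is out of scope.

```python
def to_marp(title, slides):
    """Render (heading, bullets) pairs as a Marp Markdown deck.

    The title is written verbatim into YAML front matter, so this sketch
    assumes it contains no characters that need YAML quoting.
    """
    parts = ["---", "marp: true", f"title: {title}", "---", "", f"# {title}"]
    for heading, bullets in slides:
        parts += ["", "---", "", f"## {heading}", ""]  # '---' starts a new slide
        parts += [f"- {item}" for item in bullets]
    return "\n".join(parts) + "\n"


deck = to_marp(
    "My Dataset",
    [("Overview", ["What it contains", "Who created it"])],
)
print(deck)
```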
The TRIZ skill handles innovation and optimization. It:
- Contradiction Analysis: Identifies technological or physical contradictions in a system.
- Inventive Proposals: Suggests improvements using the 40 Inventive Principles and Ideal Final Result (IFR) concepts.
The UNF skill handles data integrity. It:
- Universal Hashing: Computes format-agnostic fingerprints (UNF v6) for datasets and strings.
- Canonical Parity: Aligns with the Dataverse specification for reproducible research identifiers.
The Architect skill handles complex infrastructure design. It:
- Analyze & Design: Translates data flows and app ideas into scalable architectural patterns.
- Visualize: Generates render-ready Mermaid.js code for system interactions and VPC boundaries.
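As an illustration of the kind of output involved, the following sketch turns a list of data-flow edges into Mermaid flowchart code. The `to_mermaid` helper is hypothetical and assumes node labels contain no double quotes; the skill itself derives diagrams from natural-language descriptions.

```python
def to_mermaid(edges, direction="LR"):
    """Render (source, target) label pairs as a Mermaid flowchart."""
    lines = [f"flowchart {direction}"]
    ids = {}

    def node_id(label):
        # Declare each node once, with a generated ID and a quoted label.
        if label not in ids:
            ids[label] = f"n{len(ids)}"
            lines.append(f'    {ids[label]}["{label}"]')
        return ids[label]

    for source, target in edges:
        a, b = node_id(source), node_id(target)
        lines.append(f"    {a} --> {b}")
    return "\n".join(lines)


diagram = to_mermaid([("Client", "API"), ("API", "Database")])
print(diagram)
```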
- Gemini 3 API: For LLM-driven metadata generation, reasoning, and semantic search.
- MLCommons Croissant: The standard format for ML dataset metadata.
- Python 3.10+: Core backend logic and tooling.
- DuckDuckGo HTML Engine: For robust, non-JS result scraping.
- YouTube Data Parser: Custom scraper for YouTube's initial metadata.
- YouTube Transcript API: For secure retrieval of video caption text.
- AI Translation & NLP: High-precision multi-lingual support and entity extraction via Gemini 3.
- GitHub Deployment Guide: How to package and push the toolkit with ODRL security.
- Python 3.10 or higher
- A Gemini API Key from Google AI Studio
- Google Chrome installed (for Navigator and Youtuber skills)
- Clone the repository:
    git clone https://github.com/codata/croissant-toolkit.git
    cd croissant-toolkit
- Set up a virtual environment (optional but recommended):
    python -m venv venv
    source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install dependencies:
    pip install -r requirements.txt
- Configure your environment:
    # Required for AI Reasoning
    export GEMINI_API_KEY="your-api-key-here"
    # Optional: Config for Communication Officer (Email)
    export SMTP_USER="your-email@gmail.com"
    export SMTP_PASS="your-google-app-password"
    # Optional: Config for Telegram Expert
    export TELEGRAM_BOT_TOKEN="your-bot-token"
    export TELEGRAM_CHAT_ID="your-chat-or-channel-id"
Using the Navigator Skill:
# Search for datasets and extract metadata to google_search_results.json
# Search and extract structured web data
python .gemini/skills/navigator/scripts/navigate.py "MLCommons Croissant"
Using the Youtuber Skill:
# Search and extract structured video data
python .gemini/skills/youtuber/scripts/search_youtube.py "NLP data engineering"
Using the Transcriber Skill:
# Fetch transcripts for a specific video
python .gemini/skills/transcriber/scripts/transcribe.py 6cWcZ2G53gE
Using the Translator Skill:
# Translate a specific transcript file
python .gemini/skills/translator/scripts/translate.py data/transcripts/VIDEO_ID.txt
Using the NLP Expert Skill:
# Extract named entities into JSON-LD from text or files
python .gemini/skills/nlp_expert/scripts/extract_entities.py "Sergei Bodrov was born in Moscow."
Using the Croissant Expert Skill:
# Serialize dataset metadata with Intelligent NLP enrichment
# (Detects creators, locations, and dates automatically)
python .gemini/skills/croissant_expert/scripts/serialize.py metadata.json --nlp
Using the Wizard Skill (End-to-End):
# Process a video, enrich metadata, email, and save to Obsidian
export SMTP_USER="your@email.com"
export SMTP_PASS="your-password"
export OBSIDIAN_VAULT_PATH="/path/to/my/vault"
python3 .gemini/skills/wizard/scripts/wizard.py "https://youtube.com/link" "My Dataset" "recipient@example.com"
Using the Communication Officer Skill:
# Send a file manually
python3 .gemini/skills/communication_officer/scripts/send_email.py "user@example.com" "Subject" "Body" "path/to/file.jsonld"
Using the Telegram Expert Skill:
# Send a notification manually
python3 .gemini/skills/telegram_expert/scripts/send_telegram.py "Notification message"
# Send with an attachment
python3 .gemini/skills/telegram_expert/scripts/send_telegram.py "Check this file" "./data/croissant/dataset.jsonld"
Using the Obsidian Expert Skill:
# Convert a Croissant file to a beautiful Obsidian note
python3 .gemini/skills/obsidian_expert/scripts/to_obsidian.py "./data/croissant/dataset.jsonld"
Using the Neo4j Expert Skill:
# Ingest into Neo4j
export NEO4J_PASSWORD="your-password"
python3 skills/neo4j_expert/scripts/ingest.py "./data/croissant/dataset.jsonld"
# Query via Natural Language
python3 skills/neo4j_expert/scripts/query.py "Which datasets were created in France?"
Using the Walker Skill:
# Extract and visit links for deep research
python3 skills/walker/scripts/walk.py "https://example.com" --limit 5 --navigate
Using the Photograph Skill:
# Take a standard screenshot
python3 .gemini/skills/photograph/scripts/take_screenshot.py "https://www.google.com"
# Take a full-page screenshot
python3 .gemini/skills/photograph/scripts/take_screenshot.py "https://mlcommons.org/croissant" --full_page
# Record the screen while running a process (e.g. searching YouTube)
python3 .gemini/skills/photograph/scripts/record_screen.py --command "python3 .gemini/skills/youtuber/scripts/youtube_search.py MLCommons"
# Take a YouTube snapshot (automatically waits for ads)
python3 .gemini/skills/youtuber/scripts/video_snapshot.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
Using the RO-Crate Expert Skill:
# Package a Dataverse dataset with DID provenance
python3 .gemini/skills/ro-crate-expert/scripts/create_crate.py "doi:10.70122/FK2/TTSEXH" --zip
Using the Presentation Expert & Creator Skills:
# 1. Generate slides from a dataset
python3 .gemini/skills/presentation_expert/scripts/generate_slides.py "My Dataset"
# 2. Convert slides to a cinematic video
python3 .gemini/skills/creator/scripts/markdown_to_video.py .gemini/skills/presentation_expert/output/slides/my_dataset.md
Using the CDIF Maker Skill:
# Discover variables and mapped definitions
python3 .gemini/skills/cdif-maker/scripts/cdif_maker.py "air quality index"
Using the Architect Skill:
# Generate an architecture diagram from a description
python3 .gemini/skills/architect/scripts/architect.py "A scalable e-commerce checkout system"
Metadata Generation:
# Example: Generate a Croissant metadata file from a raw dataset directory
python main.py generate ./my-local-dataset
The toolkit uses the CODATA ODRL Infrastructure for decentralized access control.
Initialize your security wallet:
python3 .gemini/skills/odrl_expert/scripts/odrl_client.py init
Encrypt/Vault a skill:
python3 .gemini/skills/odrl_expert/scripts/odrl_client.py vault-skill "fact-checker"
Decrypt/Restore a skill:
python3 .gemini/skills/odrl-expert/scripts/odrl_client.py unvault-skill "fact-checker"
This toolkit includes a comprehensive suite of automated tests to ensure all skills are functioning correctly.
You can execute the entire test suite using the test-all skill runner:
python3 .gemini/skills/test-all/scripts/test_all.py
This script will discover and execute all test_*.py files within the .gemini/skills/ directory and provide a summary of the results.
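The discover-and-run behavior described above can be approximated in a few lines. This is a sketch, not the contents of test_all.py: the `discover_tests` and `run_tests` helpers are hypothetical, and it assumes each test file can be executed as a standalone script whose exit code signals success.

```python
import subprocess
import sys
from pathlib import Path


def discover_tests(root):
    """Find every test_*.py file below root, in a stable order."""
    return sorted(Path(root).rglob("test_*.py"))


def run_tests(root=".gemini/skills"):
    """Run each discovered test file and map its path to pass/fail."""
    results = {}
    for path in discover_tests(root):
        proc = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
        )
        results[str(path)] = proc.returncode == 0
    return results


def summarize(results):
    """Produce a one-line summary such as '3/4 test files passed'."""
    passed = sum(results.values())
    return f"{passed}/{len(results)} test files passed"
```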
The Gemini CLI is the primary way to interact with the toolkit's intelligent agents. It allows you to run skills using natural language or direct commands.
Install Gemini CLI:
npm install -g @google/gemini-cli
Execute tests via CLI: You can ask the agent to run tests for you:
gemini "Use the test-all skill to verify the system integrity"
Or target a specific skill:
gemini "Test the Navigator skill and ensure it returns results"
A Chrome extension that converts any web page into Croissant JSON-LD metadata, with a contextual chat powered by Gemini 3.
1. Start the backend API:
cd api
pip install -r requirements.txt
export GEMINI_API_KEY="your-api-key-here"
python server.py
The API runs on http://localhost:8000.
2. Load the extension in Chrome:
- Open chrome://extensions/
- Enable Developer mode (toggle top-right)
- Click Load unpacked
- Select the extension/ folder
3. Use it:
- Navigate to any website
- Click the croissant icon in the toolbar to open the side panel
- Click Generate Croissant to extract the page content and generate a Croissant JSON-LD
- Copy or download the JSON-LD
- Use the Chat section to ask questions about the page based on the generated metadata
Managing ML dataset metadata is traditionally a tedious and manual process. While the Croissant format introduces a powerful and elegant standard, creating compliant metadata files from scratch remains a bottleneck for data scientists.
For the Gemini 3 Hackathon in Paris, we recognized an opportunity to leverage Gemini 3's unmatched contextual understanding and long context window to completely automate this pipeline. Our goal is to bring joy back to data engineering, accelerate the open-data ecosystem, and make data more discoverable and interoperable for everyone.
- Project Structure & Individual Skills: Detailed breakdown of every skill in the toolkit.
- Skill Cookbook: Learn how to combine multiple skills into powerful, autonomous workflows.
- Testing Guide: How to run the automated integrity suite.
For detailed documentation on each skill, please refer to the docs/ directory:
- Communication Officer
- Telegram Expert
- Croissant Expert
- NLP Expert
- Navigator
- YouTuber
- Photograph
- Fact Checker
- Claims Detection (Data Expert)
- CDIF Maker
- RO-Crate Expert
- Creator
- Presentation Expert
- Submodule Usage (Reusing Skills)
- Gemini CLI Skill Loading
This project is licensed under the MIT License.