Deployment Architecture

Overview

This document describes the build and deploy pipeline for Flash applications. It covers what happens when you run flash build and flash deploy, how endpoints are provisioned, and how the manifest ties everything together.

Build Pipeline

flash build

flash build packages your application into a deployable archive:

flash build
    │
    ├── 1. Discovery
    │   ├── Scan .py files for @Endpoint(...) decorators (QB)
    │   ├── Scan for Endpoint(...) variable assignments with routes (LB)
    │   └── Scan for @Endpoint(...) on classes (class-based QB)
    │
    ├── 2. Manifest Generation
    │   ├── Map functions/classes to resource names
    │   ├── Record LB routes (method + path)
    │   ├── Detect cross-endpoint calls (makes_remote_calls flag)
    │   └── Write flash_manifest.json
    │
    ├── 3. Handler Generation
    │   ├── QB functions: generate deployed handler (JSON in/out)
    │   ├── QB classes: generate class handler (singleton instance, method dispatch)
    │   └── LB endpoints: no handler needed (FastAPI server generated by runtime)
    │
    ├── 4. Dependency Installation
    │   ├── Install Python packages for linux/x86_64
    │   ├── Target Python 3.12 for wheel ABI selection
    │   └── Binary wheels only (no compilation)
    │
    └── 5. Packaging
        └── Create .flash/artifact.tar.gz

Discovery

The scanner (cli/commands/build_utils/scanner.py) uses AST analysis to find:

  • @Endpoint(...) on functions: Queue-based endpoints. One function per endpoint.
  • @Endpoint(...) on classes: Class-based queue-based endpoints. The class is instantiated once per worker (singleton), and methods are dispatched per request.
  • Endpoint(...) variable assignments: Load-balanced endpoints. Routes are registered via @ep.post("/path"), @ep.get("/path"), etc.
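The three discovery rules above can be sketched with the standard-library `ast` module. This is a simplified illustration, not the actual scanner in `cli/commands/build_utils/scanner.py`, and the example `Endpoint` usages are assumed:

```python
import ast

# Sample source using the three patterns the scanner looks for (illustrative).
SOURCE = '''
@Endpoint(gpu="A100")
def process(data):
    return data

@Endpoint(gpu="A100")
class MyModel:
    def predict(self, x): ...

api = Endpoint(cpu=True)
'''

def discover(source: str) -> dict:
    """Classify top-level definitions into QB functions, QB classes, and LB endpoints."""
    tree = ast.parse(source)
    found = {"qb_functions": [], "qb_classes": [], "lb_endpoints": []}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            # @Endpoint(...) decorator -> queue-based endpoint
            for dec in node.decorator_list:
                if (isinstance(dec, ast.Call)
                        and isinstance(dec.func, ast.Name)
                        and dec.func.id == "Endpoint"):
                    key = "qb_classes" if isinstance(node, ast.ClassDef) else "qb_functions"
                    found[key].append(node.name)
        elif isinstance(node, ast.Assign):
            # var = Endpoint(...) assignment -> load-balanced endpoint
            value = node.value
            if (isinstance(value, ast.Call)
                    and isinstance(value.func, ast.Name)
                    and value.func.id == "Endpoint"):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        found["lb_endpoints"].append(target.id)
    return found
```

Running `discover(SOURCE)` classifies `process` as a QB function, `MyModel` as a QB class, and `api` as an LB endpoint.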

Manifest Structure

{
    "functions": [
        {
            "name": "process",
            "module_path": "gpu_worker",
            "resource_name": "gpu-worker",
            "is_load_balanced": false,
            "is_class": false,
            "dependencies": ["torch"],
            "makes_remote_calls": false
        },
        {
            "name": "MyModel",
            "module_path": "model_worker",
            "resource_name": "model-worker",
            "is_load_balanced": false,
            "is_class": true,
            "class_methods": ["predict", "embed"],
            "dependencies": ["torch", "transformers"]
        },
        {
            "name": "api",
            "module_path": "api_server",
            "resource_name": "api-server",
            "is_load_balanced": true,
            "is_class": false,
            "routes": [
                {"method": "POST", "path": "/predict", "handler": "predict"},
                {"method": "GET", "path": "/health", "handler": "health"}
            ]
        }
    ],
    "resources": [
        {"name": "gpu-worker", "is_load_balanced": false},
        {"name": "model-worker", "is_load_balanced": false},
        {"name": "api-server", "is_load_balanced": true}
    ]
}
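Consumers of the manifest can index it however they need. A minimal sketch of loading `flash_manifest.json` and grouping entries (the manifest content below mirrors a subset of the example above; the loading code is illustrative, not part of Flash):

```python
import json

# A trimmed-down manifest matching the documented schema.
manifest = json.loads("""
{
    "functions": [
        {"name": "process", "resource_name": "gpu-worker", "is_load_balanced": false},
        {"name": "api", "resource_name": "api-server", "is_load_balanced": true}
    ],
    "resources": [
        {"name": "gpu-worker", "is_load_balanced": false},
        {"name": "api-server", "is_load_balanced": true}
    ]
}
""")

# Index functions by the resource that serves them.
by_resource = {fn["resource_name"]: fn for fn in manifest["functions"]}
# Collect only the load-balanced resources.
lb_resources = [r["name"] for r in manifest["resources"] if r["is_load_balanced"]]
```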

Handler Generation

The handler generator (cli/commands/build_utils/handler_generator.py) produces different handlers based on endpoint type:

Function QB handler -- wraps a function for Runpod's serverless protocol:

# generated handler for QB function endpoints
from module_path import function_name

def handler(job):
    job_input = job["input"]
    result = function_name(job_input)
    return result

Class QB handler -- instantiates class once, dispatches to methods:

# generated handler for class-based QB endpoints
from module_path import ClassName

_instance = ClassName()
_METHODS = {"predict": _instance.predict, "embed": _instance.embed}

def handler(job):
    job_input = job["input"]
    # single-method classes auto-dispatch
    # multi-method classes require "method" key in input
    method_name = job_input.pop("method", None)
    if method_name is None:
        if len(_METHODS) != 1:
            raise ValueError('multi-method class requires a "method" key in input')
        method_name = next(iter(_METHODS))
    method = _METHODS[method_name]
    return method(**job_input)

LB endpoints do not need generated handlers. The LB runtime image starts a FastAPI server that loads routes from the manifest.

Deploy Pipeline

flash deploy

flash deploy runs the build pipeline, then provisions endpoints:

flash deploy --env production
    │
    ├── 1. Build (same as flash build)
    │
    ├── 2. Upload
    │   └── Upload artifact.tar.gz to Runpod storage (R2)
    │
    ├── 3. Provision Endpoints
    │   ├── For each resource in manifest:
    │   │   ├── Check if endpoint exists (by name in environment)
    │   │   ├── If new: create endpoint via GraphQL API
    │   │   ├── If exists + config drift: update endpoint
    │   │   └── If exists + no drift: skip
    │   └── Set env vars on each endpoint (explicit env={} + system vars like RUNPOD_API_KEY)
    │
    ├── 4. Register with State Manager
    │   └── Store endpoint IDs for cross-endpoint routing
    │
    └── 5. Post-Deploy
        ├── Display endpoint URLs
        └── Show available routes

Resource Class Selection

Endpoint._build_resource_config() selects the appropriate internal resource class for provisioning:

Usage Pattern                  GPU   CPU   Internal Class
@Endpoint(...) on function     yes   --    LiveServerless
@Endpoint(...) on function     --    yes   CpuLiveServerless
@Endpoint(...) on class        yes   --    LiveServerless
ep = Endpoint(...) + routes    yes   --    LiveLoadBalancer
ep = Endpoint(...) + routes    --    yes   CpuLiveLoadBalancer
Endpoint(image=...)            yes   --    ServerlessEndpoint
Endpoint(image=...)            --    yes   CpuServerlessEndpoint
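The selection rules in the table reduce to a small decision function. The sketch below is an assumed simplification; the actual logic lives in `Endpoint._build_resource_config()` and may differ in detail:

```python
# Illustrative mapping from usage pattern to internal resource class name.
def select_resource_class(*, has_routes: bool = False, gpu: bool = True,
                          custom_image: bool = False) -> str:
    if custom_image:
        # Endpoint(image=...) deploys a prebuilt image
        return "ServerlessEndpoint" if gpu else "CpuServerlessEndpoint"
    if has_routes:
        # Endpoint(...) with @ep.post/@ep.get routes is load-balanced
        return "LiveLoadBalancer" if gpu else "CpuLiveLoadBalancer"
    # decorated functions and classes both run as queue-based serverless
    return "LiveServerless" if gpu else "CpuLiveServerless"
```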

Docker Images

Each resource class maps to a specific Docker image:

Internal Class        Image                               Base
LiveServerless        runpod/worker-flash:latest          PyTorch + CUDA
CpuLiveServerless     runpod/worker-flash-cpu:latest      Python slim
LiveLoadBalancer      runpod/worker-flash-lb:latest       PyTorch + FastAPI
CpuLiveLoadBalancer   runpod/worker-flash-lb-cpu:latest   Python slim + FastAPI

Config Drift Detection

When deploying to an environment that already has endpoints, Flash compares the current configuration hash against the stored hash. If they differ, the endpoint is updated. See Resource Config Drift Detection for details.
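A common way to implement this comparison is to hash a canonical serialization of the config. The scheme below is an assumption for illustration; Flash's actual hashing may normalize fields differently:

```python
import hashlib
import json

def config_hash(config: dict) -> str:
    """Hash a canonical (sorted-key) JSON serialization of the config."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

stored = config_hash({"gpu": "A100", "workers": 2})
current = config_hash({"workers": 2, "gpu": "A100"})  # key order is irrelevant
drifted = config_hash({"gpu": "A100", "workers": 4})  # changed value -> new hash
```

Because keys are sorted before hashing, reordering a config produces no drift, while any value change does.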

Cross-Endpoint Routing at Deploy Time

When flash deploy provisions endpoints:

  1. Endpoints with makes_remote_calls=True get RUNPOD_API_KEY injected automatically
  2. Each endpoint gets the flash_manifest.json included in its artifact
  3. The State Manager stores {environment_id, resource_name} -> endpoint_id
  4. At runtime, the ServiceRegistry uses the manifest + State Manager to route calls

Manifest Credential Handling

  • Runtime endpoint metadata (including API-returned aiKey) may be stored in the State Manager manifest for deployment reconciliation.
  • Local .flash/flash_manifest.json is sanitized before it is written to disk and does not include aiKey.
  • RUNPOD_API_KEY is sourced from environment/credential storage and injected into endpoint env when needed; it is not persisted in the local manifest.
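The sanitization step can be pictured as stripping the sensitive field from a copy before writing to disk. The `aiKey` field name comes from the description above; the sanitizer itself is an illustrative sketch:

```python
import copy

def sanitize(manifest: dict) -> dict:
    """Return a copy of the manifest with aiKey removed from every resource."""
    clean = copy.deepcopy(manifest)
    for resource in clean.get("resources", []):
        resource.pop("aiKey", None)
    return clean

raw = {"resources": [{"name": "api-server", "aiKey": "secret-value"}]}
local = sanitize(raw)  # safe to write to .flash/flash_manifest.json
```

The deep copy matters: the in-memory manifest (which the State Manager may still need, credentials included) is left untouched.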

See Cross-Endpoint Routing for the full runtime flow.

Related Documentation