Commits (17)
8b80512
feat: add SceneDataSource protocol and DataSourceConfig
WCJ-BERT Mar 19, 2026
d2bf456
feat: implement TrajdataDataSource for direct trajdata loading
WCJ-BERT Mar 19, 2026
1a49654
refactor!: migrate RuntimeContext to use scene_id_to_idx mapping
WCJ-BERT Mar 19, 2026
bcfaaa3
feat: update DaemonEngine to support lazy scene loading
WCJ-BERT Mar 19, 2026
1cb807a
feat: add prepare_data CLI for trajdata cache preprocessing
WCJ-BERT Mar 19, 2026
1ded645
feat: integrate SceneDataSource into worker and simulation flow
WCJ-BERT Mar 19, 2026
aade78b
refactor: improve coordinate frame variable naming
WCJ-BERT Mar 19, 2026
e487b14
test: update tests for trajdata data source migration
WCJ-BERT Mar 19, 2026
486d87d
docs(wizard): add trajdata documentation and wizard config
WCJ-BERT Mar 19, 2026
848d89e
fix: update copyright year to 2026 for new files
WCJ-BERT Mar 20, 2026
eff4513
refactor: improve configuration structure and validation
WCJ-BERT Mar 20, 2026
b773aff
refactor: simplify and clean up codebase
WCJ-BERT Mar 20, 2026
8cd4958
refactor: add UnifiedDataset to RuntimeContext to avoid duplication
WCJ-BERT Mar 20, 2026
6791e1b
refactor: encapsulate scene loading into SceneLoader class
WCJ-BERT Mar 20, 2026
74399c9
refactor: improve prepare_data code quality and unify preprocessing
WCJ-BERT Mar 24, 2026
74e7d18
refactor: address PR review feedback for trajdata_data_source
WCJ-BERT Mar 26, 2026
3dbe2c8
fix: align data_source config structure between YAML and DataSourceCo…
WCJ-BERT Apr 24, 2026
34 changes: 32 additions & 2 deletions README.md
@@ -57,8 +57,38 @@ appreciated.

## Getting Started

To run simulations locally (Docker Compose, single machine), see the [Tutorial](docs/TUTORIAL.md).
For cluster or SLURM deployment, see `src/tools/run-on-slurm`.
### Quick Start

1. **Setup Environment**
Follow the [Onboarding Guide](docs/ONBOARDING.md) to install dependencies and configure your
environment.

2. **Prepare Scene Data**
AlpaSim uses [trajdata](https://github.com/NVlabs/trajdata/tree/alpasim) (custom `alpasim`
branch) for unified data loading. This dependency is automatically installed via `uv`. The wizard
automatically prepares scene caches, but you can also do it manually:

```bash
# For USDZ scenes (after downloading from Hugging Face)
uv run python -m alpasim_runtime.prepare_data \
--desired-data=usdz \
--data-dir=./data/nre-artifacts/all-usdzs \
--cache-location=./cache/trajdata_usdz
```

See the [Tutorial](docs/TUTORIAL.md#data-preparation) for more details on data preparation
options.

3. **Run Your First Simulation**
```bash
source setup_local_env.sh
uv run alpasim_wizard +deploy=local wizard.log_dir=$PWD/my_first_run
```

Results, including videos and metrics, will be written to `my_first_run/`.

For detailed instructions, see the [Tutorial](docs/TUTORIAL.md). For cluster or SLURM deployment, see
`src/tools/run-on-slurm`.

## Documentation & Resources

104 changes: 101 additions & 3 deletions docs/TUTORIAL.md
@@ -336,6 +336,91 @@ the predictions of a policy, you can set
`runtime.simulation_config.physics_update_mode: NONE` and
`runtime.simulation_config.force_gt_duration_us` to a very high value (20s+).

## Data Preparation

AlpaSim uses [trajdata](https://github.com/NVlabs/trajdata/tree/alpasim) (custom `alpasim` branch)
for unified data loading across different autonomous driving datasets. The trajdata library is
automatically installed via `uv` when you run `setup_local_env.sh`.

The wizard automatically prepares the trajdata cache when needed. However, you can manually
prepare or rebuild the cache for optimization or debugging purposes.

### Preparing USDZ Scene Cache

The wizard automatically handles data preparation for downloaded USDZ scenes. However, if you need to
manually prepare or rebuild the cache, use the `prepare_data` tool:

```bash
# Prepare cache from USDZ scenes
uv run python -m alpasim_runtime.prepare_data \
--desired-data=usdz \
--data-dir=./data/nre-artifacts/all-usdzs \
--cache-location=./cache/trajdata_usdz \
  --smooth_trajectories=true \
  --log-level=INFO
```

This command:
- Scans USDZ files in the specified directory
- Extracts trajectory and map data
- Creates a trajdata cache for fast scene loading
- The cache is reused across runs to improve startup time

A review thread on this diff discussed cache-key collisions:

**Collaborator:** Sorry for not catching this earlier. One concern I have is that we can have two artifacts with the same clip ID (which is, I think, how the caching is done) but with different versions. The way we get around this in the existing approach is to give the USDZ filename a UUID and use a scene database (really a CSV) to disambiguate.

**Author:** Thanks for raising this: you're right that trajdata uses the internal clip ID as the cache key, not the UUID. If we want a stronger guarantee, we could incorporate the `nre_version_string` (e.g. `25.7.8-fc8b0551`) into the `cache_location` path to fully isolate per-version caches. What do you think of this solution?

**Author:** Otherwise, I have a similar proposal. If we add `${scenes.sceneset_path}` to `base_config.yaml`, would that help distinguish different versions' USDZs? I think `sceneset_path` is the MD5 fingerprint of all UUIDs (sorted) selected for the run, so any version change in a scene produces a different UUID, which changes the hash, which routes to a new isolated cache directory.

```yaml
runtime:
  # nr_workers and endpoints.*.n_concurrent_rollouts are set by topology configs.

  # Unified data source configuration using trajdata
  # Common settings apply to all sources; source-specific config lives under sources:
  data_source:
    # Common configuration (applies to all data sources)
    cache_location: "${defines.trajdata_cache}/${scenes.sceneset_path}"  # Per-sceneset cache, isolates different scene versions
    desired_dt: 0.1        # 10 Hz sampling rate for trajectories
    incl_vector_map: true  # Include vector map data (roads, lanes, etc.)
    rebuild_cache: false   # Set to true to force rebuild cache
    rebuild_maps: false    # Set to true to force rebuild maps
    num_workers: 4         # Parallel workers for cache creation

    sources:
      # USDZ data source configuration (NuRec artifacts)
      usdz:
        enabled: true
        # Use wizard's dynamic sceneset path. The wizard creates a sceneset directory
        # based on selected scenes and sets sceneset_path at runtime.
        # For manual runtime usage without wizard, set sceneset_path in scenes config.
        data_dir: "${scenes.scene_cache}/${scenes.sceneset_path}"
        extra_params:
          asset_base_path: null  # Optional: Base path for MTGS rendering assets
```
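The per-sceneset cache isolation proposed in the review thread can be sketched as follows. The helper name and the exact fingerprint scheme are assumptions based on the author's description of `sceneset_path` as the MD5 of the sorted scene UUIDs; this is not AlpaSim's actual implementation.

```python
import hashlib


def sceneset_fingerprint(scene_uuids: list[str]) -> str:
    """Hypothetical sketch: derive a cache-routing key from the selected scenes.

    Hashes the sorted scene UUIDs so that any change in scene selection or
    scene version yields a new key, and therefore routes to a new isolated
    trajdata cache directory.
    """
    digest = hashlib.md5("\n".join(sorted(scene_uuids)).encode("utf-8"))
    return digest.hexdigest()


# Two scenesets differing in one artifact version route to different caches.
v1 = sceneset_fingerprint(["uuid-aaa", "uuid-bbb"])
v2 = sceneset_fingerprint(["uuid-aaa", "uuid-bbb-v2"])
cache_v1 = f"./cache/trajdata_usdz/{v1}"
cache_v2 = f"./cache/trajdata_usdz/{v2}"
```

Because the UUID list is sorted before hashing, the fingerprint is independent of selection order, which keeps cache paths stable across runs.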

### Using Configuration Files

For more complex setups or repeated use, create a configuration file (e.g.,
`user_config/config_prepare_usdz.yaml`):

```yaml
data_source:
  desired_data: ["usdz"]
  data_dirs:
    usdz: "./data/nre-artifacts/all-usdzs"
  cache_location: "./cache/trajdata_usdz"
  incl_vector_map: true
  rebuild_cache: false  # Set to true to force rebuild
  rebuild_maps: false   # Set to true to force rebuild
  desired_dt: 0.1       # 10 Hz sampling rate
  num_workers: 4        # Parallel workers for cache creation

smooth_trajectories: true
```

Then run:

```bash
uv run python -m alpasim_runtime.prepare_data \
--user-config=user_config/config_prepare_usdz.yaml
```
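Conceptually, a user config like the one above overrides the tool's defaults key by key. The sketch below illustrates that layering with a plain recursive dict merge; the actual tool parses configs with omegaconf, and the default values here are only illustrative.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` onto `base` without mutating either."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Illustrative defaults; the real tool's defaults may differ.
defaults = {"data_source": {"rebuild_cache": False, "num_workers": 1}}
user = {"data_source": {"num_workers": 4}}
config = deep_merge(defaults, user)
# config["data_source"] now has rebuild_cache=False from defaults
# and num_workers=4 from the user config.
```

This is why a user config only needs to state the keys it changes: unspecified keys fall through to the defaults.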

### Cache Location

The trajdata cache contains:
- Preprocessed scene metadata and indices
- Trajectory data in a unified format
- Vector map data (when enabled)
- Scene lookup tables for fast access

You can share the cache directory across machines to avoid redundant preprocessing.

### Rebuilding the Cache

If you encounter data inconsistencies or add new scenes, rebuild the cache:

```bash
# Command line
uv run python -m alpasim_runtime.prepare_data \
--desired-data=usdz \
--data-dir=./data/nre-artifacts/all-usdzs \
--cache-location=./cache/trajdata_usdz \
--rebuild-cache

# Or in config file, set: rebuild_cache: true
```

> :green_book: The wizard uses configuration files from `user_config/` that include data source
> settings. These configs are automatically used during simulation runs.

## Scenes

The scene in AlpaSim is a NuRec reconstruction of a real-world driving log.
@@ -510,7 +595,17 @@ from shutting down the docker containers after each simulation by setting
1. (Terminal 2) `cd` into the runtime src directory (`<repo_root>/src/runtime/`) and prepare to
start the runtime. The exact command paths will vary, but, to use the configuration generated
from the earlier steps, an example command would be:
```bash
cd <repo_root>/src/runtime/
# Following command is based on the docker-compose.yaml generated by the wizard
uv run python -m alpasim_runtime.simulate \
  --usdz-glob=../../data/nre-artifacts/all-usdzs/**/*.usdz \
  --user-config=../../tutorial_dbg_runtime/generated-user-config-0.yaml \
  --network-config=../../tutorial_dbg_runtime/generated-network-config.yaml \
  --log-dir=../../tutorial_dbg_runtime \
  --log-level=INFO
```
```bash
cd <repo_root>/src/runtime/
# Following command is based on the docker-compose.yaml generated by the wizard
# Ensure the user config contains the data_source configuration
uv run python -m alpasim_runtime.simulate \
--user-config=../../tutorial_dbg_runtime/generated-user-config-0.yaml \
--network-config=../../tutorial_dbg_runtime/generated-network-config.yaml \
--log-dir=../../tutorial_dbg_runtime \
--eval-config=../../tutorial_dbg_runtime/eval-config.yaml \
--log-level=INFO
```

### Using VSCode Debugger (Optional)

@@ -528,13 +623,16 @@ built-in debugger:
    "justMyCode": false,
    "cwd": "${workspaceFolder}/src/runtime",
    "args": [
        "--usdz-glob=../../data/nre-artifacts/all-usdzs/**/*.usdz",
        "--user-config=../../tutorial_dbg_runtime/generated-user-config-0.yaml",
        "--network-config=../../tutorial_dbg_runtime/generated-network-config.yaml",
        "--eval-config=../../tutorial_dbg_runtime/eval-config.yaml",
        "--log-dir=../../tutorial_dbg_runtime",
        "--log-level=INFO"
    ],
    "console": "integratedTerminal"
    "console": "integratedTerminal",
    "env": {
        "PYTHONPATH": "${workspaceFolder}/src/grpc:${workspaceFolder}/src/eval/src:${workspaceFolder}/src/utils:${workspaceFolder}/src/runtime:${env:PYTHONPATH}"
    }
}
```

108 changes: 104 additions & 4 deletions src/runtime/alpasim_runtime/config.py
@@ -8,7 +8,7 @@
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import Optional, Type, TypeVar, cast
from typing import Any, Dict, Optional, Type, TypeVar, cast

from alpasim_utils.scenario import VehicleConfig
from alpasim_utils.yaml_utils import load_yaml_dict
@@ -17,6 +17,105 @@
C = TypeVar("C")


@dataclass
class GenericSourceConfig:
    """Generic configuration for any trajdata-supported dataset.

    This unified config supports all trajdata datasets (USDZ, NuPlan, Waymo,
    nuScenes, Lyft, Argoverse, etc.) with a flexible extra_params field for
    dataset-specific options.

    Attributes:
        enabled: Whether this data source is enabled
        data_dir: Path to dataset directory
        extra_params: Dataset-specific parameters (e.g., NuPlan's config_dir,
            USDZ's asset_base_path, etc.)

    Example extra_params:
        - NuPlan: {"config_dir": "/path", "num_timesteps_before": 30, "num_timesteps_after": 80}
        - USDZ: {"asset_base_path": "/assets"}
        - Waymo: {} (no extra params needed)
    """

    enabled: bool = True
    data_dir: Optional[str] = None
    extra_params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class DataSourceConfig:
    """Configuration for unified data loading through trajdata.

    Supports dynamic registration of any trajdata dataset (USDZ, NuPlan, Waymo,
    nuScenes, Lyft, Argoverse, etc.) through the 'sources' dictionary.

    Attributes:
        cache_location: Path to shared trajdata cache directory
        desired_dt: Desired time delta between trajectory frames in seconds
        incl_vector_map: Whether to load vector maps (roads, lanes, etc.)
        rebuild_cache: Whether to force rebuild the cache for all sources
        rebuild_maps: Whether to force rebuild maps for all sources
        num_workers: Number of parallel workers for cache creation
        sources: Dictionary mapping dataset names to their configurations
            (e.g., {"usdz": GenericSourceConfig(...), "waymo": ...})
    """

    # Common configuration (applies to all data sources)
    cache_location: str = MISSING
    desired_dt: float = 0.1  # 10 Hz sampling
    incl_vector_map: bool = True
    rebuild_cache: bool = False
    rebuild_maps: bool = False
    num_workers: int = 1  # Conservative default for stability; increase for production

    # New extensible source configuration (preferred)
    sources: Dict[str, GenericSourceConfig] = field(default_factory=dict)

    def to_trajdata_params(self) -> dict:
        """Convert hierarchical config to flat parameters for trajdata's UnifiedDataset.

        Returns:
            Dictionary with keys expected by the UnifiedDataset constructor

        Raises:
            ValueError: If no data sources are enabled
        """
        desired_data = []
        data_dirs = {}
        dataset_kwargs = {}

        for dataset_name, source in self.sources.items():
            if source.enabled:
                if source.data_dir is None:
                    raise ValueError(
                        f"data_source.sources.{dataset_name}.data_dir is required when enabled"
                    )
                desired_data.append(dataset_name)
                data_dirs[dataset_name] = source.data_dir
                if source.extra_params:
                    dataset_kwargs[dataset_name] = source.extra_params

        if not desired_data:
            raise ValueError("No data sources enabled in configuration")

        params = {
            "desired_data": desired_data,
            "data_dirs": data_dirs,
            "cache_location": self.cache_location,
            "incl_vector_map": self.incl_vector_map,
            "rebuild_cache": self.rebuild_cache,
            "rebuild_maps": self.rebuild_maps,
            "num_workers": self.num_workers,
            "desired_dt": self.desired_dt,
        }

        # Add dataset-specific kwargs if any source has extra_params
        if dataset_kwargs:
            params["dataset_kwargs"] = dataset_kwargs

        return params
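To make the flattening concrete, here is a trimmed, self-contained replica of the two dataclasses above with a usage example. It is kept standalone so it runs without the alpasim or omegaconf packages; the real classes additionally use omegaconf's `MISSING` sentinel for required fields, so the placeholder default for `cache_location` here is an assumption for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class GenericSourceConfig:
    enabled: bool = True
    data_dir: Optional[str] = None
    extra_params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class DataSourceConfig:
    cache_location: str = "./cache/trajdata"  # placeholder; MISSING in the real class
    desired_dt: float = 0.1
    incl_vector_map: bool = True
    rebuild_cache: bool = False
    rebuild_maps: bool = False
    num_workers: int = 1
    sources: Dict[str, GenericSourceConfig] = field(default_factory=dict)

    def to_trajdata_params(self) -> dict:
        # Same flattening logic as the class above: collect enabled sources,
        # validate their data_dir, and emit UnifiedDataset-style kwargs.
        desired_data, data_dirs, dataset_kwargs = [], {}, {}
        for name, source in self.sources.items():
            if not source.enabled:
                continue
            if source.data_dir is None:
                raise ValueError(f"data_source.sources.{name}.data_dir is required when enabled")
            desired_data.append(name)
            data_dirs[name] = source.data_dir
            if source.extra_params:
                dataset_kwargs[name] = source.extra_params
        if not desired_data:
            raise ValueError("No data sources enabled in configuration")
        params = {
            "desired_data": desired_data,
            "data_dirs": data_dirs,
            "cache_location": self.cache_location,
            "incl_vector_map": self.incl_vector_map,
            "rebuild_cache": self.rebuild_cache,
            "rebuild_maps": self.rebuild_maps,
            "num_workers": self.num_workers,
            "desired_dt": self.desired_dt,
        }
        if dataset_kwargs:
            params["dataset_kwargs"] = dataset_kwargs
        return params


config = DataSourceConfig(
    cache_location="./cache/trajdata_usdz",
    sources={"usdz": GenericSourceConfig(data_dir="./data/nre-artifacts/all-usdzs")},
)
params = config.to_trajdata_params()
# "dataset_kwargs" is omitted because the usdz source has no extra_params.
```

Note the design choice: disabled sources are skipped silently, but an enabled source without a `data_dir` fails fast with a config-path-qualified error message.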


def typed_parse_config(path: str | Path, config_type: Type[C]) -> C:
    """Reads a yaml file at `path` and parses it into a provided type using omegaconf."""
    yaml_config = OmegaConf.create(load_yaml_dict(path))
@@ -243,16 +342,17 @@ class UserSimulatorConfig:
    endpoints: UserEndpointConfig = MISSING

    smooth_trajectories: bool = True  # whether to smooth trajectories with cubic spline
    # Max worker-local artifact cache size.
    # None = unlimited, 0 = disable cache and always reload artifacts.
    artifact_cache_size: Optional[int] = None
    extra_cameras: list[CameraDefinitionConfig] = field(default_factory=list)

    # Number of worker processes for parallel rollout execution.
    # 1 = inline mode, all in one process, good for debugging
    # >1 = multi-worker mode with subprocess-based parallelism
    nr_workers: int = MISSING

    # Unified data source configuration (required)
    # Data loading goes through trajdata's UnifiedDataset
    data_source: DataSourceConfig = MISSING

@dataclass
class SimulatorConfig:
3 changes: 3 additions & 0 deletions src/runtime/alpasim_runtime/daemon/__init__.py
@@ -2,6 +2,7 @@
# Copyright (c) 2026 NVIDIA Corporation

from alpasim_runtime.daemon.engine import DaemonEngine
from alpasim_runtime.daemon.exceptions import InvalidRequestError, UnknownSceneError
from alpasim_runtime.daemon.request_store import RequestStore
from alpasim_runtime.daemon.scheduler import DaemonScheduler, DaemonUnavailableError

@@ -10,4 +11,6 @@
"DaemonScheduler",
"DaemonUnavailableError",
"RequestStore",
"InvalidRequestError",
"UnknownSceneError",
]
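The diff above newly exports `InvalidRequestError` and `UnknownSceneError` from the daemon package. A hedged sketch of how a caller might distinguish them follows; the exception classes are stubbed locally here since alpasim is not importable in isolation, and the handler name, its signature, and the message texts are illustrative assumptions, not the real daemon API.

```python
class InvalidRequestError(Exception):
    """Stub for alpasim_runtime.daemon.exceptions.InvalidRequestError."""


class UnknownSceneError(Exception):
    """Stub for alpasim_runtime.daemon.exceptions.UnknownSceneError."""


def handle_rollout_request(scene_id: str, known_scenes: set[str]) -> str:
    # Hypothetical request validation: malformed input vs. a well-formed
    # request for a scene the daemon has not loaded.
    if not scene_id:
        raise InvalidRequestError("scene_id must be non-empty")
    if scene_id not in known_scenes:
        raise UnknownSceneError(f"scene {scene_id!r} not in the loaded sceneset")
    return f"rollout scheduled for {scene_id}"


try:
    handle_rollout_request("scene-404", {"scene-001"})
except UnknownSceneError as err:
    outcome = f"rejected: {err}"
```

Exporting both types from the package `__init__` lets clients catch each case separately instead of matching on a generic exception message.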