MatPiech/edge-tta

Benchmarking the Cost of Adaptation: A System-Level Analysis of Test-Time Adaptation on Edge Devices

Deploying deep neural networks on edge devices has exposed a significant vulnerability: performance degrades under distribution shift. Static models fail when the statistical properties of the test data differ from those of the training data due to environmental changes, sensor noise, or domain shift. Test-time adaptation (TTA) is an algorithmic solution that lets models continuously adjust their parameters during inference using unlabeled data streams. However, the existing literature on TTA primarily focuses on improving functional metrics like accuracy and often overlooks the severe resource constraints of edge computing. This study addresses that gap with a system-level analysis of TTA that measures its hidden costs in memory footprint, computational latency, and energy consumption.

Purpose

edge-tta evaluates whether online TTA methods are practical outside datacenter settings, where latency, memory, and energy are as important as robustness.

The repository focuses on:

  • comparing adaptation methods under distribution shift (corruptions)
  • measuring end-to-end runtime cost on edge devices (Raspberry Pi 5, Jetson Orin)
  • exposing trade-offs between accuracy gains and system overhead

Methodology

Benchmarked methods

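The benchmark includes representative online TTA baselines and adaptive methods implemented in edge_tta/methods/. The methods reported under Results are AdaBN, T3A, LAME, Pseudo Label, SHOT-IM, TENT, CoTTA, EATA, and SAR, compared against a No Adaptation baseline.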

Evaluation protocol

The main evaluation loop is implemented in main.py and runs each method over multiple corruption types (noise, blur, weather, and digital corruptions).

For each corruption, the pipeline:

  1. loads the corrupted dataset split (CIFAR-10-C, CIFAR-100-C, or ImageNet-C)
  2. performs online adaptation/inference
  3. records task metrics: Top-1/Top-5 accuracy and ECE
  4. records system metrics with PerformanceMonitor
  5. resets adaptation state before the next corruption
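The steps above can be sketched in plain Python. This is a minimal illustration, not the code in main.py: `ToyAdapter`, `make_stream`, and `evaluate` are hypothetical names, the "model" is a one-dimensional toy classifier, and each corruption is modeled as a constant input offset. The ECE function uses the common equal-width binning definition, which is assumed to match what the repository reports.

```python
import math
import random


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin weight) * |accuracy - mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if items:
            acc = sum(ok for _, ok in items) / len(items)
            avg_conf = sum(c for c, _ in items) / len(items)
            ece += len(items) / total * abs(acc - avg_conf)
    return ece


class ToyAdapter:
    """Hypothetical stand-in for a TTA method: tracks a running input mean."""

    def __init__(self, momentum=0.1):
        self.momentum = momentum
        self.reset()

    def reset(self):
        # Restore the source (pre-adaptation) state.
        self.shift = 0.0

    def step(self, batch):
        # Adapt on the unlabeled batch, then predict with the adapted state.
        batch_mean = sum(batch) / len(batch)
        self.shift = (1 - self.momentum) * self.shift + self.momentum * batch_mean
        preds, confs = [], []
        for x in batch:
            p1 = 1.0 / (1.0 + math.exp(-(x - self.shift)))
            preds.append(1 if p1 >= 0.5 else 0)
            confs.append(max(p1, 1.0 - p1))
        return preds, confs


def make_stream(offset, n_batches=50, batch_size=4, seed=0):
    """Yield (inputs, labels) batches; the corruption is a constant offset."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        labels = [rng.randint(0, 1) for _ in range(batch_size)]
        inputs = [(2 * y - 1) * 1.5 + rng.gauss(0, 0.5) + offset for y in labels]
        yield inputs, labels


def evaluate(adapter, corruptions):
    results = {}
    for name, offset in corruptions.items():       # 1. load the corrupted split
        confs, correct = [], []
        for inputs, labels in make_stream(offset):
            preds, batch_confs = adapter.step(inputs)  # 2. adapt and infer online
            confs.extend(batch_confs)
            correct.extend(int(p == y) for p, y in zip(preds, labels))
        results[name] = {                          # 3. record task metrics
            "top1": sum(correct) / len(correct),
            "ece": expected_calibration_error(confs, correct),
        }
        adapter.reset()                            # 5. reset before the next corruption
    return results


adapter = ToyAdapter()
results = evaluate(adapter, {"shift_a": 2.0, "shift_b": -2.0})
```

Resetting after every corruption keeps each corruption's score independent of the order in which corruptions are visited. System metrics (step 4) are omitted from this sketch.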

System metrics collected

The monitoring stack tracks:

  • timing: batch wall time, forward time, adaptation time, throughput
  • memory: CPU memory, GPU memory (where available), activation memory, peak RAM
  • energy: total energy and energy per sample with device-aware backends
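To make the metric groups concrete, here is a rough, CPU-only sketch of how such measurements can be taken with the Python standard library. `SimpleMonitor` is a hypothetical class and is not the repository's PerformanceMonitor. It assumes a constant average board power in place of a real power sensor, and `tracemalloc` only sees Python-level allocations, so its peak is not the same quantity as peak RAM or GPU memory.

```python
import time
import tracemalloc


class SimpleMonitor:
    """Hypothetical monitor: wall time, Python peak memory, estimated energy."""

    def __init__(self, avg_power_watts):
        # Assumption: power draw is constant; real backends read a sensor.
        self.avg_power_watts = avg_power_watts
        self.records = []

    def measure(self, fn, batch):
        """Run fn(batch) and record wall time and peak Python allocations."""
        tracemalloc.start()
        start = time.perf_counter()
        output = fn(batch)
        wall_s = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        self.records.append(
            {"wall_s": wall_s, "peak_bytes": peak_bytes, "samples": len(batch)}
        )
        return output

    def summary(self):
        total_s = sum(r["wall_s"] for r in self.records)
        samples = sum(r["samples"] for r in self.records)
        energy_j = self.avg_power_watts * total_s  # energy = power * time
        return {
            "throughput_sps": samples / total_s if total_s > 0 else float("inf"),
            "peak_python_mb": max(r["peak_bytes"] for r in self.records) / 2**20,
            "energy_j": energy_j,
            "energy_per_sample_j": energy_j / samples,
        }


monitor = SimpleMonitor(avg_power_watts=5.0)  # 5 W is an illustrative value
for _ in range(3):
    monitor.measure(lambda b: [x * x for x in b], list(range(1000)))
stats = monitor.summary()
```

Splitting forward time from adaptation time would need separate timers around each phase inside the method itself, which this sketch does not attempt.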

Usage

1. Install

pip install -e ".[dev]"

2. Prepare datasets

Use helper scripts in scripts/ (get_cifar10c.sh, get_imagenetc.sh) and place data under data/.

3. Run a single method

python main.py \
	--architecture resnet18 \
	--checkpoint-path ./checkpoints/resnet18_cifar10_source_model.pth \
	--data-dir ./data/CIFAR-10-C \
	--batch-size 4 \
	--tta-algorithm tent \
	--level 5 \
	--output ./results/outputs_resnet18_cifar10c_bs4 \
	--track-performance true
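To repeat the command above across corruption severities, a small driver can build one command per level. The sketch below reuses only the flags shown above. It assumes `--level` accepts the five severity levels (1 to 5) that CIFAR-10-C provides, and the `_level{n}` suffix on the output directory is a hypothetical naming choice, not a repository convention.

```python
import shlex

# Flags copied from the single-method command; only --level and --output vary.
BASE_ARGS = [
    "python", "main.py",
    "--architecture", "resnet18",
    "--checkpoint-path", "./checkpoints/resnet18_cifar10_source_model.pth",
    "--data-dir", "./data/CIFAR-10-C",
    "--batch-size", "4",
    "--tta-algorithm", "tent",
    "--track-performance", "true",
]


def build_commands(levels=range(1, 6)):
    """Return one argument list per severity level."""
    commands = []
    for level in levels:
        # Hypothetical per-level output directory to avoid overwriting results.
        output_dir = f"./results/outputs_resnet18_cifar10c_bs4_level{level}"
        commands.append(BASE_ARGS + ["--level", str(level), "--output", output_dir])
    return commands


commands = build_commands()
for cmd in commands:
    # Print only; to execute, call subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```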

4. Run multiple methods

bash scripts/run_all_methods.sh \
	--architecture resnet18 \
	--checkpoint-path ./checkpoints/resnet18_cifar10_source_model.pth \
	--data-dir ./data/CIFAR-10-C \
	--batch-size 4

Results

Summary of system-level results for TTA using a ResNet-50 model, evaluated on selected MPU-class edge devices with the ImageNet-C benchmark. Delta Acc and Delta ECE are reported relative to the No Adaptation baseline. ηAE denotes adaptation efficiency. Latency and energy are reported per batch.

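Raspberry Pi 5 (Batch Size 16)

| Method | Delta Acc (%) | Delta ECE (%) | Lat. (ms) | Energy (J) | Peak RAM (MB) | ηAE |
|---|---|---|---|---|---|---|
| No Adaptation | 0.00 | 0.00 | 3660.74 | 25.59 | 991.92 | 0.0 |
| AdaBN | -6.23 | 74.10 | 3859.45 | 26.96 | 1030.55 | -72.76 |
| T3A | -2.88 | -48.03 | 3958.79 | 27.68 | 1367.27 | -22.05 |
| LAME | -0.97 | -9.05 | 3795.25 | 26.51 | 1121.89 | -16.87 |
| Pseudo Label | -0.56 | 75.11 | 4217.33 | 29.47 | 2507.98 | -2.31 |
| SHOT-IM | 8.73 | -24.05 | 8654.84 | 60.53 | 2646.09 | 4.00 |
| TENT | 9.41 | -116.65 | 8616.83 | 60.12 | 2985.81 | 4.36 |
| CoTTA | Out-of-Memory | | | | | |
| EATA | 15.41 | -137.79 | 4602.03 | 32.18 | 2495.84 | 37.41 |
| SAR | 10.50 | 38.52 | 6904.60 | 48.30 | 2767.84 | 7.40 |

NVIDIA Jetson Orin Nano (Batch Size 32)

| Method | Delta Acc (%) | Delta ECE (%) | Lat. (ms) | Energy (J) | Peak RAM (MB) | ηAE |
|---|---|---|---|---|---|---|
| No Adaptation | 0.00 | 0.00 | 303.61 | 5.41 | 2775.62 | 0.0 |
| AdaBN | 2.24 | 72.94 | 408.66 | 6.59 | 2782.83 | 60.75 |
| T3A | -3.18 | -47.91 | 930.05 | 14.95 | 3338.02 | -10.67 |
| LAME | -1.30 | -12.58 | 404.57 | 6.55 | 3011.36 | -36.49 |
| Pseudo Label | 6.92 | 69.21 | 514.80 | 8.40 | 5526.63 | 74.06 |
| SHOT-IM | 18.25 | -16.49 | 991.27 | 15.05 | 5493.34 | 60.58 |
| TENT | 23.56 | -62.86 | 843.79 | 13.04 | 5470.41 | 98.81 |
| CoTTA | Out-of-Memory | | | | | |
| EATA | 33.10 | -105.78 | 1147.74 | 17.56 | 5597.54 | 87.18 |
| SAR | 22.22 | 40.83 | 1780.91 | 10.28 | 5528.69 | 146.00 |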
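
Because latency and energy are reported per batch and the two devices use different batch sizes (16 and 32), the raw columns are not directly comparable across devices. One way to put them on a common footing is to divide by the batch size. The sketch below does this for the No Adaptation rows only. The per-sample normalization is an illustration added here, not a metric the benchmark itself reports.

```python
# No Adaptation values from the results table (per batch).
ROWS = {
    "Raspberry Pi 5": {"batch_size": 16, "latency_ms": 3660.74, "energy_j": 25.59},
    "NVIDIA Jetson Orin Nano": {"batch_size": 32, "latency_ms": 303.61, "energy_j": 5.41},
}


def per_sample(row):
    """Convert per-batch latency and energy to per-sample values."""
    return {
        "latency_ms": row["latency_ms"] / row["batch_size"],
        "energy_j": row["energy_j"] / row["batch_size"],
    }


normalized = {device: per_sample(row) for device, row in ROWS.items()}
for device, values in normalized.items():
    print(f"{device}: {values['latency_ms']:.2f} ms/sample, "
          f"{values['energy_j']:.3f} J/sample")
```

The same division applies to any other row of the table.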

Figure: ResNet50 accuracy batch latency Pareto

Figure: ResNet50 batch-16 execution time breakdown stacked per sample

Figure: Memory composition analysis for ResNet50 on Orin GPU across all methods

Figure: Models memory wall peak VRAM usage
