Benchmarking the Cost of Adaptation: A System-Level Analysis of Test-Time Adaptation on Edge Devices
Deploying deep neural networks on edge devices exposes a significant vulnerability: performance degrades under distribution shift. Static models fail when the statistical properties of the test data differ from those of the training data because of environmental changes, sensor noise, or domain shift. Test-time adaptation (TTA) is an algorithmic remedy that lets models keep adjusting their parameters during inference using unlabeled data streams. However, the existing TTA literature focuses mainly on functional metrics such as accuracy and often overlooks the severe resource constraints of edge computing. This study addresses that gap with a system-level analysis of TTA that measures its hidden costs in memory footprint, computational latency, and energy consumption.
edge-tta evaluates whether online TTA methods are practical outside datacenter settings, where latency, memory, and energy are as important as robustness.
The repository focuses on:
- comparing adaptation methods under distribution shift (corruptions)
- measuring end-to-end runtime cost on edge devices (Raspberry Pi 5, Jetson Orin)
- exposing trade-offs between accuracy gains and system overhead
The benchmark includes representative online TTA baselines and adaptive methods implemented in edge_tta/methods/, including:
- `no_adapt` - No Adaptation baseline
- `adabn` - Revisiting Batch Normalization For Practical Domain Adaptation
- `t3a` - Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization
- `lame` - Parameter-Free Online Test-Time Adaptation
- `pl` (pseudo-label) - Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks
- `shot` - Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation
- `tent` - Tent: Fully Test-Time Adaptation by Entropy Minimization
- `eata` - Efficient Test-Time Model Adaptation without Forgetting
- `sar` - Towards Stable Test-Time Adaptation in Dynamic Wild World
- `cotta` - Continual Test-Time Domain Adaptation

The main evaluation loop is implemented in main.py and runs each method over multiple corruption types (noise, blur, weather, and digital corruptions).
For each corruption, the pipeline:
- loads the corrupted dataset split (CIFAR-10-C, CIFAR-100-C, or ImageNet-C)
- performs online adaptation/inference
- records task metrics: Top-1/Top-5 accuracy and ECE
- records system metrics with PerformanceMonitor
- resets adaptation state before the next corruption
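
The per-corruption steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in main.py: `RunningMeanClassifier`, its `adapt_and_predict`/`reset` interface, the toy confidence formula, and `evaluate` are all hypothetical names introduced here. The ECE function uses the common equal-width-bin formulation, which may differ in detail from the repository's.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: bin-weighted mean of |accuracy - mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 falls in the last bin
        bins[index].append((conf, hit))
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(h for _, h in members) / len(members)
            ece += len(members) / len(confidences) * abs(accuracy - mean_conf)
    return ece


class RunningMeanClassifier:
    """Hypothetical toy method: thresholds scalars at the running mean of all
    unlabeled inputs seen since the last reset."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Discard everything learned from the previous stream.
        self.count = 0
        self.total = 0.0

    def adapt_and_predict(self, batch):
        # Adapt on the unlabeled batch first, then predict with the new state.
        self.count += len(batch)
        self.total += sum(batch)
        mean = self.total / self.count
        preds = [1 if x > mean else 0 for x in batch]
        # Arbitrary toy confidence in [0.5, 1): grows with distance from the threshold.
        confs = [1.0 - 0.5 / (1.0 + abs(x - mean)) for x in batch]
        return preds, confs


def evaluate(method, streams, batch_size=4):
    """Run one method over several named streams, resetting between them."""
    results = {}
    for name, (inputs, labels) in streams.items():
        confs, hits = [], []
        for start in range(0, len(inputs), batch_size):
            batch = inputs[start:start + batch_size]
            targets = labels[start:start + batch_size]
            preds, batch_confs = method.adapt_and_predict(batch)
            confs.extend(batch_confs)
            hits.extend(int(p == t) for p, t in zip(preds, targets))
        results[name] = {
            "top1": sum(hits) / len(hits),
            "ece": expected_calibration_error(confs, hits),
        }
        method.reset()  # the next corruption starts from a clean state
    return results
```

The reset matters in this toy setting: if the classifier carried its running mean from a stream centred at 0 into one centred at 10, the stale threshold would push every prediction in the first shifted batch to the same class.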
The monitoring stack tracks:
- timing: batch wall time, forward time, adaptation time, throughput
- memory: CPU memory, GPU memory (where available), activation memory, peak RAM
- energy: total energy and energy per sample with device-aware backends
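
As a rough illustration of how such metrics can be collected, the sketch below times each batch and derives throughput and energy per sample. It is not the repository's PerformanceMonitor: `SimpleMonitor` is a hypothetical stand-in, energy is estimated from an assumed constant power draw rather than read from a device-aware backend, and `tracemalloc` only sees Python-level allocations, so it is a weak proxy for peak RAM or GPU memory.

```python
import time
import tracemalloc


class SimpleMonitor:
    """Hypothetical stand-in for a system-metrics monitor (standard library only)."""

    def __init__(self, average_power_watts):
        # Assumption: constant power draw. A real backend would sample device sensors.
        self.average_power_watts = average_power_watts
        self.batch_times = []
        self.samples = 0
        self.peak_bytes = 0

    def measure(self, fn, batch):
        """Run fn(batch) and record wall time and peak Python allocation."""
        tracemalloc.start()
        start = time.perf_counter()
        output = fn(batch)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        self.batch_times.append(elapsed)
        self.samples += len(batch)
        self.peak_bytes = max(self.peak_bytes, peak)
        return output

    def summary(self):
        total_time = sum(self.batch_times)
        energy = self.average_power_watts * total_time  # E = P * t
        return {
            "mean_batch_time_s": total_time / len(self.batch_times),
            "throughput_samples_per_s": self.samples / total_time,
            "peak_python_alloc_bytes": self.peak_bytes,
            "total_energy_j": energy,
            "energy_per_sample_j": energy / self.samples,
        }
```

A call such as `SimpleMonitor(average_power_watts=7.0).measure(model_fn, batch)` wraps one inference or adaptation step; `model_fn` and the 7 W figure are placeholders. Forward time and adaptation time can be separated by measuring the two phases with separate calls.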

Install the package with its development dependencies:

    pip install -e .[dev]

Use the helper scripts in scripts/ (get_cifar10c.sh, get_imagenetc.sh) and place the data under data/.

Run a single method (TENT in this example):

    python main.py \
      --architecture resnet18 \
      --checkpoint-path ./checkpoints/resnet18_cifar10_source_model.pth \
      --data-dir ./data/CIFAR-10-C \
      --batch-size 4 \
      --tta-algorithm tent \
      --level 5 \
      --output ./results/outputs_resnet18_cifar10c_bs4 \
      --track-performance true

Run all methods with the helper script:

    bash scripts/run_all_methods.sh \
      --architecture resnet18 \
      --checkpoint-path ./checkpoints/resnet18_cifar10_source_model.pth \
      --data-dir ./data/CIFAR-10-C \
      --batch-size 4

Summary of system-level results for TTA using a ResNet-50 model, evaluated on selected MPU-class edge devices with the ImageNet-C benchmark. Delta Acc and Delta ECE are reported relative to the No Adaptation baseline. ηAE denotes adaptation efficiency. Latency and energy are reported per batch.

| Device | Raspberry Pi 5 (Batch Size 16) | | | | | | NVIDIA Jetson Orin Nano (Batch Size 32) | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Method / Metrics | Delta Acc (%) | Delta ECE (%) | Lat. (ms) | Energy (J) | Peak RAM (MB) | ηAE | Delta Acc (%) | Delta ECE (%) | Lat. (ms) | Energy (J) | Peak RAM (MB) | ηAE |
| No Adaptation | 0.00 | 0.00 | 3660.74 | 25.59 | 991.92 | 0.0 | 0.00 | 0.00 | 303.61 | 5.41 | 2775.62 | 0.0 |
| AdaBN | -6.23 | 74.10 | 3859.45 | 26.96 | 1030.55 | -72.76 | 2.24 | 72.94 | 408.66 | 6.59 | 2782.83 | 60.75 |
| T3A | -2.88 | -48.03 | 3958.79 | 27.68 | 1367.27 | -22.05 | -3.18 | -47.91 | 930.05 | 14.95 | 3338.02 | -10.67 |
| LAME | -0.97 | -9.05 | 3795.25 | 26.51 | 1121.89 | -16.87 | -1.30 | -12.58 | 404.57 | 6.55 | 3011.36 | -36.49 |
| Pseudo Label | -0.56 | 75.11 | 4217.33 | 29.47 | 2507.98 | -2.31 | 6.92 | 69.21 | 514.80 | 8.40 | 5526.63 | 74.06 |
| SHOT-IM | 8.73 | -24.05 | 8654.84 | 60.53 | 2646.09 | 4.00 | 18.25 | -16.49 | 991.27 | 15.05 | 5493.34 | 60.58 |
| TENT | 9.41 | -116.65 | 8616.83 | 60.12 | 2985.81 | 4.36 | 23.56 | -62.86 | 843.79 | 13.04 | 5470.41 | 98.81 |
| CoTTA | Out-of-Memory | | | | | | Out-of-Memory | | | | | |
| EATA | 15.41 | -137.79 | 4602.03 | 32.18 | 2495.84 | 37.41 | 33.10 | -105.78 | 1147.74 | 17.56 | 5597.54 | 87.18 |
| SAR | 10.50 | 38.52 | 6904.60 | 48.30 | 2767.84 | 7.40 | 22.22 | 40.83 | 1780.91 | 10.28 | 5528.69 | 146.00 |
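
To make the accuracy-versus-overhead trade-off concrete, the sketch below expresses a few Raspberry Pi 5 rows from the table as multiples of the No Adaptation baseline. The numbers are copied from the table. The `overhead` helper and its output keys are hypothetical, and the ratios are plain baseline-relative overheads, not the ηAE metric, whose definition is not given in this section.

```python
# (delta_acc_pct, latency_ms, energy_j, peak_ram_mb), Raspberry Pi 5, batch size 16
PI5 = {
    "No Adaptation": (0.00, 3660.74, 25.59, 991.92),
    "TENT": (9.41, 8616.83, 60.12, 2985.81),
    "EATA": (15.41, 4602.03, 32.18, 2495.84),
    "SAR": (10.50, 6904.60, 48.30, 2767.84),
}


def overhead(rows, baseline="No Adaptation"):
    """Express each method's per-batch cost relative to the baseline row."""
    _, base_latency, base_energy, base_ram = rows[baseline]
    report = {}
    for name, (delta_acc, latency, energy, ram) in rows.items():
        if name == baseline:
            continue
        report[name] = {
            "delta_acc_pct": delta_acc,
            "latency_x": latency / base_latency,  # multiple of baseline latency
            "energy_x": energy / base_energy,     # multiple of baseline energy
            "extra_ram_mb": ram - base_ram,       # additional peak RAM
        }
    return report


if __name__ == "__main__":
    for method, stats in overhead(PI5).items():
        print(
            f"{method}: {stats['delta_acc_pct']:+.2f}% acc, "
            f"{stats['latency_x']:.2f}x latency, "
            f"{stats['energy_x']:.2f}x energy, "
            f"+{stats['extra_ram_mb']:.0f} MB peak RAM"
        )
```

On these rows, EATA gains more accuracy than TENT while costing about 1.26x the baseline latency, compared with about 2.35x for TENT.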



