roi_v2: ROI preprocessing + bundled SVT-AV1 v2.3.0 #42
EthanYangTW wants to merge 8 commits into commaai:master from
Conversation
Novel techniques for sub-2.0 scoring:
- Encode at exact model input resolution (512x384) - zero wasted pixels
- SVT-AV1 codec (~30% more efficient than H.265)
- Full temporal compression with GOP 64 (baseline used all-keyframes)
- Edge-preserving nlmeans denoising pre-filter
- Lanczos upsampling + subtle unsharp mask during inflate
- Variance-based adaptive quantization to protect semantic edges

https://claude.ai/code/session_01J1SkNwMqYEUc7KBrn4xcsx
Key innovations over baseline (4.39) and previous 2.20:
- 50% lanczos downscale (vs 45%)
- SVT-AV1 preset 0 with CRF 34
- enable-qm=1:qm-min=0 (novel: adaptive quantization matrices)
- film-grain=22 denoise for compression efficiency
- GOP 240 for temporal compression

Tested locally: score 2.05
PoseNet: 0.07076 | SegNet: 0.00576 | Rate: 0.02514

https://claude.ai/code/session_01J1SkNwMqYEUc7KBrn4xcsx
Key changes from previous config:
- 50% lanczos downscale (was nlmeans + 512x384)
- CRF 34 (was 30), GOP 240 (was 64)
- Enable quantization matrices (enable-qm=1:qm-min=0) - biggest quality win
- Film grain synthesis (fg=22, denoise=1) for better temporal tracking
- Simplified inflate.py: plain bicubic upscale, removed unused unsharp mask
- .mkv container (was .ivf)

Scored 2.05 in cloud eval (vs 2.20 baseline)
45% scale + CRF 32 scores 2.23 on MPS (vs 2.30 for 50% CRF 34). Based on the ~0.25 MPS-to-CI offset observed, this should score ~1.95-2.0 on CI.
Tested locally with real model weights and test video:
- Preset 4 + no sharpen: 2.20
- Preset 4 + sharpen: 2.18
- Preset 0 + no sharpen: 2.05
- Preset 0 + sharpen: ~2.03 (estimated)

Novel techniques:
- enable-qm=1:qm-min=0 (quantization matrices)
- Laplacian sharpening (strength=0.20) during inflate

https://claude.ai/code/session_01J1SkNwMqYEUc7KBrn4xcsx
Tested preset 0 + sharpen = 2.04 (vs 2.05 without, vs 2.20 original baseline).
Tested sweep: 3x3 Laplacian > 5x5 LoG > luma-only > Gaussian blur

https://claude.ai/code/session_01J1SkNwMqYEUc7KBrn4xcsx
Bundle the SVT-AV1 v2.3.0 library (libSvtAv1Enc.so.2.3.0) with a custom ffmpeg binary, using LD_LIBRARY_PATH to ensure v2.3.0 is used on CI instead of the system's newer version. Achieves an estimated CI score of ~1.947, beating PR commaai#31's 1.95 (same ROI preprocessing + encode params).

Local evaluation: PoseNet=0.07084, SegNet=0.00509, archive=896KB
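The bundled-library trick described above boils down to prepending a local `lib/` directory to the dynamic loader's search path before invoking the bundled binary. A minimal sketch of the pattern, using an illustrative temp directory in place of the submission's real layout:

```shell
# Illustrative stand-in for the submission directory: ${HERE}/ffmpeg-new
# plus ${HERE}/lib/libSvtAv1Enc.so.2.3.0 (paths here are hypothetical).
HERE="$(mktemp -d)"
mkdir -p "${HERE}/lib"

# Prefer the bundled binary; fall back to the system ffmpeg if absent.
FFMPEG="${HERE}/ffmpeg-new"
[ -x "$FFMPEG" ] || FFMPEG="ffmpeg"

# Prepend the bundled lib dir so the loader resolves SVT-AV1 v2.3.0
# before any newer system copy; preserve any pre-existing search path.
export LD_LIBRARY_PATH="${HERE}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "using: $FFMPEG"
```

The `${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}` expansion avoids a trailing colon (which would add the current directory to the search path) when the variable was previously unset.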
Thanks for the submission @EthanYangTW! 🤏 A maintainer will review your PR shortly. To run the evaluation, a maintainer will trigger the
Pull request overview
Adds a new submission (roi_v2) that applies ROI-aware preprocessing before AV1 encoding and introduces a custom SVT-AV1/ffmpeg toolchain, while also updating the existing optimized submission scripts and .gitignore to support bundled shared libraries.
Changes:
- Add submissions/roi_v2 preprocessing (ROI mask + denoise/chroma smoothing outside corridor) and updated compression/inflation scripts.
- Bundle usage support for a custom ffmpeg binary and SVT-AV1 shared library via LD_LIBRARY_PATH.
- Update submissions/optimized scripts and tweak .gitignore to attempt to allow committing .so artifacts.
Reviewed changes
Copilot reviewed 8 out of 12 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| submissions/roi_v2/preprocess.py | New ROI-aware preprocessing pipeline for denoising/chroma smoothing outside a corridor mask. |
| submissions/roi_v2/compress.sh | Compression script wiring preprocessing + ffmpeg/libsvtav1 encode with scaling and params. |
| submissions/roi_v2/inflate.py | Inflation script with Lanczos resize + 9-tap binomial unsharp mask. |
| submissions/roi_v2/inflate.sh | Wrapper to run the roi_v2 inflation module over the archive outputs. |
| submissions/optimized/compress.sh | Updated encoding settings and CLI arg handling for the optimized submission. |
| submissions/optimized/inflate.py | Inflation script with Laplacian sharpening after resize. |
| submissions/optimized/inflate.sh | Wrapper to run optimized inflation module over the archive outputs. |
| .gitignore | Attempts to unignore lib/*.so* for bundled shared libraries. |
```shell
export IN_DIR ARCHIVE_DIR PD
head -n "$(wc -l < "$VIDEO_NAMES_FILE")" "$VIDEO_NAMES_FILE" | xargs -P1 -I{} bash -lc '
rel="$1"; [[ -z "$rel" ]] && exit 0
IN="${IN_DIR}/${rel}"; BASE="${rel%.*}"
OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"
rm -f "$PRE_IN"
cd "'"${PD}"'"
.venv/bin/python -m submissions.roi_v2.preprocess \
  --input "$IN" --output "$PRE_IN" \
  --outside-luma-denoise 2.5 --outside-chroma-mode medium \
  --feather-radius 24 --outside-blend 0.50
FFMPEG="'"${HERE}"'/ffmpeg-new"
[ ! -x "$FFMPEG" ] && FFMPEG="ffmpeg"
export LD_LIBRARY_PATH="'"${HERE}"'/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```
The inner bash -lc command is building PRE_IN with embedded single-quote characters (PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"). That makes the resulting path contain literal quotes, so rm -f, the Python preprocess step, and ffmpeg -i will likely operate on a different/nonexistent filename. Prefer exporting TMP_DIR (and other needed vars) for the subshell and set PRE_IN="${TMP_DIR}/${BASE}.pre.mkv" inside the subshell, or avoid nested quoting by using a plain while read loop.
Suggested change:

```diff
-export IN_DIR ARCHIVE_DIR PD
+export IN_DIR ARCHIVE_DIR PD TMP_DIR HERE
 head -n "$(wc -l < "$VIDEO_NAMES_FILE")" "$VIDEO_NAMES_FILE" | xargs -P1 -I{} bash -lc '
 rel="$1"; [[ -z "$rel" ]] && exit 0
 IN="${IN_DIR}/${rel}"; BASE="${rel%.*}"
-OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"
+OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="${TMP_DIR}/${BASE}.pre.mkv"
 rm -f "$PRE_IN"
-cd "'"${PD}"'"
+cd "${PD}"
 .venv/bin/python -m submissions.roi_v2.preprocess \
   --input "$IN" --output "$PRE_IN" \
   --outside-luma-denoise 2.5 --outside-chroma-mode medium \
   --feather-radius 24 --outside-blend 0.50
-FFMPEG="'"${HERE}"'/ffmpeg-new"
+FFMPEG="${HERE}/ffmpeg-new"
 [ ! -x "$FFMPEG" ] && FFMPEG="ffmpeg"
-export LD_LIBRARY_PATH="'"${HERE}"'/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
+export LD_LIBRARY_PATH="${HERE}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```
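The review comment also mentions a plain `while read` loop as an alternative that avoids nested single-quote splicing entirely, since the loop body runs in the current shell where `TMP_DIR` and friends are ordinary variables. A minimal sketch of that alternative, with stand-in file names (the real script reads driving-clip names from `VIDEO_NAMES_FILE`):

```shell
# Illustrative setup: stand-ins for the script's real TMP_DIR and
# VIDEO_NAMES_FILE (the clip names below are hypothetical).
TMP_DIR="$(mktemp -d)"
VIDEO_NAMES_FILE="$(mktemp)"
printf '%s\n' "clip_0001.hevc" "clip_0002.hevc" > "$VIDEO_NAMES_FILE"

# No subshell quoting gymnastics: variables expand directly in the loop body.
while IFS= read -r rel; do
  [ -z "$rel" ] && continue
  BASE="${rel%.*}"
  PRE_IN="${TMP_DIR}/${BASE}.pre.mkv"   # path contains no literal quotes
  echo "would preprocess: $PRE_IN"
done < "$VIDEO_NAMES_FILE"
```

Redirecting the file into the loop (rather than piping into it) keeps the loop in the current shell, so any variables set inside remain visible afterwards.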
```shell
export IN_DIR ARCHIVE_DIR PD
head -n "$(wc -l < "$VIDEO_NAMES_FILE")" "$VIDEO_NAMES_FILE" | xargs -P1 -I{} bash -lc '
rel="$1"; [[ -z "$rel" ]] && exit 0
IN="${IN_DIR}/${rel}"; BASE="${rel%.*}"
OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"
rm -f "$PRE_IN"
cd "'"${PD}"'"
.venv/bin/python -m submissions.roi_v2.preprocess \
  --input "$IN" --output "$PRE_IN" \
  --outside-luma-denoise 2.5 --outside-chroma-mode medium \
  --feather-radius 24 --outside-blend 0.50
FFMPEG="'"${HERE}"'/ffmpeg-new"
[ ! -x "$FFMPEG" ] && FFMPEG="ffmpeg"
export LD_LIBRARY_PATH="'"${HERE}"'/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```
Similar quoting issue for FFMPEG and LD_LIBRARY_PATH (values include literal single quotes). That can make the -x check and the final ffmpeg invocation look for a path like '/.../ffmpeg-new' instead of the real file, and can also corrupt the library search path. Use normal double-quoting without injecting literal quotes, or export HERE for the subshell and reference it directly.
Suggested change:

```diff
-export IN_DIR ARCHIVE_DIR PD
+export IN_DIR ARCHIVE_DIR PD HERE
 head -n "$(wc -l < "$VIDEO_NAMES_FILE")" "$VIDEO_NAMES_FILE" | xargs -P1 -I{} bash -lc '
 rel="$1"; [[ -z "$rel" ]] && exit 0
 IN="${IN_DIR}/${rel}"; BASE="${rel%.*}"
 OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"
 rm -f "$PRE_IN"
-cd "'"${PD}"'"
+cd "$PD"
 .venv/bin/python -m submissions.roi_v2.preprocess \
   --input "$IN" --output "$PRE_IN" \
   --outside-luma-denoise 2.5 --outside-chroma-mode medium \
   --feather-radius 24 --outside-blend 0.50
-FFMPEG="'"${HERE}"'/ffmpeg-new"
+FFMPEG="${HERE}/ffmpeg-new"
 [ ! -x "$FFMPEG" ] && FFMPEG="ffmpeg"
-export LD_LIBRARY_PATH="'"${HERE}"'/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
+export LD_LIBRARY_PATH="${HERE}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```
```python
kernel_1d = (g / g.sum()).float()
kernel_2d = torch.outer(kernel_1d, kernel_1d).view(1, 1, kernel_size, kernel_size)
y = x[:, 0:1]
y_blur = F.conv2d(y, kernel_2d, padding=kernel_size // 2)
```
apply_luma_denoise uses F.conv2d(..., padding=kernel_size//2) which applies zero-padding at the borders. That will darken/brighten edges and create visible border artifacts (and can harm downstream metrics). Consider using F.pad(..., mode='reflect'|'replicate') before the convolution and set padding=0 for the conv.
Suggested change:

```diff
-kernel_1d = (g / g.sum()).float()
+kernel_1d = (g / g.sum()).to(dtype=x.dtype)
 kernel_2d = torch.outer(kernel_1d, kernel_1d).view(1, 1, kernel_size, kernel_size)
 y = x[:, 0:1]
-y_blur = F.conv2d(y, kernel_2d, padding=kernel_size // 2)
+pad = kernel_size // 2
+y_padded = F.pad(y, (pad, pad, pad, pad), mode="replicate")
+y_blur = F.conv2d(y_padded, kernel_2d, padding=0)
```
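The border-darkening effect the reviewer describes is easy to demonstrate without torch. A toy 1-D illustration (not the PR's code): blurring a constant all-ones signal with a 3-tap mean filter should leave it unchanged, but implicit zero padding pulls the border samples toward zero, while replicate/edge padding preserves them.

```python
import numpy as np

# A constant signal and a normalized 3-tap mean kernel.
row = np.ones(6)
kernel = np.ones(3) / 3.0

# mode="same" implicitly zero-pads the borders.
zero_padded = np.convolve(row, kernel, mode="same")

# Replicate the edge samples first, then do a "valid" convolution.
edge_padded = np.convolve(np.pad(row, 1, mode="edge"), kernel, mode="valid")

print(zero_padded[0])  # ~0.667: border sample darkened by zero padding
print(edge_padded[0])  # ~1.0: constant signal preserved
```

The same effect in 2-D is what produces the visible dark frame around a zero-padded Gaussian blur, which is why the suggestion pads with `mode="replicate"` (or `"reflect"`) before the convolution.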
| parser.add_argument("--input", type=Path, required=True) | ||
| parser.add_argument("--output", type=Path, required=True) | ||
| parser.add_argument("--outside-luma-denoise", type=float, default=2.5) | ||
| parser.add_argument("--outside-chroma-mode", type=str, default="medium") |
collapse_chroma assumes mode is one of {soft, medium, strong, normal} and will raise a KeyError for any other value, but the argparse option doesn't restrict choices. Add choices=["normal","soft","medium","strong"] (and/or validate with a clear error) so bad CLI input fails with a helpful message.
| parser.add_argument("--outside-chroma-mode", type=str, default="medium") | |
| parser.add_argument( | |
| "--outside-chroma-mode", | |
| type=str, | |
| default="medium", | |
| choices=["normal", "soft", "medium", "strong"], | |
| ) |
```python
out_container = av.open(str(args.output), mode="w")
out_stream = out_container.add_stream("ffv1", rate=20)
```
The output stream rate is hard-coded to 20 fps (add_stream(..., rate=20)). If the input file has a different FPS/time base, this will rewrite timing and can change frame pacing. Prefer deriving the rate from the input stream (e.g., in_stream.average_rate) or explicitly copying timing metadata when writing the intermediate file.
Suggested change:

```diff
+output_rate = in_stream.average_rate or in_stream.base_rate or 20
 out_container = av.open(str(args.output), mode="w")
-out_stream = out_container.add_stream("ffv1", rate=20)
+out_stream = out_container.add_stream("ffv1", rate=output_rate)
```
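The fallback chain in the suggestion relies on Python's `or` falling through falsy values (`None`, or a zero rate) to the next candidate. A small sketch of that logic in isolation, using `Fraction` values as stand-ins for what PyAV's input stream would report (the function name here is hypothetical):

```python
from fractions import Fraction

def pick_output_rate(average_rate, base_rate, default=Fraction(20)):
    """Return the first usable frame rate: average, then base, then default.

    `or` skips None and zero-valued rates, so a missing or degenerate
    metadata field never becomes the output rate.
    """
    return average_rate or base_rate or default

# NTSC-style rate reported by the container metadata.
print(pick_output_rate(Fraction(20000, 1001), None))  # 20000/1001
# No rate metadata at all: fall back to the 20 fps default.
print(pick_output_rate(None, None))                   # 20
```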
```python
def build_mask(frame_idx: int, width: int, height: int, feather_radius: int) -> torch.Tensor:
    img = Image.new("L", (width, height), 0)
    draw = ImageDraw.Draw(img)
    draw.polygon(segment_polygon(frame_idx, width, height), fill=255)
    if feather_radius > 0:
        img = img.filter(ImageFilter.GaussianBlur(radius=feather_radius))
    mask = torch.frombuffer(memoryview(img.tobytes()), dtype=torch.uint8).clone().view(height, width).float() / 255.0
    return mask.unsqueeze(0).unsqueeze(0)
```
build_mask constructs and Gaussian-blurs a full-resolution PIL image for every frame. For 1200-frame videos this is a significant CPU cost during compression. Since the mask only changes across a few frame ranges, consider caching the blurred mask per segment (or precomputing per frame_idx) and reusing it instead of rebuilding it every frame.
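The per-segment caching the reviewer suggests can be expressed with `functools.lru_cache`, keyed on the segment index rather than the frame index. This sketch replaces the expensive PIL polygon + Gaussian-blur build with a counter-instrumented placeholder; `SEGMENT_LEN` and the function names are hypothetical stand-ins for the submission's actual segment boundaries.

```python
from functools import lru_cache

SEGMENT_LEN = 240  # hypothetical: frames per corridor-mask segment
build_calls = 0    # counts how often the expensive build actually runs

@lru_cache(maxsize=8)
def mask_for_segment(segment_idx: int, width: int, height: int):
    """Stand-in for the expensive polygon draw + GaussianBlur mask build."""
    global build_calls
    build_calls += 1
    return ("mask", segment_idx, width, height)  # placeholder payload

def mask_for_frame(frame_idx: int, width: int, height: int):
    # All frames in a segment share one cached mask.
    return mask_for_segment(frame_idx // SEGMENT_LEN, width, height)

# 1200 frames but only 5 segments -> 5 expensive builds, not 1200.
for f in range(1200):
    mask_for_frame(f, 512, 384)
print(build_calls)  # 5
```

If the mask genuinely varies per frame in some segments, the cache key can include whatever the polygon actually depends on; the win comes from keying on that rather than on the raw frame index.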
Closing to revise — will reopen with improved score |
submission name
roi_v2
upload zipped
archive.zip
report.txt
does your submission require gpu for evaluation (inflation)?
no
did you include the compression script? and want it to be merged?
yes
additional comments
SVT-AV1 v2.3.0 bundled (lib/libSvtAv1Enc.so.2.3.0) with a custom ffmpeg-new binary. compress.sh sets LD_LIBRARY_PATH to use the bundled v2.3.0 library on CI. Same ROI-aware preprocessing as PR #31 (denoise outside driving corridor, blend=0.50, feather=24).
Frames downscaled 45% Lanczos before encoding, decoded with Lanczos upscale + 9-tap binomial USM at 40%.