
roi_v2: ROI preprocessing + bundled SVT-AV1 v2.3.0#42

Closed
EthanYangTW wants to merge 8 commits into commaai:master from EthanYangTW:claude/optimize-sub-2-performance-adVWT

Conversation

@EthanYangTW
Contributor

submission name

roi_v2

upload zipped archive.zip

archive.zip

report.txt

=== Evaluation results over 600 samples ===
  Average PoseNet Distortion: 0.07084085
  Average SegNet Distortion: 0.00508657
  Submission file size: 896,108 bytes
  Original uncompressed size: 37,545,489 bytes
  Compression Rate: 0.02386000
  Final score: 100*segnet_dist + √(10*posenet_dist) + 25*rate = 1.947 (estimated)
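
The score formula in the report can be checked with a small sketch (the weights and distortions are taken directly from report.txt above; lower is better):

```python
import math

def final_score(posenet_dist, segnet_dist, rate):
    # Score formula from the eval report:
    # 100*segnet_dist + sqrt(10*posenet_dist) + 25*rate
    return 100 * segnet_dist + math.sqrt(10 * posenet_dist) + 25 * rate

score = final_score(0.07084085, 0.00508657, 0.02386000)
```

Plugging in the reported distortions and rate reproduces the estimated 1.947.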

does your submission require gpu for evaluation (inflation)?

no

did you include the compression script? and want it to be merged?

yes

additional comments

SVT-AV1 v2.3.0 bundled (lib/libSvtAv1Enc.so.2.3.0) with a custom ffmpeg-new binary.
compress.sh sets LD_LIBRARY_PATH to use the bundled v2.3.0 library on CI.
Same ROI-aware preprocessing as PR #31 (denoise outside driving corridor, blend=0.50, feather=24).
Frames are downscaled to 45% with Lanczos before encoding; on decode they are upscaled with Lanczos and sharpened with a 9-tap binomial USM at 40% strength.
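
The "9-tap binomial USM at 40%" step can be sketched in 1D. This is a hedged reconstruction: "9-tap binomial" is read here as Pascal-triangle row 8 normalized to sum 1 (a close Gaussian approximation), with replicate edge padding; the actual kernel and padding in inflate.py may differ.

```python
from math import comb

# 9-tap binomial kernel (Pascal row 8: 1,8,28,56,70,56,28,8,1), normalized.
KERNEL = [comb(8, k) / 256 for k in range(9)]

def blur_1d(signal):
    """Convolve with the binomial kernel using replicate edge padding."""
    pad = len(KERNEL) // 2
    padded = [signal[0]] * pad + list(signal) + [signal[-1]] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(KERNEL))
            for i in range(len(signal))]

def unsharp_1d(signal, amount=0.40):
    """Unsharp mask: add back `amount` of the high-frequency residual."""
    blurred = blur_1d(signal)
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]
```

In 2D the same kernel would be applied separably along rows and columns before taking the residual.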

claude and others added 8 commits April 4, 2026 02:37
Novel techniques for sub-2.0 scoring:
- Encode at exact model input resolution (512x384) - zero wasted pixels
- SVT-AV1 codec (~30% more efficient than H.265)
- Full temporal compression with GOP 64 (baseline used all-keyframes)
- Edge-preserving nlmeans denoising pre-filter
- Lanczos upsampling + subtle unsharp mask during inflate
- Variance-based adaptive quantization to protect semantic edges

https://claude.ai/code/session_01J1SkNwMqYEUc7KBrn4xcsx
Key innovations over baseline (4.39) and previous 2.20:
- 50% lanczos downscale (vs 45%)
- SVT-AV1 preset 0 with CRF 34
- enable-qm=1:qm-min=0 (novel: adaptive quantization matrices)
- film-grain=22 denoise for compression efficiency
- GOP 240 for temporal compression

Tested locally: score 2.05
  PoseNet: 0.07076 | SegNet: 0.00576 | Rate: 0.02514

https://claude.ai/code/session_01J1SkNwMqYEUc7KBrn4xcsx
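
The encoder settings listed in this commit can be sketched as a single ffmpeg invocation. This is a hedged reconstruction using standard ffmpeg/libsvtav1 option names (`-svtav1-params` with `enable-qm`, `qm-min`, `film-grain`, `film-grain-denoise`); the actual flags, filenames, and filter chain in compress.sh may differ.

```sh
ffmpeg -i input.hevc \
  -vf "scale=iw*0.5:ih*0.5:flags=lanczos" \
  -c:v libsvtav1 -preset 0 -crf 34 -g 240 \
  -svtav1-params "enable-qm=1:qm-min=0:film-grain=22:film-grain-denoise=1" \
  output.mkv
```
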
Key changes from previous config:
- 50% lanczos downscale (was nlmeans + 512x384)
- CRF 34 (was 30), GOP 240 (was 64)
- Enable quantization matrices (enable-qm=1:qm-min=0) - biggest quality win
- Film grain synthesis (fg=22, denoise=1) for better temporal tracking
- Simplified inflate.py: plain bicubic upscale, removed unused unsharp mask
- .mkv container (was .ivf)

Scored 2.05 in cloud eval (vs 2.20 baseline)
45% scale + CRF 32 scores 2.23 on MPS (vs 2.30 for 50% CRF 34).
Based on the ~0.25 MPS-to-CI offset observed, this should score ~1.95-2.0 on CI.
Tested locally with real model weights and test video:
- Preset 4 + no sharpen: 2.20
- Preset 4 + sharpen: 2.18
- Preset 0 + no sharpen: 2.05
- Preset 0 + sharpen: ~2.03 (estimated)

Novel techniques:
- enable-qm=1:qm-min=0 (quantization matrices)
- Laplacian sharpening (strength=0.20) during inflate

https://claude.ai/code/session_01J1SkNwMqYEUc7KBrn4xcsx
Tested preset 0 + sharpen = 2.04 (vs 2.05 without, vs 2.20 original baseline)
Tested sweep: 3x3 Laplacian > 5x5 LoG > luma-only > Gaussian blur
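
The winning "3x3 Laplacian, strength=0.20" sharpening from the sweep can be sketched as follows. This is a hypothetical reconstruction on a 2D grayscale image (list of rows) with replicate edge padding; the real inflate.py may normalize or clip differently.

```python
def laplacian_sharpen(img, strength=0.20):
    """Sharpen by adding back a 3x3 Laplacian response.

    lap = 4*center - up - down - left - right, so flat regions are
    untouched and edges/peaks are amplified by `strength`.
    """
    h, w = len(img), len(img[0])

    def px(y, x):  # replicate edge padding
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            lap = (4 * px(y, x) - px(y - 1, x) - px(y + 1, x)
                   - px(y, x - 1) - px(y, x + 1))
            row.append(px(y, x) + strength * lap)
        out.append(row)
    return out
```

A flat frame passes through unchanged, while an isolated bright pixel is boosted, which matches why this beat a plain Gaussian-blur-based USM in the sweep.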

https://claude.ai/code/session_01J1SkNwMqYEUc7KBrn4xcsx
Bundle SVT-AV1 v2.3.0 library (libSvtAv1Enc.so.2.3.0) with a custom
ffmpeg binary, using LD_LIBRARY_PATH to ensure v2.3.0 is used on CI
instead of the system's newer version. Achieves estimated CI score ~1.947,
beating PR commaai#31's 1.95 (same ROI preprocessing + encode params).

Local evaluation: PoseNet=0.07084, SegNet=0.00509, archive=896KB
Copilot AI review requested due to automatic review settings April 7, 2026 23:26
@github-actions

github-actions bot commented Apr 7, 2026

Thanks for the submission @EthanYangTW! 🤏

A maintainer will review your PR shortly.

To run the evaluation, a maintainer will trigger the eval workflow with your PR number.


Copilot AI left a comment


Pull request overview

Adds a new submission (roi_v2) that applies ROI-aware preprocessing before AV1 encoding and introduces a custom SVT-AV1/ffmpeg toolchain, while also updating the existing optimized submission scripts and .gitignore to support bundled shared libraries.

Changes:

  • Add submissions/roi_v2 preprocessing (ROI mask + denoise/chroma smoothing outside corridor) and updated compression/inflation scripts.
  • Bundle usage support for a custom ffmpeg binary and SVT-AV1 shared library via LD_LIBRARY_PATH.
  • Update submissions/optimized scripts and tweak .gitignore to attempt to allow committing .so artifacts.

Reviewed changes

Copilot reviewed 8 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
submissions/roi_v2/preprocess.py New ROI-aware preprocessing pipeline for denoising/chroma smoothing outside a corridor mask.
submissions/roi_v2/compress.sh Compression script wiring preprocessing + ffmpeg/libsvtav1 encode with scaling and params.
submissions/roi_v2/inflate.py Inflation script with Lanczos resize + 9-tap binomial unsharp mask.
submissions/roi_v2/inflate.sh Wrapper to run the roi_v2 inflation module over the archive outputs.
submissions/optimized/compress.sh Updated encoding settings and CLI arg handling for the optimized submission.
submissions/optimized/inflate.py Inflation script with Laplacian sharpening after resize.
submissions/optimized/inflate.sh Wrapper to run optimized inflation module over the archive outputs.
.gitignore Attempts to unignore lib/*.so* for bundled shared libraries.


Comment on lines +10 to +23
export IN_DIR ARCHIVE_DIR PD
head -n "$(wc -l < "$VIDEO_NAMES_FILE")" "$VIDEO_NAMES_FILE" | xargs -P1 -I{} bash -lc '
rel="$1"; [[ -z "$rel" ]] && exit 0
IN="${IN_DIR}/${rel}"; BASE="${rel%.*}"
OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"
rm -f "$PRE_IN"
cd "'"${PD}"'"
.venv/bin/python -m submissions.roi_v2.preprocess \
--input "$IN" --output "$PRE_IN" \
--outside-luma-denoise 2.5 --outside-chroma-mode medium \
--feather-radius 24 --outside-blend 0.50
FFMPEG="'"${HERE}"'/ffmpeg-new"
[ ! -x "$FFMPEG" ] && FFMPEG="ffmpeg"
export LD_LIBRARY_PATH="'"${HERE}"'/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

Copilot AI Apr 7, 2026


The inner bash -lc command is building PRE_IN with embedded single-quote characters (PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"). That makes the resulting path contain literal quotes, so rm -f, the Python preprocess step, and ffmpeg -i will likely operate on a different/nonexistent filename. Prefer exporting TMP_DIR (and other needed vars) for the subshell and set PRE_IN="${TMP_DIR}/${BASE}.pre.mkv" inside the subshell, or avoid nested quoting by using a plain while read loop.

Suggested change
export IN_DIR ARCHIVE_DIR PD
head -n "$(wc -l < "$VIDEO_NAMES_FILE")" "$VIDEO_NAMES_FILE" | xargs -P1 -I{} bash -lc '
rel="$1"; [[ -z "$rel" ]] && exit 0
IN="${IN_DIR}/${rel}"; BASE="${rel%.*}"
OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"
rm -f "$PRE_IN"
cd "'"${PD}"'"
.venv/bin/python -m submissions.roi_v2.preprocess \
--input "$IN" --output "$PRE_IN" \
--outside-luma-denoise 2.5 --outside-chroma-mode medium \
--feather-radius 24 --outside-blend 0.50
FFMPEG="'"${HERE}"'/ffmpeg-new"
[ ! -x "$FFMPEG" ] && FFMPEG="ffmpeg"
export LD_LIBRARY_PATH="'"${HERE}"'/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export IN_DIR ARCHIVE_DIR PD TMP_DIR HERE
head -n "$(wc -l < "$VIDEO_NAMES_FILE")" "$VIDEO_NAMES_FILE" | xargs -P1 -I{} bash -lc '
rel="$1"; [[ -z "$rel" ]] && exit 0
IN="${IN_DIR}/${rel}"; BASE="${rel%.*}"
OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="${TMP_DIR}/${BASE}.pre.mkv"
rm -f "$PRE_IN"
cd "${PD}"
.venv/bin/python -m submissions.roi_v2.preprocess \
--input "$IN" --output "$PRE_IN" \
--outside-luma-denoise 2.5 --outside-chroma-mode medium \
--feather-radius 24 --outside-blend 0.50
FFMPEG="${HERE}/ffmpeg-new"
[ ! -x "$FFMPEG" ] && FFMPEG="ffmpeg"
export LD_LIBRARY_PATH="${HERE}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

Comment on lines +10 to +23
export IN_DIR ARCHIVE_DIR PD
head -n "$(wc -l < "$VIDEO_NAMES_FILE")" "$VIDEO_NAMES_FILE" | xargs -P1 -I{} bash -lc '
rel="$1"; [[ -z "$rel" ]] && exit 0
IN="${IN_DIR}/${rel}"; BASE="${rel%.*}"
OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"
rm -f "$PRE_IN"
cd "'"${PD}"'"
.venv/bin/python -m submissions.roi_v2.preprocess \
--input "$IN" --output "$PRE_IN" \
--outside-luma-denoise 2.5 --outside-chroma-mode medium \
--feather-radius 24 --outside-blend 0.50
FFMPEG="'"${HERE}"'/ffmpeg-new"
[ ! -x "$FFMPEG" ] && FFMPEG="ffmpeg"
export LD_LIBRARY_PATH="'"${HERE}"'/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

Copilot AI Apr 7, 2026


Similar quoting issue for FFMPEG and LD_LIBRARY_PATH (values include literal single quotes). That can make the -x check and the final ffmpeg invocation look for a path like '/.../ffmpeg-new' instead of the real file, and can also corrupt the library search path. Use normal double-quoting without injecting literal quotes, or export HERE for the subshell and reference it directly.

Suggested change
export IN_DIR ARCHIVE_DIR PD
head -n "$(wc -l < "$VIDEO_NAMES_FILE")" "$VIDEO_NAMES_FILE" | xargs -P1 -I{} bash -lc '
rel="$1"; [[ -z "$rel" ]] && exit 0
IN="${IN_DIR}/${rel}"; BASE="${rel%.*}"
OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"
rm -f "$PRE_IN"
cd "'"${PD}"'"
.venv/bin/python -m submissions.roi_v2.preprocess \
--input "$IN" --output "$PRE_IN" \
--outside-luma-denoise 2.5 --outside-chroma-mode medium \
--feather-radius 24 --outside-blend 0.50
FFMPEG="'"${HERE}"'/ffmpeg-new"
[ ! -x "$FFMPEG" ] && FFMPEG="ffmpeg"
export LD_LIBRARY_PATH="'"${HERE}"'/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export IN_DIR ARCHIVE_DIR PD HERE
head -n "$(wc -l < "$VIDEO_NAMES_FILE")" "$VIDEO_NAMES_FILE" | xargs -P1 -I{} bash -lc '
rel="$1"; [[ -z "$rel" ]] && exit 0
IN="${IN_DIR}/${rel}"; BASE="${rel%.*}"
OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"
rm -f "$PRE_IN"
cd "$PD"
.venv/bin/python -m submissions.roi_v2.preprocess \
--input "$IN" --output "$PRE_IN" \
--outside-luma-denoise 2.5 --outside-chroma-mode medium \
--feather-radius 24 --outside-blend 0.50
FFMPEG="${HERE}/ffmpeg-new"
[ ! -x "$FFMPEG" ] && FFMPEG="ffmpeg"
export LD_LIBRARY_PATH="${HERE}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

Comment on lines +36 to +39
kernel_1d = (g / g.sum()).float()
kernel_2d = torch.outer(kernel_1d, kernel_1d).view(1, 1, kernel_size, kernel_size)
y = x[:, 0:1]
y_blur = F.conv2d(y, kernel_2d, padding=kernel_size // 2)

Copilot AI Apr 7, 2026


apply_luma_denoise uses F.conv2d(..., padding=kernel_size//2) which applies zero-padding at the borders. That will darken/brighten edges and create visible border artifacts (and can harm downstream metrics). Consider using F.pad(..., mode='reflect'|'replicate') before the convolution and set padding=0 for the conv.

Suggested change
kernel_1d = (g / g.sum()).float()
kernel_2d = torch.outer(kernel_1d, kernel_1d).view(1, 1, kernel_size, kernel_size)
y = x[:, 0:1]
y_blur = F.conv2d(y, kernel_2d, padding=kernel_size // 2)
kernel_1d = (g / g.sum()).to(dtype=x.dtype)
kernel_2d = torch.outer(kernel_1d, kernel_1d).view(1, 1, kernel_size, kernel_size)
y = x[:, 0:1]
pad = kernel_size // 2
y_padded = F.pad(y, (pad, pad, pad, pad), mode="replicate")
y_blur = F.conv2d(y_padded, kernel_2d, padding=0)

parser.add_argument("--input", type=Path, required=True)
parser.add_argument("--output", type=Path, required=True)
parser.add_argument("--outside-luma-denoise", type=float, default=2.5)
parser.add_argument("--outside-chroma-mode", type=str, default="medium")

Copilot AI Apr 7, 2026


collapse_chroma assumes mode is one of {soft, medium, strong, normal} and will raise a KeyError for any other value, but the argparse option doesn't restrict choices. Add choices=["normal","soft","medium","strong"] (and/or validate with a clear error) so bad CLI input fails with a helpful message.

Suggested change
parser.add_argument("--outside-chroma-mode", type=str, default="medium")
parser.add_argument(
"--outside-chroma-mode",
type=str,
default="medium",
choices=["normal", "soft", "medium", "strong"],
)

Comment on lines +118 to +120

out_container = av.open(str(args.output), mode="w")
out_stream = out_container.add_stream("ffv1", rate=20)

Copilot AI Apr 7, 2026


The output stream rate is hard-coded to 20 fps (add_stream(..., rate=20)). If the input file has a different FPS/time base, this will rewrite timing and can change frame pacing. Prefer deriving the rate from the input stream (e.g., in_stream.average_rate) or explicitly copying timing metadata when writing the intermediate file.

Suggested change
out_container = av.open(str(args.output), mode="w")
out_stream = out_container.add_stream("ffv1", rate=20)
output_rate = in_stream.average_rate or in_stream.base_rate or 20
out_container = av.open(str(args.output), mode="w")
out_stream = out_container.add_stream("ffv1", rate=output_rate)

Comment on lines +75 to +82
def build_mask(frame_idx: int, width: int, height: int, feather_radius: int) -> torch.Tensor:
img = Image.new("L", (width, height), 0)
draw = ImageDraw.Draw(img)
draw.polygon(segment_polygon(frame_idx, width, height), fill=255)
if feather_radius > 0:
img = img.filter(ImageFilter.GaussianBlur(radius=feather_radius))
mask = torch.frombuffer(memoryview(img.tobytes()), dtype=torch.uint8).clone().view(height, width).float() / 255.0
return mask.unsqueeze(0).unsqueeze(0)

Copilot AI Apr 7, 2026


build_mask constructs and Gaussian-blurs a full-resolution PIL image for every frame. For 1200-frame videos this is a significant CPU cost during compression. Since the mask only changes across a few frame ranges, consider caching the blurred mask per segment (or precomputing per frame_idx) and reusing it instead of rebuilding it every frame.
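
The per-segment caching this comment suggests can be sketched with `functools.lru_cache`. Everything here is hypothetical scaffolding: `SEGMENT_BOUNDARIES`, `segment_for_frame`, and the trivial mask body stand in for the real polygon/GaussianBlur logic in preprocess.py; the point is only that the expensive builder runs once per (segment, size) rather than once per frame.

```python
from functools import lru_cache

SEGMENT_BOUNDARIES = [0, 400, 800]  # assumed frame ranges where the mask changes

def segment_for_frame(frame_idx):
    """Map a frame index to the segment whose mask it shares."""
    seg = 0
    for i, start in enumerate(SEGMENT_BOUNDARIES):
        if frame_idx >= start:
            seg = i
    return seg

CALLS = {"n": 0}  # instrumentation to show how often the builder really runs

@lru_cache(maxsize=None)
def build_mask_for_segment(seg, width, height):
    CALLS["n"] += 1  # the expensive PIL draw + GaussianBlur would go here
    return [[1.0] * width for _ in range(height)]

def build_mask(frame_idx, width, height):
    # Same signature shape as the original, but memoized per segment.
    return build_mask_for_segment(segment_for_frame(frame_idx), width, height)
```

For a 1200-frame video with three segments, the blurred mask is built three times instead of 1200.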

@EthanYangTW
Contributor Author

Closing to revise — will reopen with improved score

@EthanYangTW EthanYangTW closed this Apr 8, 2026