roi_v2: ROI preprocessing + bundled SVT-AV1 v2.3.0 #42
EthanYangTW wants to merge 8 commits into commaai:master from
Conversation
Novel techniques for sub-2.0 scoring:
- Encode at exact model input resolution (512x384) - zero wasted pixels
- SVT-AV1 codec (~30% more efficient than H.265)
- Full temporal compression with GOP 64 (baseline used all-keyframes)
- Edge-preserving nlmeans denoising pre-filter
- Lanczos upsampling + subtle unsharp mask during inflate
- Variance-based adaptive quantization to protect semantic edges

https://claude.ai/code/session_01J1SkNwMqYEUc7KBrn4xcsx
Key innovations over baseline (4.39) and previous 2.20:
- 50% lanczos downscale (vs 45%)
- SVT-AV1 preset 0 with CRF 34
- enable-qm=1:qm-min=0 (novel: adaptive quantization matrices)
- film-grain=22 denoise for compression efficiency
- GOP 240 for temporal compression

Tested locally: score 2.05
PoseNet: 0.07076 | SegNet: 0.00576 | Rate: 0.02514

https://claude.ai/code/session_01J1SkNwMqYEUc7KBrn4xcsx
Key changes from previous config:
- 50% lanczos downscale (was nlmeans + 512x384)
- CRF 34 (was 30), GOP 240 (was 64)
- Enable quantization matrices (enable-qm=1:qm-min=0) - biggest quality win
- Film grain synthesis (fg=22, denoise=1) for better temporal tracking
- Simplified inflate.py: plain bicubic upscale, removed unused unsharp mask
- .mkv container (was .ivf)

Scored 2.05 in cloud eval (vs 2.20 baseline)
45% scale + CRF 32 scores 2.23 on MPS (vs 2.30 for 50% CRF 34). Based on the ~0.25 MPS-to-CI offset observed, this should score ~1.95-2.0 on CI.
Tested locally with real model weights and test video:
- Preset 4 + no sharpen: 2.20
- Preset 4 + sharpen: 2.18
- Preset 0 + no sharpen: 2.05
- Preset 0 + sharpen: ~2.03 (estimated)

Novel techniques:
- enable-qm=1:qm-min=0 (quantization matrices)
- Laplacian sharpening (strength=0.20) during inflate

https://claude.ai/code/session_01J1SkNwMqYEUc7KBrn4xcsx
Tested preset 0 + sharpen = 2.04 (vs 2.05 without, vs 2.20 original baseline).
Tested sweep: 3x3 Laplacian > 5x5 LoG > luma-only > Gaussian blur

https://claude.ai/code/session_01J1SkNwMqYEUc7KBrn4xcsx
Bundle the SVT-AV1 v2.3.0 library (libSvtAv1Enc.so.2.3.0) with a custom ffmpeg binary, using LD_LIBRARY_PATH to ensure v2.3.0 is used on CI instead of the system's newer version. Achieves an estimated CI score of ~1.947, beating PR commaai#31's 1.95 (same ROI preprocessing + encode params).

Local evaluation: PoseNet=0.07084, SegNet=0.00509, archive=896KB
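The bundled-library trick described above boils down to prepending a local `lib/` directory to the dynamic loader's search path before invoking the bundled binary. A minimal sketch of the pattern, using an illustrative temp directory in place of the submission's real layout:

```shell
# Illustrative stand-in for the submission directory: ${HERE}/ffmpeg-new
# plus ${HERE}/lib/libSvtAv1Enc.so.2.3.0 (paths here are hypothetical).
HERE="$(mktemp -d)"
mkdir -p "${HERE}/lib"

# Prefer the bundled binary; fall back to the system ffmpeg if absent.
FFMPEG="${HERE}/ffmpeg-new"
[ -x "$FFMPEG" ] || FFMPEG="ffmpeg"

# Prepend the bundled lib dir so the loader resolves SVT-AV1 v2.3.0
# before any newer system copy; preserve any pre-existing search path.
export LD_LIBRARY_PATH="${HERE}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "using: $FFMPEG"
```

The `${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}` expansion avoids a trailing colon (which would add the current directory to the search path) when the variable was previously unset.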
Thanks for the submission @EthanYangTW! 🤏 A maintainer will review your PR shortly. To run the evaluation, a maintainer will trigger the
Pull request overview
Adds a new submission (roi_v2) that applies ROI-aware preprocessing before AV1 encoding and introduces a custom SVT-AV1/ffmpeg toolchain, while also updating the existing optimized submission scripts and .gitignore to support bundled shared libraries.
Changes:
- Add submissions/roi_v2 preprocessing (ROI mask + denoise/chroma smoothing outside corridor) and updated compression/inflation scripts.
- Bundle usage support for a custom ffmpeg binary and SVT-AV1 shared library via LD_LIBRARY_PATH.
- Update submissions/optimized scripts and tweak .gitignore to attempt to allow committing .so artifacts.
Reviewed changes
Copilot reviewed 8 out of 12 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| submissions/roi_v2/preprocess.py | New ROI-aware preprocessing pipeline for denoising/chroma smoothing outside a corridor mask. |
| submissions/roi_v2/compress.sh | Compression script wiring preprocessing + ffmpeg/libsvtav1 encode with scaling and params. |
| submissions/roi_v2/inflate.py | Inflation script with Lanczos resize + 9-tap binomial unsharp mask. |
| submissions/roi_v2/inflate.sh | Wrapper to run the roi_v2 inflation module over the archive outputs. |
| submissions/optimized/compress.sh | Updated encoding settings and CLI arg handling for the optimized submission. |
| submissions/optimized/inflate.py | Inflation script with Laplacian sharpening after resize. |
| submissions/optimized/inflate.sh | Wrapper to run optimized inflation module over the archive outputs. |
| .gitignore | Attempts to unignore lib/*.so* for bundled shared libraries. |
```shell
export IN_DIR ARCHIVE_DIR PD
head -n "$(wc -l < "$VIDEO_NAMES_FILE")" "$VIDEO_NAMES_FILE" | xargs -P1 -I{} bash -lc '
rel="$1"; [[ -z "$rel" ]] && exit 0
IN="${IN_DIR}/${rel}"; BASE="${rel%.*}"
OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"
rm -f "$PRE_IN"
cd "'"${PD}"'"
.venv/bin/python -m submissions.roi_v2.preprocess \
  --input "$IN" --output "$PRE_IN" \
  --outside-luma-denoise 2.5 --outside-chroma-mode medium \
  --feather-radius 24 --outside-blend 0.50
FFMPEG="'"${HERE}"'/ffmpeg-new"
[ ! -x "$FFMPEG" ] && FFMPEG="ffmpeg"
export LD_LIBRARY_PATH="'"${HERE}"'/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```
The inner bash -lc command is building PRE_IN with embedded single-quote characters (PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"). That makes the resulting path contain literal quotes, so rm -f, the Python preprocess step, and ffmpeg -i will likely operate on a different/nonexistent filename. Prefer exporting TMP_DIR (and other needed vars) for the subshell and set PRE_IN="${TMP_DIR}/${BASE}.pre.mkv" inside the subshell, or avoid nested quoting by using a plain while read loop.
Suggested change:

```diff
-export IN_DIR ARCHIVE_DIR PD
+export IN_DIR ARCHIVE_DIR PD TMP_DIR HERE
 head -n "$(wc -l < "$VIDEO_NAMES_FILE")" "$VIDEO_NAMES_FILE" | xargs -P1 -I{} bash -lc '
 rel="$1"; [[ -z "$rel" ]] && exit 0
 IN="${IN_DIR}/${rel}"; BASE="${rel%.*}"
-OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"
+OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="${TMP_DIR}/${BASE}.pre.mkv"
 rm -f "$PRE_IN"
-cd "'"${PD}"'"
+cd "${PD}"
 .venv/bin/python -m submissions.roi_v2.preprocess \
   --input "$IN" --output "$PRE_IN" \
   --outside-luma-denoise 2.5 --outside-chroma-mode medium \
   --feather-radius 24 --outside-blend 0.50
-FFMPEG="'"${HERE}"'/ffmpeg-new"
+FFMPEG="${HERE}/ffmpeg-new"
 [ ! -x "$FFMPEG" ] && FFMPEG="ffmpeg"
-export LD_LIBRARY_PATH="'"${HERE}"'/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
+export LD_LIBRARY_PATH="${HERE}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```
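The review comment also mentions a plain `while read` loop as an alternative that avoids nested single-quote splicing entirely, since the loop body runs in the current shell where `TMP_DIR` and friends are ordinary variables. A minimal sketch of that alternative, with stand-in file names (the real script reads driving-clip names from `VIDEO_NAMES_FILE`):

```shell
# Illustrative setup: stand-ins for the script's real TMP_DIR and
# VIDEO_NAMES_FILE (the clip names below are hypothetical).
TMP_DIR="$(mktemp -d)"
VIDEO_NAMES_FILE="$(mktemp)"
printf '%s\n' "clip_0001.hevc" "clip_0002.hevc" > "$VIDEO_NAMES_FILE"

# No subshell quoting gymnastics: variables expand directly in the loop body.
while IFS= read -r rel; do
  [ -z "$rel" ] && continue
  BASE="${rel%.*}"
  PRE_IN="${TMP_DIR}/${BASE}.pre.mkv"   # path contains no literal quotes
  echo "would preprocess: $PRE_IN"
done < "$VIDEO_NAMES_FILE"
```

Redirecting the file into the loop (rather than piping into it) keeps the loop in the current shell, so any variables set inside remain visible afterwards.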
```shell
export IN_DIR ARCHIVE_DIR PD
head -n "$(wc -l < "$VIDEO_NAMES_FILE")" "$VIDEO_NAMES_FILE" | xargs -P1 -I{} bash -lc '
rel="$1"; [[ -z "$rel" ]] && exit 0
IN="${IN_DIR}/${rel}"; BASE="${rel%.*}"
OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"
rm -f "$PRE_IN"
cd "'"${PD}"'"
.venv/bin/python -m submissions.roi_v2.preprocess \
  --input "$IN" --output "$PRE_IN" \
  --outside-luma-denoise 2.5 --outside-chroma-mode medium \
  --feather-radius 24 --outside-blend 0.50
FFMPEG="'"${HERE}"'/ffmpeg-new"
[ ! -x "$FFMPEG" ] && FFMPEG="ffmpeg"
export LD_LIBRARY_PATH="'"${HERE}"'/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```
Similar quoting issue for FFMPEG and LD_LIBRARY_PATH (values include literal single quotes). That can make the -x check and the final ffmpeg invocation look for a path like '/.../ffmpeg-new' instead of the real file, and can also corrupt the library search path. Use normal double-quoting without injecting literal quotes, or export HERE for the subshell and reference it directly.
Suggested change:

```diff
-export IN_DIR ARCHIVE_DIR PD
+export IN_DIR ARCHIVE_DIR PD HERE
 head -n "$(wc -l < "$VIDEO_NAMES_FILE")" "$VIDEO_NAMES_FILE" | xargs -P1 -I{} bash -lc '
 rel="$1"; [[ -z "$rel" ]] && exit 0
 IN="${IN_DIR}/${rel}"; BASE="${rel%.*}"
 OUT="${ARCHIVE_DIR}/${BASE}.mkv"; PRE_IN="'"${TMP_DIR}"'/${BASE}.pre.mkv"
 rm -f "$PRE_IN"
-cd "'"${PD}"'"
+cd "$PD"
 .venv/bin/python -m submissions.roi_v2.preprocess \
   --input "$IN" --output "$PRE_IN" \
   --outside-luma-denoise 2.5 --outside-chroma-mode medium \
   --feather-radius 24 --outside-blend 0.50
-FFMPEG="'"${HERE}"'/ffmpeg-new"
+FFMPEG="${HERE}/ffmpeg-new"
 [ ! -x "$FFMPEG" ] && FFMPEG="ffmpeg"
-export LD_LIBRARY_PATH="'"${HERE}"'/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
+export LD_LIBRARY_PATH="${HERE}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```
```python
kernel_1d = (g / g.sum()).float()
kernel_2d = torch.outer(kernel_1d, kernel_1d).view(1, 1, kernel_size, kernel_size)
y = x[:, 0:1]
y_blur = F.conv2d(y, kernel_2d, padding=kernel_size // 2)
```
apply_luma_denoise uses F.conv2d(..., padding=kernel_size//2) which applies zero-padding at the borders. That will darken/brighten edges and create visible border artifacts (and can harm downstream metrics). Consider using F.pad(..., mode='reflect'|'replicate') before the convolution and set padding=0 for the conv.
Suggested change:

```diff
-kernel_1d = (g / g.sum()).float()
+kernel_1d = (g / g.sum()).to(dtype=x.dtype)
 kernel_2d = torch.outer(kernel_1d, kernel_1d).view(1, 1, kernel_size, kernel_size)
 y = x[:, 0:1]
-y_blur = F.conv2d(y, kernel_2d, padding=kernel_size // 2)
+pad = kernel_size // 2
+y_padded = F.pad(y, (pad, pad, pad, pad), mode="replicate")
+y_blur = F.conv2d(y_padded, kernel_2d, padding=0)
```
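The border-darkening effect the reviewer describes is easy to demonstrate without torch. A toy 1-D illustration (not the PR's code): blurring a constant all-ones signal with a 3-tap mean filter should leave it unchanged, but implicit zero padding pulls the border samples toward zero, while replicate/edge padding preserves them.

```python
import numpy as np

# A constant signal and a normalized 3-tap mean kernel.
row = np.ones(6)
kernel = np.ones(3) / 3.0

# mode="same" implicitly zero-pads the borders.
zero_padded = np.convolve(row, kernel, mode="same")

# Replicate the edge samples first, then do a "valid" convolution.
edge_padded = np.convolve(np.pad(row, 1, mode="edge"), kernel, mode="valid")

print(zero_padded[0])  # ~0.667: border sample darkened by zero padding
print(edge_padded[0])  # ~1.0: constant signal preserved
```

The same effect in 2-D is what produces the visible dark frame around a zero-padded Gaussian blur, which is why the suggestion pads with `mode="replicate"` (or `"reflect"`) before the convolution.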
| parser.add_argument("--input", type=Path, required=True) | ||
| parser.add_argument("--output", type=Path, required=True) | ||
| parser.add_argument("--outside-luma-denoise", type=float, default=2.5) | ||
| parser.add_argument("--outside-chroma-mode", type=str, default="medium") |
collapse_chroma assumes mode is one of {soft, medium, strong, normal} and will raise a KeyError for any other value, but the argparse option doesn't restrict choices. Add choices=["normal","soft","medium","strong"] (and/or validate with a clear error) so bad CLI input fails with a helpful message.
| parser.add_argument("--outside-chroma-mode", type=str, default="medium") | |
| parser.add_argument( | |
| "--outside-chroma-mode", | |
| type=str, | |
| default="medium", | |
| choices=["normal", "soft", "medium", "strong"], | |
| ) |
```python
out_container = av.open(str(args.output), mode="w")
out_stream = out_container.add_stream("ffv1", rate=20)
```
The output stream rate is hard-coded to 20 fps (add_stream(..., rate=20)). If the input file has a different FPS/time base, this will rewrite timing and can change frame pacing. Prefer deriving the rate from the input stream (e.g., in_stream.average_rate) or explicitly copying timing metadata when writing the intermediate file.
Suggested change:

```diff
+output_rate = in_stream.average_rate or in_stream.base_rate or 20
 out_container = av.open(str(args.output), mode="w")
-out_stream = out_container.add_stream("ffv1", rate=20)
+out_stream = out_container.add_stream("ffv1", rate=output_rate)
```
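The fallback chain in the suggestion relies on Python's `or` falling through falsy values (`None`, or a zero rate) to the next candidate. A small sketch of that logic in isolation, using `Fraction` values as stand-ins for what PyAV's input stream would report (the function name here is hypothetical):

```python
from fractions import Fraction

def pick_output_rate(average_rate, base_rate, default=Fraction(20)):
    """Return the first usable frame rate: average, then base, then default.

    `or` skips None and zero-valued rates, so a missing or degenerate
    metadata field never becomes the output rate.
    """
    return average_rate or base_rate or default

# NTSC-style rate reported by the container metadata.
print(pick_output_rate(Fraction(20000, 1001), None))  # 20000/1001
# No rate metadata at all: fall back to the 20 fps default.
print(pick_output_rate(None, None))                   # 20
```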
```python
def build_mask(frame_idx: int, width: int, height: int, feather_radius: int) -> torch.Tensor:
    img = Image.new("L", (width, height), 0)
    draw = ImageDraw.Draw(img)
    draw.polygon(segment_polygon(frame_idx, width, height), fill=255)
    if feather_radius > 0:
        img = img.filter(ImageFilter.GaussianBlur(radius=feather_radius))
    mask = torch.frombuffer(memoryview(img.tobytes()), dtype=torch.uint8).clone().view(height, width).float() / 255.0
    return mask.unsqueeze(0).unsqueeze(0)
```
build_mask constructs and Gaussian-blurs a full-resolution PIL image for every frame. For 1200-frame videos this is a significant CPU cost during compression. Since the mask only changes across a few frame ranges, consider caching the blurred mask per segment (or precomputing per frame_idx) and reusing it instead of rebuilding it every frame.
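The per-segment caching the reviewer suggests can be expressed with `functools.lru_cache`, keyed on the segment index rather than the frame index. This sketch replaces the expensive PIL polygon + Gaussian-blur build with a counter-instrumented placeholder; `SEGMENT_LEN` and the function names are hypothetical stand-ins for the submission's actual segment boundaries.

```python
from functools import lru_cache

SEGMENT_LEN = 240  # hypothetical: frames per corridor-mask segment
build_calls = 0    # counts how often the expensive build actually runs

@lru_cache(maxsize=8)
def mask_for_segment(segment_idx: int, width: int, height: int):
    """Stand-in for the expensive polygon draw + GaussianBlur mask build."""
    global build_calls
    build_calls += 1
    return ("mask", segment_idx, width, height)  # placeholder payload

def mask_for_frame(frame_idx: int, width: int, height: int):
    # All frames in a segment share one cached mask.
    return mask_for_segment(frame_idx // SEGMENT_LEN, width, height)

# 1200 frames but only 5 segments -> 5 expensive builds, not 1200.
for f in range(1200):
    mask_for_frame(f, 512, 384)
print(build_calls)  # 5
```

If the mask genuinely varies per frame in some segments, the cache key can include whatever the polygon actually depends on; the win comes from keying on that rather than on the raw frame index.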
Closing to revise — will reopen with improved score |
submission name
roi_v2
upload zipped
archive.zip
report.txt
does your submission require gpu for evaluation (inflation)?
no
did you include the compression script? and want it to be merged?
yes
additional comments
SVT-AV1 v2.3.0 bundled (lib/libSvtAv1Enc.so.2.3.0) with a custom ffmpeg-new binary. compress.sh sets LD_LIBRARY_PATH to use the bundled v2.3.0 library on CI. Same ROI-aware preprocessing as PR #31 (denoise outside driving corridor, blend=0.50, feather=24).
Frames downscaled 45% Lanczos before encoding, decoded with Lanczos upscale + 9-tap binomial USM at 40%.