Raise FileTypeError for WAVE_FORMAT_MPEGLAYER3 WAV files#105
Raise FileTypeError for WAVE_FORMAT_MPEGLAYER3 WAV files#105snejus merged 8 commits intobeetbox:masterfrom
Conversation
WAV files containing an MP3 stream (wFormatTag=0x0055) are silently mishandled by mutagen, returning incorrect duration, bitrate, and format metadata. Detect this subtype via the audio_format attribute and raise FileTypeError with a clear message instead. Adds a test fixture and regression test. Ref: beetbox/beets#6455
|
Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry. |
|
It would be nice to blacklist other formats that beets can't work with. I could see mu-law being a problem (albeit rare) for example. |
In addition to WAVE_FORMAT_MPEGLAYER3 (0x0055), also reject WAVE_FORMAT_ADPCM (0x0002), WAVE_FORMAT_ALAW (0x0006), and WAVE_FORMAT_MULAW (0x0007).
|
Thank you for the suggestion! I've updated the PR to also blacklist WAVE_FORMAT_ADPCM (0x0002), WAVE_FORMAT_ALAW (0x0006), and WAVE_FORMAT_MULAW (0x0007), which all show similar issues with mutagen. I verified that IEEE float (0x0003) and WAVE_FORMAT_EXTENSIBLE (0xFFFE) work correctly and don't need to be blacklisted. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #105 +/- ##
==========================================
+ Coverage 92.78% 93.02% +0.24%
==========================================
Files 16 16
Lines 832 832
Branches 117 117
==========================================
+ Hits 772 774 +2
+ Misses 41 40 -1
+ Partials 19 18 -1 |
snejus
left a comment
There was a problem hiding this comment.
Looks good to me, thanks. Just fix the issues in CI checks
|
Ah another issue. Just adjust this line and it should be happy: diff --git a/.github/workflows/lint.yaml b/.github/workflows/lint.yaml
index a4169b4..015c3d9 100644
--- a/.github/workflows/lint.yaml
+++ b/.github/workflows/lint.yaml
@@ -106,7 +106,7 @@ jobs:
cache: poetry
- name: Install dependencies
- run: poetry install --only=lint
+ run: poetry install
- name: Type check code
uses: liskin/gh-problem-matcher-wrap@v3 |
Fixes #6455 WAV files can contain MP3 audio streams (WAVE_FORMAT_MPEGLAYER3, wFormatTag=0x0055), but beets has no way to tag them correctly and would silently import them with wrong duration, bitrate, and format metadata. This PR fixes the issue by automatically detecting such files during import and using ffmpeg to remux them into proper .mp3 files. The original WAV is deleted after successful remuxing, and the MP3 is imported normally with correct metadata. This fix depends on beetbox/mediafile#105 which raises a clear FileTypeError for these files, allowing beets to detect and handle them.
|
Hi, it is a bit unfortunate that mediafile now throws an exception just because mutagen cannot correctly work with these file types. mediafile is a separate python module on pypi and I am using it for a project separately from beets. Maybe others are also using it like this. |
|
mediafile already throws exceptions for unsupported files (i.e. files it cannot read tags from) - is there a use case you have for using mediafile on MP3-in-WAV files that isn't managing tags? |
|
In addition to tags I am using MediaFile to get some basic attributes of audio files, e.g. channels, samplerate, length, bitrate. E.g. MediaFile 0.16.0 would provide the following information for this wav file https://www.mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/Samples/AFsp/M1F1-mulaw-AFsp.wav MediaFile in version 0.16.1 throws an exception: "File type 'WAVE_FORMAT_MULAW' is not supported" |
|
@elainec2024 does beets really need this functionality or could this logic be moved to beets by any chance? |
|
I just tested this approach locally by calling remux_mpeglayer3_wav in read_item (in beets.importer.tasks) before attempting to read metadata, and beets can handle the detection without depending on mediafile raising FileTypeError. All tests pass with this approach, but the tradeoff is that remux_mpeglayer3_wav now runs on every imported file rather than only on files that fail. However, the overhead is minimal because it just opens the file with mutagen and checks one attribute. |
|
Sounds good. Would you mind submitting a PR for this? I will revert this PR once we merge that :) |
|
Actually, maybe a better solution would be raise this error optionally, @elainec2024? Do not raise by default, but have |
Following up on #105, this adds a `raise_on_unsupported_wav` parameter to `MediaFile.__init__` (defaults to `False`) to make the `FileTypeError` for non-PCM WAV formats optional rather than mandatory. This allows other mediafile users to continue reading basic audio properties (channels, samplerate, length, bitrate) from non-PCM WAV files, while beets can opt in to the behavior by passing `raise_on_unsupported_wav=True`. Addresses the concern raised in #105.
When a WAV file contains an MP3 stream (wFormatTag=0x0055, WAVE_FORMAT_MPEGLAYER3), mutagen opens it as a 'WAVE' object but cannot read or write its tags correctly. This causes MediaFile to return the wrong metadata - incorrect duration, zero bitrate, and format reported as "WAVE" instead of "MP3."
This was reported in beetbox/beets#6455.
The fix was to detect the WAVE_FORMAT_MPEGLAYER3 subtype using mutagen's 'audio_format' attribute on the WAV info object and raise 'FileTypeError' with a clear message.
Adds a test fixture and regression test.