ENH: Validate PET data objects' attributes at instantiation by jhlegarreta · Pull Request #336 · nipreps/nifreeze

jhlegarreta · 2025-11-21T03:05:34Z

Validate PET data objects' attributes at instantiation: ensures that the attributes are present and match the expected dimensionalities.

PET class attributes
Refactor the PET attributes so that only midframe and total_duration are required and accepted by the constructor. These are the only parameters that are required by the current PET model.

Remove uptake from the constructor: the PET data class does not need to know the uptake values held across its frames; it is rather the estimator that needs to know about its values so that the iterator can pick the frames following the appropriate sorting.

Validate and format attributes so as to avoid missing or inconsistent data. Specifically, require the midframe data to have the same length as the number of frames in the data object, and disallow the last midframe value being larger than the total duration.

Make the _compute_uptake_statistic public so that users can call it.
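
The two validation rules above can be illustrated with a minimal sketch. It uses a standard-library dataclass rather than the attrs machinery the PR relies on, and the class name `PETSketch` is hypothetical; it is not the NiFreeze implementation.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PETSketch:
    """Hypothetical stand-in for the PET data class (not the NiFreeze code)."""

    dataobj: np.ndarray  # 4D array, frames along the last axis
    midframe: np.ndarray  # one midframe time per frame, in seconds
    total_duration: float  # total acquisition time, in seconds

    def __post_init__(self):
        # Format: coerce to a 1D float array so the checks below are uniform.
        self.midframe = np.asarray(self.midframe, dtype=float).reshape(-1)
        n_frames = self.dataobj.shape[-1]
        # Rule 1: one midframe value per frame.
        if self.midframe.size != n_frames:
            raise ValueError(
                f"midframe has {self.midframe.size} values, expected {n_frames}."
            )
        # Rule 2: the last midframe cannot lie beyond the total duration.
        if self.midframe[-1] > self.total_duration:
            raise ValueError("Last midframe value exceeds total_duration.")


data = np.zeros((2, 2, 2, 3))
pet = PETSketch(dataobj=data, midframe=[5.0, 20.0, 45.0], total_duration=60.0)
```

Passing only two midframe values for the three-frame `data` array, or a `total_duration` below 45.0, raises `ValueError` at instantiation.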

from_nii function:
Refactor the from_nii function to accept filenames instead of a mix of filenames (e.g. the PET image sequence and brainmask) and temporal attribute arrays. This honors the name of the function, increases consistency with the dMRI counterpart and allows offering a uniform API.

The only temporal parameter required by BIDS is the frame time (FrameTimesStart). Thus, the temporal attribute JSON (sidecar) file is required to contain that key. The values required to model a PET dataset for the purposes of NiFreeze, namely the midframe and total duration values, are computed from the frame time. It is assumed that each frame's duration spans the entire time elapsed between two consecutive frame time values.

Refactor and rename the _compute_frame_duration function so that it computes and returns the required parameters to instantiate a PET data object. The computation of the relevant temporal values is, thus, done at this place only.

Use the get_data utils function in from_nii to automatically handle the data type when loading the PET data.

to_nifti function
Preserve the base class to_nifti method to serialize the PET dataset to NIfTI data. The PET dataset class does not need to write its temporal attributes, as they do not change during the prediction process, and they can be computed from the primary JSON file where the FrameTimesStart data reside. NiFreeze will still allow writing a BIDS-compatible derivative dataset by writing only the motion-corrected PET frames; users would read the FrameTimesStart data from the primary JSON file.

PET.load class method:
Remove the PET.load class method and rely on the data.__init__.load function:

If an HDF5 filename is provided, it is assumed that it hosts all necessary information, and the data module load function should take care of loading all data.
If the provided arguments are NIfTI files plus other data files, the function will call the pet.PET.from_nii function.

Change the kwargs arguments to be able to identify the relevant keyword arguments that are now present in the from_nii function.
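
The dispatch described above could look roughly like the following sketch. Every name in it (`load_from_hdf5`, the simplified `from_nii` signature, the `temporal_file` and `brainmask_file` keywords, and the extension set) is a hypothetical placeholder, not the actual NiFreeze API.

```python
from pathlib import Path


def load_from_hdf5(filename):
    """Placeholder for the HDF5 reader; returns a tag for illustration."""
    return ("hdf5", Path(filename).name)


def from_nii(filename, temporal_file, brainmask_file=None):
    """Placeholder for the NIfTI reader; returns a tag for illustration."""
    return ("nifti", Path(filename).name, Path(temporal_file).name)


def load(filename, **kwargs):
    """Dispatch on the file extension, as described above."""
    if Path(filename).suffix in {".h5", ".hdf5"}:
        # The HDF5 file is assumed to host everything that is needed.
        return load_from_hdf5(filename)
    # Forward only the keyword arguments that from_nii understands.
    nii_keys = {"temporal_file", "brainmask_file"}
    nii_kwargs = {k: v for k, v in kwargs.items() if k in nii_keys}
    return from_nii(filename, **nii_kwargs)
```

Filtering the keyword arguments by name is one way to "identify the relevant keyword arguments"; the real function may select them differently.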

Change accordingly the PET.load(pet_file, json_file) call in the PET notebook and the test_pet_load test function.

Tests:
Refactor the PET data creation fixture in conftest.py to accept the frame_time (as it is the only argument required by BIDS and the one that allows computing the rest) and to return the necessary data.

Remove values that are no longer needed (i.e. total_duration).

Refactor the tests accordingly and increase consistency with the dmri data module testing helper functions. Reduces cognitive load and maintenance burden.

Add additional object instantiation equality checks: check that objects instantiated through reading NIfTI files equal objects instantiated directly.

Check the PET dataset attributes systematically in round trip tests by collecting all named attributes that need to be tested.
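
A minimal sketch of such a systematic check follows. The attribute names in `ATTRS_TO_CHECK` and the helper `assert_same_attrs` are assumptions made for illustration; the actual tests may collect the attribute names differently.

```python
from types import SimpleNamespace

import numpy as np

# ASSUMPTION: the attribute names to compare; adjust to the real class.
ATTRS_TO_CHECK = ("dataobj", "affine", "midframe", "total_duration")


def assert_same_attrs(obj_a, obj_b, names=ATTRS_TO_CHECK):
    """Compare every named attribute of two dataset-like objects."""
    for name in names:
        val_a, val_b = getattr(obj_a, name), getattr(obj_b, name)
        assert np.allclose(val_a, val_b), f"Attribute {name!r} differs"


# Stand-ins for an object built directly and one rebuilt from files.
direct = SimpleNamespace(
    dataobj=np.ones((2, 2, 2, 3)),
    affine=np.eye(4),
    midframe=np.array([5.0, 20.0, 45.0]),
    total_duration=60.0,
)
from_file = SimpleNamespace(**vars(direct))
assert_same_attrs(direct, from_file)
```

Keeping the attribute names in one tuple means a newly added attribute only needs to be listed once to be covered by every round trip test.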

Modify accordingly the PET model and integration tests.

Take advantage of the patch set to make other opinionated choices:

Prefer using the global setup_random_pet_data fixture over the local random_dataset fixture: it allows controlling the parameters of the generated data and increases consistency with the practice adopted across the dMRI dataset tests. Remove the random_dataset fixture.
Prefer using assert np.allclose over np.testing.assert_array_equal for the sake of consistency.
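
The PR motivates the second choice by consistency alone. As a side note, the two checks are not interchangeable, which the following small example illustrates: `np.allclose` compares within a tolerance, while `np.testing.assert_array_equal` requires exact equality.

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

# Tolerance-based comparison: passes despite floating-point rounding.
assert np.allclose(a, b)

# Exact comparison: fails because 0.1 + 0.2 != 0.3 in binary floating point.
try:
    np.testing.assert_array_equal(a, b)
    exact_match = True
except AssertionError:
    exact_match = False
```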

Dependencies
Require attrs>24.1.0 so that attrs.Converter can be used.
Documentation:
https://www.attrs.org/en/25.4.0/api.html#converters

jhlegarreta · 2025-11-21T03:07:45Z

Depends on PR #335.

jhlegarreta · 2025-11-22T17:07:31Z

@mnoergaard While working on this I've realized that as things stand now on main, there is a risk that two instances of a PET object contain different data if instantiated directly (i.e. PET()) or from a NIfTI file (i.e. from_nii). So, I would like to have your confirmation on the following:

To build a valid PET instance, the only required pieces of information are the data sequence and frame_time, as uptake can be computed from the former, and midframe and total_duration can both be computed from the latter.

I think that answer will allow me to refactor this properly and avoid inconsistencies. Thanks.

mnoergaard · 2025-11-22T17:18:53Z

@jhlegarreta - that is correct! Thanks.

jhlegarreta · 2025-11-22T20:42:04Z

@mnoergaard Please check whether the refactoring of the PET data class instantiation and the from_nii function makes sense. Before going in the suggested direction and fixing the remaining tests I would like to confirm this. Thanks.

jhlegarreta · 2025-11-23T16:03:59Z

Still pending: an additional refactoring (in a separate commit) to stick to the convention adopted for dMRI data and split the nifreeze.data.pet.py contents across the nifreeze.data.pet.base.py, nifreeze.data.pet.io.py and nifreeze.data.pet.utils.py modules, once #336 (comment) is resolved and the tests are fixed.

jhlegarreta · 2025-11-25T04:18:06Z

Re #336 (comment) as of commit 37ed54c and the init=False, I dug into this a little more:

I see that to_filename dumps the PET class instance entirely to the HDF5 file, including its private attributes (i.e. those that can be computed from the data: midframe, total_duration, etc.). Thus, reading and instantiating a PET object from the HDF5 contents fails, because the constructor is given all these data, including the private attributes.
Due to the way uptake is computed (using a callable that is not stored anywhere), and since it does not make sense to instantiate a PET object and have the private attributes computed only to set them to other values immediately afterwards, it would be best to allow all of them to be present in the constructor, falling back to the default way of computing them if not present.
I see that the load class method takes a JSON file with the frame duration and frame time start data. The frame time start looks like it should be an attribute of the class that can be given at instantiation, and be defaulted to
```
frame_time_arr = np.array(self.frame_time, dtype=np.float32)
frame_time_arr -= frame_time_arr[0]
```
if not given.

Also, it should be possible to host and read midframe and total duration data from a JSON file (or any other file), much like it is done with the frame duration data, falling back to the default way of computing them if not present.

Refactoring things this way (i.e. allowing to provide all attributes to the PET class, and the from_nii function, and falling back to defaults if not given) would probably allow us to solve all failing tests that persist.

WDYT @mnoergaard?

Sorry for so many questions. Hopefully we will converge and the implementation across the multiple ways to read/write data will be consistent and robust after this.

jhlegarreta · 2025-11-27T15:45:11Z

Comments:

There is a failing test in the estimator. When the PET.lofo_split function is called it returns the midframe data:
https://github.com/jhlegarreta/nifreeze/blob/bbe0ca0e6c7edcc8f8f604fdacb75beb6ca112ac/src/nifreeze/estimator.py#L248
https://github.com/jhlegarreta/nifreeze/blob/bbe0ca0e6c7edcc8f8f604fdacb75beb6ca112ac/src/nifreeze/estimator.py#L256
However, the PET instantiation is given all parameters, which are not masked, and thus the dimensionality validation for frame_time fails because the index frame is removed by the LOO strategy. frame_time is now a required argument. Omitting the optional arguments would make the PET instance compute them automatically from the masked data, which would yield wrong information. Another option is to make lofo_split return all temporal/uptake data so that a correct PET object can be instantiated.

The type checks are also failing because midframe is allowed to be None, but it is the feature required by the model. For example, the following code errors:

nifreeze/src/nifreeze/data/pet.py

Lines 56 to 57 in 62e5e43

    def _getextra(self, idx: int | slice | tuple | np.ndarray) -> tuple[np.ndarray]:
        return (self.midframe[idx],)

So, we need to discuss:

We could make the PET object contain only midframe and total_duration data, which are the only data required for the current PET model to be instantiated:
https://github.com/jhlegarreta/nifreeze/blob/bbe0ca0e6c7edcc8f8f604fdacb75beb6ca112ac/src/nifreeze/estimator.py#L267

This would mean that from_nii should take care of computing the attributes based exclusively on the frame_time:
https://github.com/jhlegarreta/nifreeze/blob/bbe0ca0e6c7edcc8f8f604fdacb75beb6ca112ac/src/nifreeze/data/pet.py#L345-L359

The uptake value would no longer be present as an attribute. As a consequence, the HDF5 file would only contain midframe and total_duration data, and we would not be able to reproduce the original frame_time values.
Whether the lofo_split function is redundant and/or adds an unnecessary overhead by writing data to an HDF5 file and reading it back, as everything that is required for the masking
```
 mask = np.ones(self.dataobj.shape[-1], dtype=bool)
 mask[index] = False
```
is the dataobj shape, which can be accessed through pet_dataset present in the estimator.
By exposing all attributes, a user could instantiate a PET object that does not contain correct information (e.g. by providing midframe values that are clearly wrong). So making the PET class expose all attributes was probably not a good idea.
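
To make the leave-one-out masking discussed above concrete, here is a minimal sketch with made-up data. It shows that the same boolean mask has to be applied to both the data and the per-frame temporal array for a frame-count validation to pass. The variable names mirror the ones quoted in this thread, but the snippet is illustrative and is not the estimator code.

```python
import numpy as np

n_frames = 4
dataobj = np.random.default_rng(0).random((2, 2, 2, n_frames))
midframe = np.array([5.0, 20.0, 45.0, 80.0])

index = 1  # frame left out
mask = np.ones(dataobj.shape[-1], dtype=bool)
mask[index] = False

# Mask the data and the per-frame temporal array with the same mask.
train_data = dataobj[..., mask]
train_times = midframe[mask]

# The unmasked midframe array no longer matches the masked data,
# which is what trips a frame-count validation.
lengths_match_unmasked = midframe.size == train_data.shape[-1]
lengths_match_masked = train_times.size == train_data.shape[-1]
```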

We probably want to override the base class to_nifti function so that we serialize the temporal data at least (uptake data depends on the above discussion) to a JSON file, so that we provide a uniform API (i.e. from_nii requires now temporal data read from a JSON file), and consistent across the tool.

jhlegarreta · 2025-12-06T15:54:32Z

Marking this as a draft: commits 747b002 and 387dfe6 show the difficulties of having a robust PET data instantiation (i.e. not requiring more than what we need while trying to disallow means to set parameter values that can be wrong or inconsistent across time arrays), having a consistent API across modalities, and being BIDS-compatible when serializing/reading the data required by the class.

I think the PET estimator (PR #203) needs to be fixed so that this is not necessary:

nifreeze/src/nifreeze/estimator.py

Lines 256 to 261 in f892e52

    train_dataset = PET(
        dataobj=train_data,
        affine=pet_dataset.affine,
        brainmask=pet_dataset.brainmask,
        midframe=train_times,
        total_duration=pet_dataset.total_duration,

We should not need to do that instantiation, which conditions in important ways the PET data class refactoring.

Commit dd3738a shows why frame_time would be required to serialize a BIDS-valid dataset, and the need to reconstruct it from midframe and total_duration.
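
One way such a reconstruction could work is sketched below. It is not taken from commit dd3738a. It assumes that the first frame starts at zero, that frames are contiguous, and that each midframe sits at the center of its frame, so each frame's end is twice its midframe minus its start. The function name `reconstruct_frame_time` is hypothetical.

```python
import numpy as np


def reconstruct_frame_time(midframe, total_duration):
    """Recover frame start times from midframes (contiguous frames, start at 0)."""
    midframe = np.asarray(midframe, dtype=float)
    # boundaries[i] is the start of frame i; the last entry is the scan end.
    boundaries = np.zeros(midframe.size + 1)
    for i, mid in enumerate(midframe):
        # The midframe is the mean of the frame's start and end.
        boundaries[i + 1] = 2.0 * mid - boundaries[i]
    # Consistency check: the last boundary should equal the total duration.
    if not np.isclose(boundaries[-1], total_duration):
        raise ValueError("midframe and total_duration are inconsistent.")
    return boundaries[:-1]


frame_time = reconstruct_frame_time([5.0, 20.0, 45.0, 80.0], total_duration=100.0)
```

Because each boundary depends on the previous one, any error in a midframe value propagates to all later frame times, which hints at why this kind of reconstruction is fragile.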

I have explored a lot of other avenues to try to have this sorted and none is robust. PRs #203 and #204 need to be worked on and merged before more time is invested in this PR.


codecov · 2025-12-11T01:37:22Z

Codecov Report

❌ Patch coverage is 89.65517% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.92%. Comparing base (240b2fa) to head (a716649).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
src/nifreeze/data/pet.py	89.53%	7 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #336      +/-   ##
==========================================
+ Coverage   81.81%   81.92%   +0.10%     
==========================================
  Files          34       34              
  Lines        1980     2019      +39     
  Branches      211      223      +12     
==========================================
+ Hits         1620     1654      +34     
- Misses        312      318       +6     
+ Partials       48       47       -1


jhlegarreta · 2025-12-11T01:38:12Z

I modified this following our conversation. The current PET model and estimator remain unchanged. I am going to go ahead and merge this to move forward.

jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 2 times, most recently from e094509 to a548cbc Compare November 22, 2025 01:08

jhlegarreta mentioned this pull request Nov 22, 2025

Ensure the dataset mandatory attributes are present in data instantiation #302

Closed

jhlegarreta force-pushed the enh/validate-pet-data-attrs branch from a548cbc to 96e916b Compare November 22, 2025 16:47

jhlegarreta linked an issue Nov 22, 2025 that may be closed by this pull request

Ensure the dataset mandatory attributes are present in data instantiation #302

Closed

jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 2 times, most recently from 745e408 to bc7617e Compare November 22, 2025 20:41

jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 3 times, most recently from 803b020 to 1eae4b2 Compare November 23, 2025 16:00

jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 2 times, most recently from ffe563c to 37ed54c Compare November 24, 2025 01:50

jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 3 times, most recently from 6751a06 to bbe0ca0 Compare November 27, 2025 15:27

jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 8 times, most recently from e7a03a1 to 747b002 Compare November 29, 2025 17:08

jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 2 times, most recently from 9f1a2bc to 387dfe6 Compare December 6, 2025 15:40

jhlegarreta marked this pull request as draft December 6, 2025 15:54

jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 6 times, most recently from 9f9ed71 to 43b99af Compare December 11, 2025 01:24

jhlegarreta force-pushed the enh/validate-pet-data-attrs branch from 43b99af to a716649 Compare December 11, 2025 01:25

jhlegarreta marked this pull request as ready for review December 11, 2025 01:36

jhlegarreta merged commit d0f2b62 into nipreps:main Dec 11, 2025
10 checks passed

jhlegarreta deleted the enh/validate-pet-data-attrs branch December 11, 2025 01:38
