Skip to content

Fix SDFITS index file writing to match sparrow format#1047

Draft
tchamberlin wants to merge 11 commits into
release-1.1from
write_sdfits_index
Draft

Fix SDFITS index file writing to match sparrow format#1047
tchamberlin wants to merge 11 commits into
release-1.1from
write_sdfits_index

Conversation

@tchamberlin

@tchamberlin tchamberlin commented Mar 12, 2026

Copy link
Copy Markdown
Member

write_index() produced .index files that were incompatible with GBTIDL because the column names, header padding, format widths, and derived column computations all diverged from the sparrow3 reference implementation (IndexWriter.py).

What changed

Format fixes (verified character-for-character against sparrow3 .index.expected files):

  • Column header row is now generated dynamically from sparrow3's format specs
  • names like EXTENSION and POLARIZATION are truncated to EXT/POL
  • Header line padding changed from 200 to 256 chars (based on sparrow code)
  • Row format strings match sparrow3 exactly
  • DATEOBS/TIMESTAMP width corrected from 21 to 22 chars
  • Skip-spacing logic for #INDEX# and ROW columns at the 1M boundary (not sure what this is, but there's code for it in sparrow...)

Derived column stuff (port sparrow3's translateInfo):

  • CENTFREQ is now computed from WCS parameters ((centerChan - CRPIX1) * CDELT1 + CRVAL1) instead of using CRVAL1 directly
  • POLARIZATION derived from CRVAL4 (I/Q/U/V/RR/LL/XX/YY/etc.)
  • PROCEDURE extracted from OBSMODE
  • FREQINT from CDELT1, BANDWIDTH from BANDWID, SUBREF from SUBREF_STATE
  • NSAVE defaults to -1 (was 0)
  • SIG/CAL properly converted to T/F strings from booleans

No changes to the read path at all.

Tests

  • The previously-skipped roundtrip test (test_write_and_read_roundtrip) is now passing
  • Made some dysh-specific tests
  • Ported 3 sparrow3 IndexWriterTests that comparing data rows character-for-character against *.index.expected reference

The write_index() output was incompatible with GBTIDL because column
names, header padding, format widths, and derived column computations
all differed from the sparrow3 reference implementation.

Changes:
- Generate column header dynamically using sparrow3's exact format specs
  (EXTENSION→EXT, POLARIZATION→POL, etc. via %N.Ns truncation)
- Pad header lines to 256 chars (was 200) matching sparrow3
- Use sparrow3's exact row format strings (%6d, %16.9e, %22.22s, etc.)
- Implement skip-spacing for #INDEX# and ROW columns at 1M boundary
- Compute CENTFREQ from WCS params (CRVAL1/CRPIX1/CDELT1/NUMCHN)
  instead of using CRVAL1 directly
- Derive POLARIZATION from CRVAL4, PROCEDURE from OBSMODE
- Map dysh/FITS column names (HDU→EXTENSION, ELEVATIO→ELEVATION, etc.)
- Port 3 sparrow3 IndexWriterTests with character-for-character
  comparison against .index.expected reference files
- Un-skip the roundtrip test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tchamberlin tchamberlin self-assigned this Mar 12, 2026
@tchamberlin tchamberlin requested a review from mpound March 12, 2026 19:03
tchamberlin and others added 2 commits March 12, 2026 19:27
Test writing index files with heterogeneous multi-bank VEGAS data:
- Synthetic 16-row test with 4 banks, 2 scans, 2 polarizations,
  varying FILE/EXTENSION/CRVAL4/OBSMODE/CENTFREQ across rows
- Real VEGAS data roundtrip using AGBT18B_354_03 (4 banks, 128 rows)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds character-for-character verification against sparrow3's
test.banks.vegas.raw.ints.index.expected for 8 VEGAS bank files
(A-H) with varying FILE, POL, SAMPLER, FEED, IFNUM, coordinates,
CENTFREQ, CAL, and EXPOSURE across 32 rows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mpound

mpound commented Mar 12, 2026

Copy link
Copy Markdown
Collaborator

I like what you have done here. A few comments:

  • Your solution for multi-file indices writes a single index file with different "FILE" values for each of the FITS files. This is different than GBTIDL which writes one index per FITS file, e.g. A.index, B.index, etc.
  • The missing piece for dysh capability is that GBTFITSLoad/SDFITSLoad.write() would need a write_index boolean to invoke this. This would require a loop over write_index() passing in the SDFITSLoad.index for each FITS file. And test code to make sure it works correctly. At that point I'd also change dysh multifits write to use A, B, C instead of 0, 1, 2, 3...a little more bookkeeping.
  • I'm inclined to accept this to get it into 1.0.0 so the base capability is there and then between now and panel review, implement sdfits.write(index_file=T/F)

tchamberlin and others added 5 commits March 18, 2026 13:37
… multi-file naming

Addresses feedback on PR #1047: integrate index file writing into the
SDFITS write path and change multi-file naming to match GBTIDL/sparrow3
conventions.

- Add write_index=False parameter to write(). When True, writes
  sparrow3-compatible .index files alongside each output FITS file.
  For multifile writes with >1 file, also writes a parent directory
  index that aggregates all per-file indices.
- Change multi-file naming from numeric (0, 1, 2) to alphabetic
  (A, B, C) with dot separator: output.A.fits, output.B.fits, etc.
  This matches the GBTIDL/VEGAS convention and produces correct
  index filenames via get_index_path() (output.A.index, etc.).
- Add helper methods: _build_index_metadata(), _build_and_write_index(),
  _build_and_write_parent_index(), _multifile_name()
- Add 8 new tests in TestWriteIndex covering alphabetic naming, single
  file index, multifile indices, parent index, selection filtering,
  non-multifile mode, roundtrip, and row count verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add regression tests comparing dysh's index output against reference
implementations:

GBTIDL regression (TestGBTIDLRegression):
- test_cal_vegas_matches_gbtidl: TGBT22A_503_02 calibration data
- test_raw_vegas_a6_matches_gbtidl: AGBT20B_014_03 raw VEGAS data
- test_multibank_vegas_matches_gbtidl: AGBT22A_325_15 banks A and B
- test_gbtidl_getps_output_matches: column header format comparison

Compares integer columns (exact), string columns (exact), float columns
(rtol=1e-6), and DATEOBS date prefix. Known exclusions documented:
FILE (output vs original name), DATEOBS (GBTIDL truncates), WCALPOS
(CALPOSITION not always in dysh index).

Sparrow3 extended (TestSparrow3Extended) - character-for-character:
- test_vegas_raw_ints_matches_sparrow3 (4 rows)
- test_acs_raw_ints_matches_sparrow3 (4 rows)
- test_vegas_raw_only_matches_sparrow3 (224 rows)
- test_vegas_raw_badif_matches_sparrow3 (384 rows)
- test_banks_vegas_raw_badif_matches_sparrow3 (384 rows)

Also fix: derive NUMCHN from TDIM7 in _prepare_for_writing() when
NUMCHN column is missing (common when index is built from FITS data).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The SDFITS index format uses WCALPOS (originally W-band calibration
position) but sparrow/sparrow3/GBTIDL all populate it from the generic
FITS column CALPOSITION for all backends. Add the mapping in both
SDFITS_INDEX_TO_DYSH_MAP and the _prepare_for_writing rename_map so
dysh correctly writes this column.

Also re-enable WCALPOS comparison in the GBTIDL regression tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move NUMCHN derivation from TDIM7 out of _prepare_for_writing() and
into SDFITSLoad.create_index() so the in-memory index always has
NUMCHN available, not just during index file writing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tchamberlin

Copy link
Copy Markdown
Member Author

I think we are now at feature parity with SDFITS and GBTIDL's FITS writing capabilities, with a few exceptions:

  1. GBTIDL seems to have some sort of interactive index writer. fileout/keep stuff
  2. GBTIDL has some sort of spectra tagging system
  3. SDFITS has a daemon that updates the index files in online mode; we don't try to duplicate that here. There's also some concept of lock files that doesn't seem necessary here, but not 100% sure

As usual I don't have time to do a full deep dive into this code, but we do have reg tests here against GBTIDL and SDFITS

@tchamberlin tchamberlin changed the base branch from main to fast_ci_dl March 19, 2026 17:44
Base automatically changed from fast_ci_dl to release-1.1 March 25, 2026 14:19
Comment thread src/dysh/fits/index_file.py Outdated
@astrofle

Copy link
Copy Markdown
Collaborator

The default behavior should be to write the index file. However, changing the default to write index files also breaks a number of other tests (I have only checked the tests in the fits submodule):

  • src/dysh/fits/tests/test_gbtfitsload.py::TestGBTFITSLoad::test_construct_integration_number
  • src/dysh/fits/tests/test_gbtfitsload.py::TestGBTFITSLoad::test_write_all
  • src/dysh/fits/tests/test_gbtfitsload.py::TestGBTFITSLoad::test_write_repeated_scans
  • src/dysh/fits/tests/test_gbtfitsload.py::TestGBTFITSLoad::test_azel_coords
  • src/dysh/fits/tests/test_gbtfitsload.py::TestGBTFITSLoad::test_hadec_coords
  • src/dysh/fits/tests/test_gbtfitsload.py::TestGBTFITSLoad::test_galactic_coords
  • src/dysh/fits/tests/test_gbtfitsload.py::TestWriteIndex::test_write_index_default_false

I can try to fix these when I find the time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants