Fix SDFITS index file writing to match sparrow format#1047
Draft
tchamberlin wants to merge 11 commits into
Draft
Fix SDFITS index file writing to match sparrow format#1047tchamberlin wants to merge 11 commits into
tchamberlin wants to merge 11 commits into
Conversation
The write_index() output was incompatible with GBTIDL because column names, header padding, format widths, and derived column computations all differed from the sparrow3 reference implementation. Changes: - Generate column header dynamically using sparrow3's exact format specs (EXTENSION→EXT, POLARIZATION→POL, etc. via %N.Ns truncation) - Pad header lines to 256 chars (was 200) matching sparrow3 - Use sparrow3's exact row format strings (%6d, %16.9e, %22.22s, etc.) - Implement skip-spacing for #INDEX# and ROW columns at 1M boundary - Compute CENTFREQ from WCS params (CRVAL1/CRPIX1/CDELT1/NUMCHN) instead of using CRVAL1 directly - Derive POLARIZATION from CRVAL4, PROCEDURE from OBSMODE - Map dysh/FITS column names (HDU→EXTENSION, ELEVATIO→ELEVATION, etc.) - Port 3 sparrow3 IndexWriterTests with character-for-character comparison against .index.expected reference files - Un-skip the roundtrip test Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test writing index files with heterogeneous multi-bank VEGAS data: - Synthetic 16-row test with 4 banks, 2 scans, 2 polarizations, varying FILE/EXTENSION/CRVAL4/OBSMODE/CENTFREQ across rows - Real VEGAS data roundtrip using AGBT18B_354_03 (4 banks, 128 rows) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds character-for-character verification against sparrow3's test.banks.vegas.raw.ints.index.expected for 8 VEGAS bank files (A-H) with varying FILE, POL, SAMPLER, FEED, IFNUM, coordinates, CENTFREQ, CAL, and EXPOSURE across 32 rows. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator
|
I like what you have done here. A few comments:
|
mpound
approved these changes
Mar 12, 2026
… multi-file naming Addresses feedback on PR #1047: integrate index file writing into the SDFITS write path and change multi-file naming to match GBTIDL/sparrow3 conventions. - Add write_index=False parameter to write(). When True, writes sparrow3-compatible .index files alongside each output FITS file. For multifile writes with >1 file, also writes a parent directory index that aggregates all per-file indices. - Change multi-file naming from numeric (0, 1, 2) to alphabetic (A, B, C) with dot separator: output.A.fits, output.B.fits, etc. This matches the GBTIDL/VEGAS convention and produces correct index filenames via get_index_path() (output.A.index, etc.). - Add helper methods: _build_index_metadata(), _build_and_write_index(), _build_and_write_parent_index(), _multifile_name() - Add 8 new tests in TestWriteIndex covering alphabetic naming, single file index, multifile indices, parent index, selection filtering, non-multifile mode, roundtrip, and row count verification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add regression tests comparing dysh's index output against reference implementations: GBTIDL regression (TestGBTIDLRegression): - test_cal_vegas_matches_gbtidl: TGBT22A_503_02 calibration data - test_raw_vegas_a6_matches_gbtidl: AGBT20B_014_03 raw VEGAS data - test_multibank_vegas_matches_gbtidl: AGBT22A_325_15 banks A and B - test_gbtidl_getps_output_matches: column header format comparison Compares integer columns (exact), string columns (exact), float columns (rtol=1e-6), and DATEOBS date prefix. Known exclusions documented: FILE (output vs original name), DATEOBS (GBTIDL truncates), WCALPOS (CALPOSITION not always in dysh index). Sparrow3 extended (TestSparrow3Extended) - character-for-character: - test_vegas_raw_ints_matches_sparrow3 (4 rows) - test_acs_raw_ints_matches_sparrow3 (4 rows) - test_vegas_raw_only_matches_sparrow3 (224 rows) - test_vegas_raw_badif_matches_sparrow3 (384 rows) - test_banks_vegas_raw_badif_matches_sparrow3 (384 rows) Also fix: derive NUMCHN from TDIM7 in _prepare_for_writing() when NUMCHN column is missing (common when index is built from FITS data). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The SDFITS index format uses WCALPOS (originally W-band calibration position) but sparrow/sparrow3/GBTIDL all populate it from the generic FITS column CALPOSITION for all backends. Add the mapping in both SDFITS_INDEX_TO_DYSH_MAP and the _prepare_for_writing rename_map so dysh correctly writes this column. Also re-enable WCALPOS comparison in the GBTIDL regression tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move NUMCHN derivation from TDIM7 out of _prepare_for_writing() and into SDFITSLoad.create_index() so the in-memory index always has NUMCHN available, not just during index file writing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Member
Author
|
I think we are now at feature parity with SDFITS and GBTIDL's FITS writing capabilities, with a few exceptions:
As usual I don't have time to do a full deep dive into this code, but we do have reg tests here against GBTIDL and SDFITS |
astrofle
reviewed
Mar 26, 2026
Collaborator
|
The default behavior should be to write the index file. However, changing the default to write index files also breaks a number of other tests (I have only checked the tests in the fits submodule):
I can try to fix these when I find the time. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
write_index()produced.indexfiles that were incompatible with GBTIDL because the column names, header padding, format widths, and derived column computations all diverged from thesparrow3reference implementation (IndexWriter.py).What changed
Format fixes (verified character-for-character against sparrow3 .index.expected files):
EXTENSIONandPOLARIZATIONare truncated to EXT/POLDATEOBS/TIMESTAMPwidth corrected from 21 to 22 chars#INDEX#andROWcolumns at the 1M boundary (not sure what this is, but there's code for it in sparrow...)Derived column stuff (port sparrow3's
translateInfo):CENTFREQis now computed from WCS parameters((centerChan - CRPIX1) * CDELT1 + CRVAL1)instead of usingCRVAL1directlyPOLARIZATIONderived fromCRVAL4(I/Q/U/V/RR/LL/XX/YY/etc.)PROCEDUREextracted fromOBSMODEFREQINTfromCDELT1,BANDWIDTHfromBANDWID,SUBREFfromSUBREF_STATENSAVEdefaults to -1 (was 0)SIG/CALproperly converted toT/Fstrings from booleansNo changes to the read path at all.
Tests
test_write_and_read_roundtrip) is now passing*.index.expectedreference