Future GoodRunList generation in pass3

The NuSources and LowEnAstro working groups need to generate good run lists ("GRLs") in order to correctly handle livetime and transients. These are more complex than the GRLs that are currently available via [I3Live's snapshots](https://live.icecube.wisc.edu/snapshots/) tool. 

Now that the filter database has a read-only user (thanks @gh999ic, @mcpreston), I've written [scripts](https://github.com/mjlarson/nusources_dataset_converters/blob/main/grl/create_blank_grl.py) to collate all of this information. For pass3, though, it would be useful to build the databases so that these tasks are simpler and we don't need to piece together (or guess at...) so much information from separate sources. I'm collecting the issues in one place so we can consider them when building the pass3 databases:

Level2 GRL text files:
- The L2 GRL text files in `/data/exp/` are known to be static and don't reflect the current snapshot
- The L2 GRL text files are sometimes actively wrong: Run 121864 (February 10, 2013) lists 86 active strings in the [text file](https://convey.icecube.wisc.edu/data/exp/IceCube/2012/filtered/level2pass2/IC86_2012_GoodRunInfo_Versioned.txt) (created April 12, 2018) while [I3Live](https://live.icecube.wisc.edu/run/121864/) shows string 31 dropped.
- Users need to know that `level2` pre-2017 is different from `level2pass2` pre-2017 and `level2` post-2017 (which are the same thing, despite different names)
- These never contained information on gaps: to get those, users usually had to combine txt and tar files in `/data/exp/`.
- These never contained information on missing files. Users needed to trawl through L2 i3 files on disk to calculate the start/stop times of the missing files.

I3Live database:
- No information about active DOMs/strings is available in the snapshot. These are needed for veto-style selections since missing strings can break the veto.
- The number of active strings/DOMs is at a separate URL from `good_i3`, `good_it`, `grl_start`, and `grl_stop`, so we need to query the server twice per run.
- In old runs, `configured_doms` and `grl_stop` are sometimes not set, leading to awkward workarounds.
- Overall run information is available, but access via the json/web interface tends to be slow: retrieving all of the relevant information takes ~0.7 seconds per run.
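
To illustrate the two-request pattern and the unset-field problem described above, here is a minimal sketch that merges the two per-run payloads into one record and flags anything that came back empty instead of silently guessing. The function name `merge_run_info`, the plain-dict inputs, and the assumption that `configured_doms` arrives in the second (per-run) payload are all hypothetical; only the field names themselves come from this issue.

```python
def merge_run_info(snapshot_entry, run_entry):
    """Combine the two per-run payloads into a single flat record.

    snapshot_entry: dict assumed to hold good_i3, good_it, grl_start, grl_stop
    run_entry:      dict assumed to hold configured_doms (hypothetical layout)
    """
    record = {
        "good_i3": snapshot_entry.get("good_i3"),
        "good_it": snapshot_entry.get("good_it"),
        "grl_start": snapshot_entry.get("grl_start"),
        "grl_stop": snapshot_entry.get("grl_stop"),
        "configured_doms": run_entry.get("configured_doms"),
    }
    # Record which fields were unset so downstream code can decide how to
    # handle them explicitly rather than relying on an implicit default.
    record["unset_fields"] = sorted(k for k, v in record.items() if v is None)
    return record


# Illustrative values only, not taken from any real run.
example = merge_run_info(
    {"good_i3": True, "good_it": True, "grl_start": "2013-01-01 00:00:00"},
    {},
)
```

In this example both `grl_stop` and `configured_doms` end up listed in `unset_fields`, which is the case that currently needs a workaround for old runs.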

Filtering database:
- Information is spread out across five tables (`gaps`, `gaps_pass2`, `sub_runs`, `sub_runs_pass2`, `missing_files_pass2`)
- `_pass2` tables are only applicable pre-2017, but that relies on user knowledge.
- No absolute times are given, only the livetime per subrun file, so users must manually calculate the times of gaps and missing files.
- Not all missing files are in the `missing_files_pass2` table. In some cases, this is because a later good start time was set and the early subrun files were dropped. In other cases, I can't find any explanation (e.g., run 125347 is missing subrun file 254). For these cases, the user has to guess at the livetime of the missing file.
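
The manual calculation mentioned above can be sketched roughly as follows: accumulate per-subrun livetimes from the run's good start time to get absolute windows, and substitute a guessed livetime for any missing file. This is a simplified sketch, not the logic of the linked scripts. It assumes subruns are contiguous and that livetime equals wall-clock duration, which does not hold when a subrun contains gaps. The function name `subrun_windows`, the `fallback_livetime` parameter, and the input layout are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def subrun_windows(run_start, livetimes, fallback_livetime=None):
    """Return a list of (subrun, start, stop, guessed) tuples.

    run_start:         timezone-aware datetime of the good start time
    livetimes:         dict of subrun number -> livetime in seconds,
                       with None marking a missing file
    fallback_livetime: seconds to assume for a missing file (a guess)
    """
    windows = []
    cursor = run_start
    for subrun in sorted(livetimes):
        livetime = livetimes[subrun]
        guessed = livetime is None
        if guessed:
            if fallback_livetime is None:
                raise ValueError(f"no livetime for missing subrun {subrun}")
            livetime = fallback_livetime
        stop = cursor + timedelta(seconds=livetime)
        windows.append((subrun, cursor, stop, guessed))
        cursor = stop  # next subrun is assumed to start where this one ends
    return windows


# Illustrative values only, not taken from the filtering database.
start = datetime(2020, 1, 1, tzinfo=timezone.utc)
windows = subrun_windows(start, {0: 600.0, 1: None, 2: 300.0},
                         fallback_livetime=600.0)
```

The `guessed` flag keeps the guessed windows distinguishable from measured ones, so any livetime uncertainty from missing files can be tracked separately.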
