Gen sweep #208
Open
Sidney-Lisanza wants to merge 33 commits into
…nd components Co-authored-by: Cursor <[email protected]>
…-ligand pipeline Co-authored-by: Cursor <[email protected]>
…-ligand training
- Add max_cluster_replicates parameter to StructureLightningDataModule to cap upsampling of small datasets in balanced training mode
- Add data configs: structure_ligand_all (7-dataset combined), PLINDER baseline, distillation, and intermediate configs for protein-ligand training
- Fix elif→if in gen_ume protein-ligand model to allow simultaneous IF/FF eval
- Fix PDB loading edge cases in latent_generator io
- Add structure transforms for protein-ligand data handling
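The effect of a replicate cap on balanced upsampling can be illustrated with a small sketch. This is not the StructureLightningDataModule code: the balancing rule (repeat each dataset until it roughly matches the largest one) and the function name `replicate_counts` are assumptions made for illustration; only the parameter name `max_cluster_replicates` comes from the commit message.

```python
import math


def replicate_counts(dataset_sizes, max_cluster_replicates=None):
    """Return how many times each dataset would be repeated.

    Assumed rule (hypothetical): replicate = ceil(largest / size),
    optionally capped by max_cluster_replicates.
    """
    largest = max(dataset_sizes.values())
    counts = {}
    for name, size in dataset_sizes.items():
        replicates = math.ceil(largest / size)
        if max_cluster_replicates is not None:
            replicates = min(replicates, max_cluster_replicates)
        counts[name] = replicates
    return counts


# Made-up dataset sizes, for illustration only.
sizes = {"large_set": 100_000, "small_set": 500}
print(replicate_counts(sizes))                             # uncapped
print(replicate_counts(sizes, max_cluster_replicates=10))  # capped
```

Without a cap, a very small dataset is repeated many times per epoch, which is the situation the parameter is described as guarding against.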
…line eval
- Add compute_protein_ligand_contacts and compute_aligned_ligand_rmsd to generation utils as reusable standalone functions
- Add contact-based ligand_in_pocket metric to forward folding evaluator: checks if predicted ligand contacts GT pocket residues (replaces centroid-based)
- Add ligand_contacts_protein metric (any protein-ligand contact at 6A)
- Allow skipping ESMFold in conditioned generation (plm_fold=None)
- Add best-of-N display and ligand placement stats to FF cmdline output
- Add LigandMPNN inverse folding baseline evaluator and cmdline script
- Update inverse folding evaluator with pocket-aware metrics
- Update conditioned gen cmdline with additional generation parameters
- Update forward folding and inverse folding callbacks with ligand support
- Update hydra callback configs with protein-ligand evaluation parameters
- Add save_structures and minimize_ligand options to callback configs
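A contact-based pocket check of the kind described above might look like the following sketch. It is not the PR's compute_protein_ligand_contacts; the helper names, the use of CA atoms for the protein side, and the "at least one shared pocket residue" rule are assumptions. Only the 6 Å cutoff and the idea of comparing predicted contacts against ground-truth (GT) pocket residues come from the commit message.

```python
import math

CONTACT_CUTOFF = 6.0  # Angstroms, the cutoff named in the commit message


def contact_residues(ca_coords, ligand_coords, cutoff=CONTACT_CUTOFF):
    """Indices of residues whose CA atom is within `cutoff` of any ligand atom."""
    return {
        i
        for i, ca in enumerate(ca_coords)
        if any(math.dist(ca, atom) <= cutoff for atom in ligand_coords)
    }


def ligand_in_pocket(pred_ca, pred_ligand, gt_ca, gt_ligand, cutoff=CONTACT_CUTOFF):
    """True if the predicted ligand contacts at least one GT pocket residue.

    Assumes predicted and GT residues share the same indexing.
    """
    gt_pocket = contact_residues(gt_ca, gt_ligand, cutoff)
    pred_contacts = contact_residues(pred_ca, pred_ligand, cutoff)
    return bool(gt_pocket & pred_contacts)


def ligand_contacts_protein(pred_ca, pred_ligand, cutoff=CONTACT_CUTOFF):
    """True if the predicted ligand contacts any residue at all."""
    return bool(contact_residues(pred_ca, pred_ligand, cutoff))


# Toy two-residue protein: the GT ligand sits next to residue 0.
gt_ca = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
gt_ligand = [(1.0, 0.0, 0.0)]
misplaced = [(21.0, 0.0, 0.0)]  # touches residue 1, not the GT pocket
print(ligand_in_pocket(gt_ca, misplaced, gt_ca, gt_ligand))
print(ligand_contacts_protein(gt_ca, misplaced))
```

Compared with a centroid-distance test, a residue-level check like this does not depend on a global superposition of predicted and GT structures, as long as residue indices correspond.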
…ct-based ligand placement
- Add good_fold_and_in_pocket_fraction (TM > 0.5 AND ligand in correct pocket) to FF evaluator summary and cmdline output
- Update merge_cofold_results.py to use contact-based ligand_in_pocket (CA within 6A of GT pocket residues) instead of centroid distance
- Add cofold_ligand_contacts_protein and cofold_n_pocket_contacts metrics
- Report good_fold_and_in_pocket in merge summary
- Restructure run_full_eval.sh: Phase 2 supports rf3, boltz, or both backends with configurable task selection (COFOLD_TASKS=if,ff,cg,lmpnn)
- RF3 co-folding runs in parallel chunks across multiple GPUs
- Boltz2 co-folding uses SLURM array jobs (one per sample)
- Phase 3 merges co-fold results from either backend
- Add benchmark_conditioned_gen.py for Gen-UME vs Proteina-Complexa comparison with ESMFold pre-filtering and per-design timing
- Add run_rf3_ff_baseline.py for RF3 co-folding on designed sequences
- Add submit_cofold_batch.py and run_cofold_local.py for batch co-folding
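The combined metric named above (TM > 0.5 AND ligand in the correct pocket) reduces to a per-sample conjunction followed by a mean. A minimal sketch, assuming each sample is a dict with hypothetical keys `tm_score` and `ligand_in_pocket` (the PR's actual record layout is not shown here):

```python
def good_fold_and_in_pocket_fraction(samples, tm_threshold=0.5):
    """Fraction of samples with TM-score above threshold AND ligand in pocket."""
    if not samples:
        return 0.0
    hits = sum(
        1
        for s in samples
        if s["tm_score"] > tm_threshold and s["ligand_in_pocket"]
    )
    return hits / len(samples)


# Made-up records, for illustration only.
samples = [
    {"tm_score": 0.8, "ligand_in_pocket": True},   # counts
    {"tm_score": 0.8, "ligand_in_pocket": False},  # good fold, wrong pocket
    {"tm_score": 0.3, "ligand_in_pocket": True},   # right pocket, poor fold
    {"tm_score": 0.5, "ligand_in_pocket": True},   # TM is not strictly > 0.5
]
print(good_fold_and_in_pocket_fraction(samples))
```

Reporting the conjunction separately from the two marginal rates avoids crediting a correctly placed ligand on a misfolded protein.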
… docs
- Add ligand-conditioned generation and LigandMPNN baseline sections
- Document evaluation pipeline (Phase 1-3) with RF3/Boltz2 co-folding
- Document contact-based ligand placement metrics and good_fold_and_in_pocket
- Add training data configs and training commands
- Document benchmark script for Gen-UME vs Proteina-Complexa
- Add best-of-N forward folding and aligned ligand RMSD
- Update PoseBusters benchmark description
…ses)

Implements the ProteinMPNN-analog AR-MC pseudo-likelihood estimator for Gen-UME (sequence + structure heads), plus best-of-N selection drivers and analysis scripts derived from the PLL signal.

Scoring + correlation:
- score_gen_ume_pll.py: K-draw stratified-t MC PLL scorer (score_unif / score_arllh / fixed-t variants; per-modality and joint_true_2)
- correlate_pll_with_quality.py: pooled + per-length Pearson/Spearman vs task quality CSVs
- score_gen_ume_pll_failed_attempts.py / compare_pll_vs_sr_gate.py: PLL scoring of SR-rejected attempts and PLL-vs-SR-gate decision comparison

Best-of-N drivers + analyzers:
- forward_fold_bestofN_pll.py / analyze_bestofN_ff.py
- inverse_fold_bestofN_pll.py / analyze_bestofN_if.py
- unconditional_bestofN_pll.py / analyze_bestofN_uc.py
- analyze_bestofN_topk_softpick.py: hard-argmin vs top-K soft pick
- plot_ff_struc_pll_per_target.py: per-target within-correlation diagnostic
- regen_top_K_by_nll_uc.py: replay top-K-by-NLL UC candidates for ESMFold

Self-reflection IF:
- inverse_fold_self_reflection.py / analyze_if_self_reflection.py
- plot_if_sr_jump.py: per-target before/after waterfall + scatter

SLURM drivers:
- score_gen_ume_pll.sh
- run_forward_fold_bestofN_pll.sh
- run_unconditional_bestofN_pll.sh

Co-authored-by: Cursor <[email protected]>
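The "hard-argmin vs top-K soft pick" comparison can be read as two ways of choosing one candidate out of N from per-candidate NLL scores. The sketch below shows one plausible interpretation, not the logic of analyze_bestofN_topk_softpick.py: the softmax weighting, the `temperature` parameter, and both function names are assumptions.

```python
import math
import random


def hard_argmin(nlls):
    """Index of the candidate with the lowest NLL."""
    return min(range(len(nlls)), key=nlls.__getitem__)


def topk_soft_pick(nlls, k=3, temperature=1.0, rng=None):
    """Sample one of the k lowest-NLL candidates with softmax(-NLL / T) weights.

    Hypothetical rule: lower NLL gets a higher weight; k=1 reduces to
    hard_argmin.
    """
    rng = rng or random.Random(0)
    top = sorted(range(len(nlls)), key=nlls.__getitem__)[:k]
    best = nlls[top[0]]
    # Subtract the best NLL before exponentiating for numerical stability.
    weights = [math.exp(-(nlls[i] - best) / temperature) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# Made-up NLL values for four candidates.
nlls = [2.0, 0.5, 1.0, 9.0]
print(hard_argmin(nlls))
print(topk_soft_pick(nlls, k=2))
```

A soft pick trades some expected score for diversity among the selected designs; whether that helps depends on how well the NLL tracks the downstream quality metric.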
…tricsCSVWriter

GenUME unconditional benchmark vs LaProteina/DPLM2 + Self-Reflection threshold sweep (SR>=0.833 vs SR>=0.9) + SR-QC vs ESMFold-QC concordance analyses.

New benchmark scripts:
- eval_competitor_unconditional.py: subsample 100 PDBs/length, ESMFold, compute designability + clustering + SSE + novelty for LaProteina/DPLM2
- convert_afdb_cluster_reps_to_pdb.py: AFDB SwissProt cluster reps -> PDB
- analyze_tm_score_novelty.py: foldseek easy-search of cluster reps vs PDB / AFDB / DeNovo reference sets
- compile_benchmark_table.py: stitch GenUME + competitor results
- analyze_selfreflection_paired.py / build_selective_sr_table.py: per-sample paired SR analysis + length-selective SR policy
- analyze_sr_threshold_sweep.py / plot_sr_qc_threshold_sweep.py / plot_sr_qc_tm_sweep_balanced.py: SR forward-fold-TM gate sweep (T=0/0.833/0.9), pooled + length-balanced
- plot_per_length_designability_bars.py: per-length designability bars with Fisher's exact significance
- plot_forward_vs_esmfold_tm.py / plot_forward_vs_esmfold_tm_cameo.py: lobster forward-fold TM vs ESMFold TM scatter (uncond + CAMEO)
- esmfold_failed_attempts.py / build_sr_esmfold_concordance.py / plot_sr_vs_esmfold_unconditional.py: post-hoc ESMFold of SR-rejected attempts + 2x2 concordance matrix vs ESMFold gate

SLURM:
- eval_gen_ume_denovo_sr_tm0p9.sh: SR runs at the tighter 0.9 gate
- eval_competitor_unconditional.sh: LaProteina/DPLM2 ESMFold + clustering

Source changes:
- generate.py: add _save_failed_self_reflection_attempt() so SR runs with generation.self_reflection.save_failed_attempts=true persist the initial seq+backbone of every forward-fold-TM-rejected attempt for the ESMFold-concordance follow-up
- _generation_utils.py: fix MetricsCSVWriter column-shift bug (drop two *_kabsch keys that had no header columns; add ESMFold-agreement comparison columns); add compute_complex_metrics_vs_gt() shared helper

Co-authored-by: Cursor <[email protected]>
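The 2x2 concordance matrix between the self-reflection (SR) gate and the ESMFold gate can be sketched as a simple tally. This is an illustration rather than the contents of build_sr_esmfold_concordance.py: the input layout (pairs of SR forward-fold TM and a precomputed ESMFold pass/fail flag) and the function names are assumptions; the 0.833 and 0.9 thresholds are the ones quoted in the commit message.

```python
from collections import Counter


def concordance_matrix(attempts, sr_threshold=0.9):
    """Tally (SR pass, ESMFold pass) outcomes into a 2x2 table.

    `attempts` is an iterable of (sr_tm, esmfold_pass) pairs, where
    esmfold_pass is a boolean computed elsewhere (hypothetical layout).
    """
    counts = Counter((sr_tm >= sr_threshold, bool(esm_ok)) for sr_tm, esm_ok in attempts)
    cells = [(True, True), (True, False), (False, True), (False, False)]
    return {cell: counts[cell] for cell in cells}


def agreement_rate(matrix):
    """Fraction of attempts on which both gates make the same decision."""
    total = sum(matrix.values())
    if total == 0:
        return 0.0
    return (matrix[(True, True)] + matrix[(False, False)]) / total


# Made-up attempts: (SR forward-fold TM, ESMFold gate passed?)
attempts = [(0.95, True), (0.95, False), (0.5, True), (0.5, False), (0.85, True)]
for threshold in (0.833, 0.9):
    m = concordance_matrix(attempts, sr_threshold=threshold)
    print(threshold, m, agreement_rate(m))
```

Running the tally at both thresholds shows how tightening the SR gate moves attempts between the "both pass" and "SR rejects, ESMFold accepts" cells.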
… eval callback

Mirrors the protein-only PLL study on the 4-modality protein-ligand checkpoint (sequence, protein-structure, ligand-atom, ligand-structure).

PLL scoring + best-of-N:
- score_gen_ume_protein_ligand_pll.py: 4-modality AR-MC PLL scorer with per-modality (seq / struc / lig_atom / lig_struc) scores, additive joints (joint_protein, joint_ligand, joint_all), and a true 4-way joint (joint_true_4) computed via one extra forward per K with all four modalities masked simultaneously
- forward_fold_bestofN_pll_ligand.py / inverse_fold_bestofN_pll_ligand.py / conditioned_gen_bestofN_pll_ligand.py: per-task best-of-N drivers on PoseBusters with the new pickers (per-modality + joint variants + Boltz2 iptm / TM oracles)
- benchmark_conditioned_gen.py: refactored CG benchmark with picker comparison

CG Boltz callback:
- _cg_boltz_eval.py + cg_boltz_eval.yaml: lightweight Boltz2 cofold evaluator wired into the protein-ligand training callbacks suite

Source updates:
- evaluate_ligand_conditioned_protein_generation.py: tune default temperature/stochasticity per-modality (seq/struc/ligand) for CG benchmark hyperparameters
- ligand_conditioned_protein_generation.py: prefer ligand_data["smiles"] over re-reading from SDF when available
- submit_cofold_batch.py: --max_concurrent for SLURM array throttle and skip-if-result-already-complete guard
- run_full_eval.sh: CG_NUM_LIGANDS / CG_NUM_DESIGNS / CG_DATA_DIR knobs; default CG to Proteina-style 4 ligands x 10 designs at nsteps=200

Co-authored-by: Cursor <[email protected]>
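The additive joint scores listed above can be sketched as plain sums of per-modality log-likelihood scores. The grouping used here (joint_protein = seq + struc, joint_ligand = lig_atom + lig_struc, joint_all = all four) is an assumption inferred from the names, and the function is illustrative only. The true 4-way joint (joint_true_4) needs an extra model forward pass with all modalities masked and is not reproduced.

```python
def additive_joints(scores):
    """Combine per-modality scores into additive joint scores.

    `scores` maps 'seq', 'struc', 'lig_atom', 'lig_struc' to floats
    (e.g. pseudo-log-likelihoods). The grouping is an assumed reading
    of the commit message, not confirmed by the source.
    """
    joint_protein = scores["seq"] + scores["struc"]
    joint_ligand = scores["lig_atom"] + scores["lig_struc"]
    return {
        "joint_protein": joint_protein,
        "joint_ligand": joint_ligand,
        "joint_all": joint_protein + joint_ligand,
    }


# Made-up per-modality scores for one candidate.
example = {"seq": -1.5, "struc": -2.5, "lig_atom": -0.5, "lig_struc": -1.0}
print(additive_joints(example))
```

An additive joint treats the modalities as independent given the unmasked context, which is why a separately computed joint with all modalities masked at once can give a different ranking.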
Description
Brief description of changes made
Type of Change