Add online EWA and shrinkage covariance/precision estimators #1923

Open
MaxHalford wants to merge 4 commits into main from feat/online-covariance-estimators
@MaxHalford

Member

What

Adds a family of online covariance estimators to river.covariance, reimplemented from the precise package following River conventions (dict-native, Narwhals mini-batches), per discussion #1884. Only genuinely-online methods are included — nothing that lazily inverts a stored covariance on read.

New estimators (river/covariance/ewa.py)

  • EwaCovariance — exponentially weighted (RiskMetrics-style) covariance. Diagonal matches stats.EWVar, off-diagonals match stats.EWCov. For non-stationary streams whose relationships drift over time.
  • LedoitWolfCovariance / OASCovariance — data-driven shrinkage of the EWA covariance towards a scaled identity (Ledoit-Wolf 2004 / Chen et al. 2010 intensities). For high-dimensional / few-sample regimes where the raw covariance is noisy or singular.
  • ShrunkCovariance — fixed-intensity shrinkage with a finance-friendly constant-correlation target (or identity).
  • EwaPrecision — exponentially weighted precision (inverse covariance) maintained online via a forgetting-factor Sherman-Morrison update. Genuinely online (O(d²)/step, never inverts explicitly); the recency-weighted counterpart of EmpiricalPrecision.
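
To illustrate the forgetting-factor Sherman-Morrison idea behind EwaPrecision, here is a minimal NumPy sketch. It is not the code in this PR: the function name `ewa_precision_update`, the recursion `S <- (1 - a) * S + a * x xᵀ`, the pre-centred inputs, and the identity initialisation are all assumptions made for the example.

```python
import numpy as np


def ewa_precision_update(P, x, fading_factor):
    """One forgetting-factor Sherman-Morrison step (illustrative sketch).

    Assumes the covariance follows S <- (1 - a) * S + a * outer(x, x),
    with a = fading_factor and x already centred. P is the current inverse
    of S and is assumed symmetric.
    """
    a = fading_factor
    Px = P @ x
    # Scalar denominator from the Sherman-Morrison identity.
    denom = (1.0 - a) + a * (x @ Px)
    # Rank-one correction, then rescale for the (1 - a) decay of S.
    return (P - a * np.outer(Px, Px) / denom) / (1.0 - a)


# Track S and P side by side; P is never obtained by calling an inverse.
rng = np.random.default_rng(0)
d, a = 3, 0.05
S = np.eye(d)  # assumed starting covariance
P = np.eye(d)  # its inverse
for _ in range(200):
    x = rng.normal(size=d)
    S = (1.0 - a) * S + a * np.outer(x, x)
    P = ewa_precision_update(P, x, a)

# Only the check below multiplies the two matrices together.
max_error = np.abs(P @ S - np.eye(d)).max()
```

Each step costs one matrix-vector product and one outer product, which is where the O(d²) per-step figure comes from.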

Supporting additions

  • stats.EWCov — exponentially weighted covariance primitive (bivariate counterpart of stats.EWVar), composed from EWMeans so the convention matches exactly.
  • datasets.SP500Stocks — daily returns (in %) of ten large-cap S&P 500 stocks across diverse sectors (2013–2018, 1,257 trading days), used in the docstring examples. Bundled sp500.csv.gz.
  • Guards SymmetricMatrix.__repr__ against empty (unfitted) matrices (previously raised ValueError).
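
As a rough sketch of how a bivariate EW covariance can be composed from three EW means (E[xy] - E[x]E[y]), consider the following. The class names `EWMeanSketch` and `EWCovSketch` are hypothetical, and seeding the mean with the first observation is an assumption that may differ from the convention used by stats.EWMean.

```python
class EWMeanSketch:
    """Exponentially weighted mean (illustrative, not River's stats.EWMean)."""

    def __init__(self, fading_factor=0.5):
        self.fading_factor = fading_factor
        self.mean = None

    def update(self, x):
        if self.mean is None:
            # Assumption: the first observation seeds the mean.
            self.mean = x
        else:
            a = self.fading_factor
            self.mean = a * x + (1 - a) * self.mean

    def get(self):
        return 0.0 if self.mean is None else self.mean


class EWCovSketch:
    """EW covariance built from the EW means of x, y and x * y."""

    def __init__(self, fading_factor=0.5):
        self.mean_x = EWMeanSketch(fading_factor)
        self.mean_y = EWMeanSketch(fading_factor)
        self.mean_xy = EWMeanSketch(fading_factor)

    def update(self, x, y):
        self.mean_x.update(x)
        self.mean_y.update(y)
        self.mean_xy.update(x * y)

    def get(self):
        return self.mean_xy.get() - self.mean_x.get() * self.mean_y.get()


# Feeding the same series as both arguments gives the EW variance,
# which is why the diagonal and off-diagonals share one convention.
cov_xx = EWCovSketch(fading_factor=0.5)
cov_xy = EWCovSketch(fading_factor=0.5)
cov_yx = EWCovSketch(fading_factor=0.5)
for x, y in [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0)]:
    cov_xx.update(x, x)
    cov_xy.update(x, y)
    cov_yx.update(y, x)
```

Because every entry is built from the same mean recursion, the diagonal of a matrix assembled this way is the EW variance by construction.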

Design notes

  • Internals are array-backed with a feature→index map (the same pattern as the existing EmpiricalPrecision) behind a dict-native public interface (update, update_many, matrix, __getitem__).
  • The EWA convention reuses stats.EWMean/EWVar so the diagonal and off-diagonals are exactly the existing scalar EW statistics.
  • No precision equivalents for the shrinkage estimators: shrinking toward a scaled identity is a full-rank perturbation that can't be tracked by rank-one inverse updates, so a "shrunk precision" would require inverting on read — exactly the lazy approach this work excludes.
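
The rank argument in the last note can be checked numerically. The sketch below uses the standard fixed-intensity formula `(1 - intensity) * S + intensity * mu * I` with `mu = trace(S) / d`; the helper name `shrink_to_scaled_identity` and the toy rank-one matrix are assumptions for illustration, not code from this PR.

```python
import numpy as np


def shrink_to_scaled_identity(S, intensity):
    """Shrink S towards mu * I, where mu is the mean of S's diagonal."""
    d = S.shape[0]
    mu = np.trace(S) / d
    return (1.0 - intensity) * S + intensity * mu * np.eye(d)


# A rank-one (singular) covariance built from a single observation.
x = np.array([1.0, 2.0, 3.0])
S = np.outer(x, x)
shrunk = shrink_to_scaled_identity(S, intensity=0.1)

rank_before = np.linalg.matrix_rank(S)
rank_after = np.linalg.matrix_rank(shrunk)
# The difference between the shrunk and raw matrices is itself full rank,
# so it is not a rank-one update of the kind Sherman-Morrison handles.
rank_of_change = np.linalg.matrix_rank(shrunk - S)
```

Shrinkage makes the singular matrix invertible, but the change it applies touches every eigen-direction at once, which is consistent with the decision not to ship a shrunk precision.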

Tests

  • river/covariance/test_ewa.py: EWA vs independent numpy reference + stats.EWVar/EWCov; update_many ≡ single-update loop across all estimators and across pandas/polars backends; shrinkage vs sklearn.covariance oracle; EwaPrecision vs numpy oracle and P @ S ≈ I; PSD/symmetry invariants; pickling; empty-repr.
  • All doctests use the real SP500Stocks data. uv run mypy and ruff are clean.

🤖 Generated with Claude Code

…stimators

Add a family of online covariance estimators reimplemented from the
`precise` package, following River conventions (dict-native, Narwhals
mini-batches) and excluding any lazy invert-on-read methods:

- covariance.EwaCovariance: exponentially weighted covariance (RiskMetrics
  style); diagonal matches stats.EWVar, off-diagonals match stats.EWCov.
- covariance.LedoitWolfCovariance / OASCovariance: data-driven shrinkage
  towards a scaled identity for high-dimensional / few-sample regimes.
- covariance.ShrunkCovariance: fixed-intensity shrinkage with a
  constant-correlation (finance) or identity target.
- covariance.EwaPrecision: exponentially weighted precision via a
  forgetting-factor Sherman-Morrison update; genuinely online, never
  inverts explicitly. Recency-weighted counterpart of EmpiricalPrecision.
- stats.EWCov: exponentially weighted covariance primitive (bivariate
  counterpart of stats.EWVar).
- datasets.SP500Stocks: daily returns of ten large-cap S&P 500 stocks
  (2013-2018), used in the docstring examples.

Internals are array-backed with a feature->index map (like
EmpiricalPrecision) behind a dict-native interface. Also guards
SymmetricMatrix.__repr__ against empty (unfitted) matrices.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@MaxHalford

Member Author

@microprediction this PR pulls some of precise's methods into River. I'm very grateful for this gift!

MaxHalford and others added 3 commits June 25, 2026 20:14
Replace the circular EWCov test (which re-implemented the estimator's own
E[xy]-E[x]E[y] recursion) with a comparison against pandas' ewm().cov(), and
add a test comparing EmpiricalCovariance against sklearn's batch estimator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop pytest.importorskip in favour of a plain inline sklearn import (matching
the existing sklearn test), extract the _dense value helper to module level,
and trim the EWCov comment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…many narwhals-native

Migrate the empirical estimators' `update_many` off the hard-coded pandas
path (`.values`/`.columns`) to the `utils.dataframe` narwhals boundary helpers,
matching the new EWA/shrinkage estimators and the rest of the #1919 migration.
Any narwhals-supported eager dataframe (pandas, polars, pyarrow, ...) now flows
through; the pandas path is byte-for-byte unchanged. Adds multi-backend tests
via the `frame_backend` fixture.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>