Add online EWA and shrinkage covariance/precision estimators#1923
Open
MaxHalford wants to merge 4 commits into
…stimators

Add a family of online covariance estimators reimplemented from the `precise` package, following River conventions (dict-native, Narwhals mini-batches) and excluding any lazy invert-on-read methods:

- covariance.EwaCovariance: exponentially weighted covariance (RiskMetrics style); diagonal matches stats.EWVar, off-diagonals match stats.EWCov.
- covariance.LedoitWolfCovariance / OASCovariance: data-driven shrinkage towards a scaled identity for high-dimensional / few-sample regimes.
- covariance.ShrunkCovariance: fixed-intensity shrinkage with a constant-correlation (finance) or identity target.
- covariance.EwaPrecision: exponentially weighted precision via a forgetting-factor Sherman-Morrison update; genuinely online, never inverts explicitly. Recency-weighted counterpart of EmpiricalPrecision.
- stats.EWCov: exponentially weighted covariance primitive (bivariate counterpart of stats.EWVar).
- datasets.SP500Stocks: daily returns of ten large-cap S&P 500 stocks (2013-2018), used in the docstring examples.

Internals are array-backed with a feature->index map (like EmpiricalPrecision) behind a dict-native interface. Also guards SymmetricMatrix.__repr__ against empty (unfitted) matrices.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
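For readers unfamiliar with the shrinkage estimators listed in this commit, here is a minimal sketch of fixed-intensity shrinkage towards a scaled identity. It is not the code from this PR: the function name `shrink_towards_scaled_identity` and the nested-list matrix layout are assumptions made for illustration, and the formula is the one scikit-learn documents for its batch `ShrunkCovariance`. The data-driven variants (Ledoit-Wolf, OAS) would differ in how `intensity` is chosen, which is not shown here.

```python
def shrink_towards_scaled_identity(cov, intensity):
    """Return (1 - intensity) * cov + intensity * mu * I, with mu = trace(cov) / d.

    `cov` is a square matrix as nested lists (an illustrative layout, not River's).
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    d = len(cov)
    mu = sum(cov[i][i] for i in range(d)) / d  # average variance on the diagonal
    return [
        [
            (1.0 - intensity) * cov[i][j] + (intensity * mu if i == j else 0.0)
            for j in range(d)
        ]
        for i in range(d)
    ]


# A rank-one, hence singular, 2x2 covariance: det = 4 * 1 - 2 * 2 = 0.
singular = [[4.0, 2.0], [2.0, 1.0]]
shrunk = shrink_towards_scaled_identity(singular, intensity=0.5)

# Determinant of the shrunk 2x2 matrix; positive means it is now invertible.
det = shrunk[0][0] * shrunk[1][1] - shrunk[0][1] * shrunk[1][0]
```

The trace is preserved by construction, so total variance is unchanged while the off-diagonals are pulled towards zero, which is what makes the result usable in the few-sample regime the commit message mentions.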
@microprediction this PR pulls some of precise's methods into River. I'm very grateful for this gift!
Replace the circular EWCov test (which re-implemented the estimator's own E[xy]-E[x]E[y] recursion) with a comparison against pandas' ewm().cov(), and add a test comparing EmpiricalCovariance against sklearn's batch estimator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
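To illustrate why a non-circular oracle matters, here is a small sketch that checks the E[xy]-E[x]E[y] recursion against a reference computed from explicit weights rather than from the same recursion. Everything here is an assumption for illustration: the function names are hypothetical, `alpha` stands in for the fading factor, and the first observation is assumed to initialise each running mean. The actual test in this PR uses pandas instead.

```python
def ew_cov_recursive(xs, ys, alpha):
    """Exponentially weighted covariance via three running EW means."""
    mx = my = mxy = None
    for x, y in zip(xs, ys):
        if mx is None:
            # Assumed convention: the first observation initialises the means.
            mx, my, mxy = x, y, x * y
        else:
            mx = alpha * x + (1.0 - alpha) * mx
            my = alpha * y + (1.0 - alpha) * my
            mxy = alpha * x * y + (1.0 - alpha) * mxy
    return mxy - mx * my


def ew_cov_explicit(xs, ys, alpha):
    """Independent reference: build the weights directly, no recursion."""
    n = len(xs)
    # Unrolling the recursion gives weight (1 - alpha)^(n-1) on the first
    # point and alpha * (1 - alpha)^(n-1-i) on point i >= 1; they sum to 1.
    w = [(1.0 - alpha) ** (n - 1)]
    w += [alpha * (1.0 - alpha) ** (n - 1 - i) for i in range(1, n)]
    ex = sum(wi * x for wi, x in zip(w, xs))
    ey = sum(wi * y for wi, y in zip(w, ys))
    exy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    return exy - ex * ey


xs = [1.0, 3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 11.0]
ys = [2.0, 1.0, 4.0, 3.0, 7.0, 5.0, 8.0, 6.0, 9.0]
gap = abs(ew_cov_recursive(xs, ys, 0.3) - ew_cov_explicit(xs, ys, 0.3))
```

Because the reference never runs the recursion, a bug in the update rule shows up as a non-zero `gap` instead of being reproduced on both sides of the assertion.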
Drop pytest.importorskip in favour of a plain inline sklearn import (matching the existing sklearn test), extract the _dense value helper to module level, and trim the EWCov comment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…many narwhals-native

Migrate the empirical estimators' `update_many` off the hard-coded pandas path (`.values`/`.columns`) to the `utils.dataframe` narwhals boundary helpers, matching the new EWA/shrinkage estimators and the rest of the #1919 migration. Any narwhals-supported eager dataframe (pandas, polars, pyarrow, ...) now flows through; the pandas path is byte-for-byte unchanged. Adds multi-backend tests via the `frame_backend` fixture.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What

Adds a family of online covariance estimators to `river.covariance`, reimplemented from the `precise` package following River conventions (dict-native, Narwhals mini-batches), per discussion #1884. Only genuinely online methods are included: nothing that lazily inverts a stored covariance on read.

New estimators (`river/covariance/ewa.py`)

- `EwaCovariance`: exponentially weighted (RiskMetrics-style) covariance. Diagonal matches `stats.EWVar`, off-diagonals match `stats.EWCov`. For non-stationary streams whose relationships drift over time.
- `LedoitWolfCovariance` / `OASCovariance`: data-driven shrinkage of the EWA covariance towards a scaled identity (Ledoit-Wolf 2004 / Chen et al. 2010 intensities). For high-dimensional / few-sample regimes where the raw covariance is noisy or singular.
- `ShrunkCovariance`: fixed-intensity shrinkage with a finance-friendly constant-correlation target (or identity).
- `EwaPrecision`: exponentially weighted precision (inverse covariance) maintained online via a forgetting-factor Sherman-Morrison update. Genuinely online (O(d²) per step, never inverts explicitly); the recency-weighted counterpart of `EmpiricalPrecision`.

Supporting additions

- `stats.EWCov`: exponentially weighted covariance primitive (bivariate counterpart of `stats.EWVar`), composed from `EWMean`s so the convention matches exactly.
- `datasets.SP500Stocks`: daily returns (in %) of ten large-cap S&P 500 stocks across diverse sectors (2013–2018, 1,257 trading days), used in the docstring examples. Bundled `sp500.csv.gz`.
- Guards `SymmetricMatrix.__repr__` against empty (unfitted) matrices (previously raised `ValueError`).

Design notes

- Internals are array-backed with a feature->index map (like `EmpiricalPrecision`) behind a dict-native public interface (`update`, `update_many`, `matrix`, `__getitem__`).
- The EWA convention follows `stats.EWMean`/`EWVar` so the diagonal and off-diagonals are exactly the existing scalar EW statistics.

Tests

- `river/covariance/test_ewa.py`: EWA vs independent numpy reference + `stats.EWVar`/`EWCov`; `update_many` ≡ single-`update` loop across all estimators and across pandas/polars backends; shrinkage vs `sklearn.covariance` oracle; `EwaPrecision` vs numpy oracle and `P @ S ≈ I`; PSD/symmetry invariants; pickling; empty-repr.
- Docstring examples run on the `SP500Stocks` data.
- `uv run mypy` and `ruff` are clean.

🤖 Generated with Claude Code
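For reviewers who want to see the idea behind the forgetting-factor Sherman-Morrison update and the `P @ S ≈ I` check, here is a minimal sketch. It is not the implementation in this PR: the function names, the nested-list layout, and the parameter `alpha` are assumptions, and it tracks an uncentred second-moment matrix `S_t = (1 - alpha) * S_{t-1} + alpha * x xᵀ` started from the identity, whereas the PR's centring and initialisation are not shown in this description. Under that assumed recursion, Sherman-Morrison gives `P_t = (P - alpha * (P x)(P x)ᵀ / ((1 - alpha) + alpha * xᵀ P x)) / (1 - alpha)`.

```python
def matvec(m, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def ewa_second_moment_update(S, x, alpha):
    """S <- (1 - alpha) * S + alpha * x x^T (assumed, uncentred convention)."""
    d = len(x)
    return [
        [(1.0 - alpha) * S[i][j] + alpha * x[i] * x[j] for j in range(d)]
        for i in range(d)
    ]


def ewa_precision_update(P, x, alpha):
    """Rank-one update of P = S^-1 in O(d^2), with no explicit inversion."""
    d = len(x)
    Px = matvec(P, x)  # P is symmetric, so x^T P equals (P x)^T
    denom = (1.0 - alpha) + alpha * sum(xi * pi for xi, pi in zip(x, Px))
    return [
        [(P[i][j] - alpha * Px[i] * Px[j] / denom) / (1.0 - alpha) for j in range(d)]
        for i in range(d)
    ]


alpha = 0.1
S = [[1.0, 0.0], [0.0, 1.0]]  # start both from the identity so P = S^-1 holds
P = [[1.0, 0.0], [0.0, 1.0]]
for x in [[1.0, 2.0], [0.5, -1.0], [2.0, 0.3], [-1.2, 0.7]]:
    S = ewa_second_moment_update(S, x, alpha)
    P = ewa_precision_update(P, x, alpha)

# The invariant the tests describe: P @ S should stay close to the identity.
product = [
    [sum(P[i][k] * S[k][j] for k in range(2)) for j in range(2)]
    for i in range(2)
]
```

Each step costs one matrix-vector product and one outer product, which is where the O(d²) per-step figure comes from.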