Add decomposition methods OnlineSVD, OnlinePCA, OnlineDMD/wC + Hankelizer by MarekWadinger · Pull Request #1509 · online-ml/river

MarekWadinger · 2024-03-06T08:29:44Z

Hello @MaxHalford, @hoanganhngo610, and everyone 👋,

In #1366, @MaxHalford showed interest in implementing OnlinePCA and OnlineSVD methods in river.

Given my current project involvement with online decomposition methods, I believe the community could benefit from having access to these methods and their maintenance over time. Additionally, I am particularly interested in DMD, which combines the advantages of PCA and FFT. Hence, I propose the introduction of three new methods as part of the new decomposition module:

decomposition.OnlineSVD, implemented based on Brand, M. (2006) (proposed by @MaxHalford in the issue), with some considerations on re-orthogonalization. Since re-orthogonalization is required quite often, which compromises computation speed, it could be interesting to align with Zhang, Y. (2022) (I made some effort to implement it, but I have yet to explore its validity and the possibility of implementing revert in a similar vein).

decomposition.OnlinePCA, implemented based on Eftekhari, A. (2019) (proposed by @MaxHalford in the issue), as it is currently state-of-the-art with all the proofs and guarantees. I would be happy to validate together whether all considerations are handled in the proposed OnlinePCA.

decomposition.OnlineDMD, implemented based on Zhang, H. (2019). It can operate as MiniBatchTransformer, MiniBatchRegressor (sort of), and works with Rolling, so I would need some help figuring out how we'd like to classify it (maybe a new base class, Decomposer).
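For readers unfamiliar with the online DMD idea, here is a minimal sketch of a rank-one recursive least-squares update of the kind described by Zhang, H. (2019). It is not the code from this PR: the helper name `online_dmd_step`, the toy 2-D system, and the initialization from the first ten snapshot pairs are assumptions made purely for illustration, and weighting, windowing, and control inputs are left out.

```python
import numpy as np


def online_dmd_step(A, P, x, y):
    """One rank-one update for y ~ A x (hypothetical helper, not the PR's API).

    A is the current operator estimate, P is the inverse of X @ X.T
    over all snapshots seen so far (assumed symmetric).
    """
    Px = P @ x
    gamma = 1.0 / (1.0 + x @ Px)
    A_new = A + gamma * np.outer(y - A @ x, Px)
    P_new = P - gamma * np.outer(Px, Px)  # Sherman-Morrison update of the inverse
    return A_new, P_new


# Toy data: a slowly decaying rotation with a little noise (assumed example).
rng = np.random.default_rng(0)
theta = 0.3
A_true = 0.99 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta), np.cos(theta)]])
snapshots = np.empty((2, 51))
snapshots[:, 0] = [1.0, 0.0]
for k in range(50):
    snapshots[:, k + 1] = A_true @ snapshots[:, k] + 0.01 * rng.standard_normal(2)

X, Y = snapshots[:, :-1], snapshots[:, 1:]

# Initialize from the first 10 snapshot pairs, then stream the rest one by one.
A = Y[:, :10] @ np.linalg.pinv(X[:, :10])
P = np.linalg.inv(X[:, :10] @ X[:, :10].T)
for k in range(10, X.shape[1]):
    A, P = online_dmd_step(A, P, X[:, k], Y[:, k])

# The streamed estimate should agree with the batch least-squares solution.
A_batch = Y @ np.linalg.pinv(X)
print(np.allclose(A, A_batch, atol=1e-8))
```

The appeal of this formulation is that each new sample costs a few matrix-vector products instead of a full refit.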

Additionally, I propose preprocessing.Hankelizer, which could benefit various regressors and is particularly useful for enriching the feature space with a time-delay embedding.
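To make the time-delay embedding concrete, the sketch below stacks the last `w` samples of a dict-based stream into one wider feature dict. This is only an illustration of the idea: the class name `DelayEmbedder`, the method `embed_one`, the `_lag{k}` key naming, and the choice to return `None` until the window is full are all assumptions, not the interface of the proposed Hankelizer.

```python
from collections import deque


class DelayEmbedder:
    """Hypothetical stand-in for a Hankelizer-like transformer (not the PR's class)."""

    def __init__(self, w=2):
        if w < 1:
            raise ValueError("w must be a positive integer")
        self.w = w
        self._window = deque(maxlen=w)  # keeps only the w most recent samples

    def embed_one(self, x):
        """Store x and return the delay-embedded features, or None while warming up."""
        self._window.append(dict(x))
        if len(self._window) < self.w:
            return None
        out = {}
        # lag 0 is the newest sample, lag w-1 the oldest one in the window
        for lag, past in enumerate(reversed(self._window)):
            for key, value in past.items():
                out[f"{key}_lag{lag}"] = value
        return out


embedder = DelayEmbedder(w=3)
stream = [{"temp": t} for t in (20.0, 21.0, 23.0, 22.0)]
embedded = [embedder.embed_one(x) for x in stream]
print(embedded[-1])
```

Each output row corresponds to one column of a Hankel matrix built from the stream, which is what gives downstream models access to recent history.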

I've tried to include all necessary tests. However, I need to investigate why re-orthogonalization in OnlineSVD yields significantly different values when tested on various operating systems (locally, all tests pass).
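For context on the re-orthogonalization concern, here is a minimal sketch of a Brand-style incremental SVD step that appends one column to an existing thin SVD, followed by a check of how orthogonal the updated left singular vectors remain. It is an assumption-laden illustration, not the PR's OnlineSVD: the function name `append_column` is hypothetical, no rank truncation is applied, and the data layout (samples as columns) is chosen only for brevity.

```python
import numpy as np


def append_column(U, s, Vt, c):
    """Update the thin SVD U @ diag(s) @ Vt after appending column c (hypothetical helper)."""
    m = U.T @ c                      # part of c inside the current subspace
    p = c - U @ m                    # residual orthogonal to the subspace
    p_norm = np.linalg.norm(p)
    p_unit = p / p_norm if p_norm > 1e-12 else np.zeros_like(p)

    r = s.size
    K = np.zeros((r + 1, r + 1))     # small core matrix to re-diagonalize
    K[:r, :r] = np.diag(s)
    K[:r, r] = m
    K[r, r] = p_norm
    Uk, sk, Vkt = np.linalg.svd(K)

    U_new = np.hstack([U, p_unit[:, None]]) @ Uk
    n = Vt.shape[1]
    V_ext = np.zeros((n + 1, r + 1))
    V_ext[:n, :r] = Vt.T
    V_ext[n, r] = 1.0
    Vt_new = (V_ext @ Vkt.T).T
    return U_new, sk, Vt_new


rng = np.random.default_rng(42)
X0 = rng.standard_normal((6, 3))
c = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(X0, full_matrices=False)
U_new, s_new, Vt_new = append_column(U, s, Vt, c)

X1 = np.hstack([X0, c[:, None]])
reconstruction_ok = np.allclose(U_new @ np.diag(s_new) @ Vt_new, X1)
# Deviation from orthonormality; this is the quantity that drifts over many updates.
ortho_error = np.linalg.norm(U_new.T @ U_new - np.eye(s_new.size))
print(reconstruction_ok, ortho_error)
```

After a single step the orthogonality error is at rounding level; it is the accumulation over many truncated updates that typically motivates periodic re-orthogonalization.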

Looking forward to your comments and revisions. 😌

review-notebook-app · 2024-06-04T08:45:13Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

MarekWadinger · 2024-06-04T09:11:41Z

Hello @MaxHalford and @hoanganhngo610, 👋

I believe the methods are ready for benchmarking. The results are published in this notebook.

In the plot, I combine two checks: performance w.r.t. the number of features, and the delay imposed by converting pd.DataFrame (dict) inputs to the np.array used in the core.

The mean absolute number of processed samples per second is provided here (for n features in range(3,20), as it remains pretty stable):

| Method | np.array | pd.DataFrame |
|---|---|---|
| OnlineDMD | 3102 | 1267 |
| OnlinePCA | 19553 | 18012 |
| OnlineSVD | 631 | 683 |
| OnlineSVDZhang | 3503 | 1718 |

OnlineSVD will probably be completely replaced by the Zhang implementation (OnlineSVDZhang).

The results in the notebook indicate that using pd.DataFrame slows down OnlinePCA, which is the fastest decomposition implementation, by up to 14%. However, I believe your concerns are likely related to the fact that the core of the decomposition methods works with np.arrays, correct?
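As a quick cross-check, the relative slowdown implied by the mean throughputs listed above can be computed directly. The numbers are copied from this comment; the dictionary layout and variable names are just for illustration. Because these are means over the feature range, they need not match a per-feature-count maximum such as the "up to 14%" figure.

```python
# Mean samples per second, copied from the comment above: (np.array, pd.DataFrame)
throughput = {
    "OnlineDMD": (3102, 1267),
    "OnlinePCA": (19553, 18012),
    "OnlineSVD": (631, 683),
    "OnlineSVDZhang": (3503, 1718),
}

# Positive values mean pd.DataFrame input is slower than np.array input.
slowdown_pct = {
    name: 100.0 * (np_rate - pd_rate) / np_rate
    for name, (np_rate, pd_rate) in throughput.items()
}

for name, pct in slowdown_pct.items():
    print(f"{name}: {pct:.1f}%")
```

A negative value (as for OnlineSVD here) means the pd.DataFrame run was slightly faster on average, which is most likely measurement noise at such low throughput.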

What are your thoughts on the performance and adequacy of the evaluation?

Thanks for your time 🙏

s-bessing · 2025-07-23T14:52:31Z

Is this still active?

MarekWadinger · 2025-07-23T15:23:42Z

Hey @s-bessing. I would love to have this reviewed and published. I'm actively working with OnlineDMD.

@hoanganhngo610 @MaxHalford are you available to refresh the discussion on this?

I'm ready to fix the checks if you could also provide some feedback on my latest comment. :)

Thx

s-bessing · 2025-07-24T11:20:23Z

@MarekWadinger, thanks for the reply. I am currently working on an online topic model. For this, I came across the river package and like the approach. Currently, I use a static reducer (UMAP), but I am not entirely satisfied with it since it is static.
As input, I use a growing list of small documents that I convert into embeddings. Would OnlineDMD be a suitable solution for this use case?

MarekWadinger · 2025-07-29T05:34:10Z

Hey @s-bessing.

I used to really like UMAP in my fault detection projects. I believe OnlineDMD could be a match, but there are some bottlenecks. It works much better on reasonably noisy data, as high autocorrelation may break the underlying SVD computation in the case of piecewise-constant behavior (this happens to me, for instance, in OnlineDMDc, where the control signal is noiseless and does not change for a while). But if there are certain periodic components and dominant patterns in your data, I think you should give it a shot. :)

kulbachcedric · 2025-12-29T09:51:39Z

Hi @MarekWadinger and @s-bessing,
your PR is quite impressive!
However, we’re currently doing a cleanup of older Pull Requests in the river repository and wanted to check in with you.
If you plan to continue, feel free to push updates or let us know if you need any help or feedback.

Thanks a lot for your contribution!

MarekWadinger · 2026-01-14T06:30:05Z

Hey @kulbachcedric,

I would love to bring this back to life. It got stuck in the review process. I can dust it off and would love to get feedback on the PR once I'm done. :)

…iness

- Fix failing doctests: wrap np.isclose/np.allclose in bool() for NumPy 2.x compatibility, skip non-deterministic OnlinePCA outputs
- Replace deprecated np.row_stack with np.vstack throughout
- Replace assert statements with ValueError for parameter validation
- Add _unit_test_skips to OnlineDMD so check_estimator passes
- Remove debug counters (_n_cached, _n_computed) marked for removal
- Remove warning spam in OnlineSVDZhang.update (60k+ warnings in tests)
- Add type hint for OnlineSVD.solver parameter
- Clean up all TODO comments across the module
- Add release notes for decomposition module and Hankelizer
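The first three items in this commit follow common patterns that can be sketched briefly. The snippet is an illustration under assumptions, not code from the PR: `validate_window_size` is a hypothetical helper, and the motivation given for `bool()` (NumPy 2.x printing NumPy booleans as `np.True_` rather than `True`, which changes doctest output) is my reading of the commit message.

```python
import numpy as np


def validate_window_size(w):
    """Hypothetical validator: raise ValueError instead of using assert."""
    # Unlike assert, this check is not stripped when Python runs with -O.
    if not isinstance(w, int) or w < 1:
        raise ValueError(f"w must be a positive integer, got {w!r}")
    return w


# np.isclose returns a NumPy boolean; bool() gives a plain Python bool whose
# repr is the same across NumPy versions, keeping doctest output stable.
is_close = bool(np.isclose(0.1 + 0.2, 0.3))

# np.vstack is the supported spelling for stacking rows.
stacked = np.vstack([np.array([1.0, 2.0]), np.array([3.0, 4.0])])

print(type(is_close).__name__, stacked.shape)
```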

MarekWadinger · 2026-03-10T17:47:16Z

Funny, the failing test converges to different optima on my machine and on the Ubuntu machine where the code-quality checks run. I proceeded with a simplified test that checks whether the eigenvalues are finite.

MarekWadinger · 2026-03-10T17:51:13Z

@kulbachcedric ready for rereview ;)

- preprocessing.Hankelizer: convert docstring from Google style (Args:/Examples:/Todo:) to the NumPy convention mandated by CONTRIBUTING.md. Unblocks river/test_docs.py::test_print_docstring, which now parses every public docstring.
- test_odmdwc.py: access .A on the wrapped OnlineDMDwC instance through Rolling.obj instead of relying on __getattr__ delegation, which mypy cannot resolve statically. Behavior unchanged.

- Make pandas optional in odmd.py: the top-level `import pandas as pd` was failing the new "Tests without pandas" CI job introduced upstream. Pandas is now imported under `TYPE_CHECKING`, and runtime `isinstance` checks go through a `_is_dataframe` TypeGuard backed by `utils.pandas.PANDAS_INSTALLED` / `utils.pandas.import_pandas()`.
- Rename three N802-violating methods exposed by upstream's pep8-naming ruleset: `A_allclose` -> `a_allclose`, `_update_A_P` -> `_update_a_p`, `_reconstruct_AB` -> `_reconstruct_ab`. All callsites are within odmd.py; no external API impact.

The previous fix only covered odmd.py, but odmd.py imports osvd.py at line 30, so the eager `import pandas as pd` in osvd.py was still crashing CI's "Tests without pandas" job during decomposition module collection. Same treatment as odmd.py: pandas moved under TYPE_CHECKING, an `_is_dataframe` TypeGuard handles the runtime isinstance check, and the `pd.DataFrame(...)` constructions in `transform_many` go through `utils.pandas.import_pandas()` so calling that mini-batch method without pandas surfaces the project's standard "install river[pandas]" error. Added `>>> import numpy as np` / `>>> import pandas as pd` to the OnlineSVD and OnlineSVDZhang docstring examples — they relied on the module-level imports that are now lazy. Verified end-to-end against the CI recipe in a fresh venv: `uv sync --all-extras --group dev` then `uv pip uninstall pandas` then `pytest` -> 4391 passed, 0 failed. Full suite with pandas: 4654 passed.

`sp.sparse.linalg.svds` uses ARPACK, whose initial random vector is not controlled by `np.random.seed`, so consecutive calls return singular vectors with arbitrary (and uncorrelated) signs. The previous assertion compared raw vector differences, which failed nondeterministically whenever any column happened to land on opposite signs across the three SVD calls. Align column signs to `u_orig` before computing distances. Verified deterministic across 8 isolated runs and 4 full-suite runs.
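The sign-alignment step can be illustrated with a small sketch. Singular vectors are only defined up to a sign per column, so two valid decompositions may differ by column-wise flips; aligning each column to a reference removes that ambiguity before comparing. The helper name `align_signs` and the toy data are assumptions, and `np.linalg.svd` is used here instead of `svds` to keep the example deterministic.

```python
import numpy as np


def align_signs(u, u_ref):
    """Flip columns of u so each has a non-negative inner product with u_ref (hypothetical helper)."""
    signs = np.sign(np.sum(u * u_ref, axis=0))
    signs[signs == 0] = 1.0  # leave exactly orthogonal columns untouched
    return u * signs


rng = np.random.default_rng(7)
X = rng.standard_normal((8, 3))
u_orig, s, vt = np.linalg.svd(X, full_matrices=False)

# Simulate a solver that returned the same vectors with arbitrary signs.
flips = np.array([1.0, -1.0, -1.0])
u_other = u_orig * flips

raw_distance = np.linalg.norm(u_other - u_orig)
aligned_distance = np.linalg.norm(align_signs(u_other, u_orig) - u_orig)
print(raw_distance, aligned_distance)
```

If the right singular vectors are compared as well, the same signs have to be applied to the matching rows of `vt` so that the product of the factors is unchanged.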

MaxHalford · 2026-06-23T09:34:57Z

Hello @MarekWadinger. I'm open to merging your contributions, but not like this. What I'd like to do is open one PR for each contribution, and take a thorough look at each method. These methods could be useful, so we need to benchmark them, assess their relevance, and provide usage documentation. Merging this whole PR as is would not provide benefits for users, I believe.

MarekWadinger and others added 30 commits February 13, 2024 17:16

Initial commit

42e77e0

ADD: Class DMM, OnlineDMD, with Control, Weighting and Windowing

ec86781

UPDATE: make r optional + REFACTOR: DMDC -> DMDwC

c922746

UPDATE: align inputs with river.MiniBatchRegressor

ddac257

UPDATE: align notation of DMDMD and ODMD

37fa925

FIX: missing _Y buffer for xi comp + REMOVE: cvxpy dependency of xi comp

c73362f

ADD: initial implementation of SubIDDriftDetector

68d1e44

UPDATE: remove cvxpy dep of DMD

95d87bf

ADD: input Y for compatibility + FIX: known B handling

a522752

ADD: r to control truncation of eig + REFACTOR: rename eigs_modes -> …

4d25fe4

…eig + FIX: exponential w in learn many + MINOR: robustness

REFACTOR: train_size -> ref_size; _drift_detected -> drift_detected +…

057e006

… ADD: score attribute

ADD: hankel function

17219d6

FORMAT: ruff

ccfd725

ADD: automations and dev tools

ae2f0b6

FIX: Py3.9 compatibility

9d7d460

UPDATE: actions versions

ad931c2

UPDATE: actions versions

bd1f372

ADD: OnlineDMD tests

2ab0d72

UPDATE: badge handling

4fcaaff

REMOVE: redundant arguments in action

db21046

ADD: tests + FIX: _update_many; _init_update

5a2ca4f

FORMAT: ruff

86f4ad4

FIX: numerical precison issue in tesst

1f15527

ADD: tranform_one and transform_many options

d669ec3

FIX: inputs compatibility issues

68676b4

UPDATE: standardize inputs shape (m, n) -> (n, m)

428a337

UPDATE: try to speed up eig computation

b7ed6e3

UPDATE: standardize inputs shape (m, n) -> (n, m) + speed up

54e6833

ADD: TODO item

4784d80

Merge pull request #1 from MarekWadinger/dev

02f28bd

Standardization of input shapes

MarekWadinger added 2 commits June 4, 2024 17:40

UPDATE: drop warnings in initialization

9674783

ADD: benchmark decomposition methods np vs pd inputs

f5352c6

MarekWadinger added 2 commits August 9, 2024 11:25

Merge branch 'main' of github.com:online-ml/river into online-ml-main

018a614

Merge branch 'online-ml-main'

52e6ad4

gbolmier mentioned this pull request Oct 1, 2024

Incremental PCA implementation #3

Closed

Merge branch 'online-ml:main' into main

5a5014e

Merge branch 'main' into main

cf06ad3

MarekWadinger added 5 commits March 10, 2026 10:47

Merge branch 'main' of github.com:online-ml/river

bd5caf7

refactor(decomposition): align with River API and add MiniBatchMultiT…

6cff98a

…argetRegressor for correct type hints

test(benchmarks): rerun benchmarks

1c8d385

docs(decomposition): enhance docstrings for OnlineDMD and OnlineDMDwC…

cedf807

… with numerical precision notes

kulbachcedric linked an issue Mar 11, 2026 that may be closed by this pull request

PCA (SVD) #1366

Open

MaxHalford added the New feature label Apr 10, 2026

MarekWadinger added 5 commits June 11, 2026 11:09

Merge remote-tracking branch 'upstream/main'

ea2339c

# Conflicts: # docs/releases/unreleased.md # river/compose/pipeline.py # river/utils/rolling.py

MarekWadinger wants to merge 101 commits into online-ml:main from MarekWadinger:main

5 participants
