
Add metrics client #226

Open
ehanson8 wants to merge 4 commits into main from metrics

Conversation

ehanson8 (Contributor) commented Apr 6, 2026

Purpose and background context

This PR creates a MetricsClient class that will eventually be added to our template repos and provides an opportunity to refine our conventions for publishing AWS metrics.

How can a reviewer manually see the effects of these changes?

This is not easy to test manually, but you can review this CloudWatch query to see the metrics that have been produced by this code over the past week.

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

  • NA

Code review

  • Code review best practices are documented here and you are encouraged to have a constructive dialogue with your reviewers about their preferences and expectations.

@ehanson8 ehanson8 requested a review from a team as a code owner April 6, 2026 19:30
coveralls commented Apr 6, 2026

Coverage Report for CI Build 24909351443

Warning

No base build found for commit 3cc6e39 on main.
Coverage changes can't be calculated without a base build.
If a base build is processing, this comment will update automatically when it completes.

Coverage: 94.77%

Details

  • Patch coverage: 20 uncovered changes across 1 file (62 of 82 lines covered, 75.61%).

Uncovered Changes

File Changed Covered %
dsc/utils/aws/metrics.py 70 50 71.43%

Coverage Regressions

Requires a base build to compare against.


Coverage Stats

Coverage Status
Relevant Lines: 1128
Covered Lines: 1069
Line Coverage: 94.77%
Coverage Strength: 0.95 hits per line

💛 - Coveralls

Comment thread dsc/utils/aws/metrics.py Outdated
logger.info("No metrics to publish.")
return

# Validate all metrics before publishing
ehanson8 (Contributor, Author):

While it feels slightly odd to re-validate, since the metrics were already validated in add_metric_to_batch, there is the potential for something to go haywire between when they are added and when they are published. A cleaner solution would be much appreciated!

ghukill:

My review comment and inline comments touch on this!

Comment thread dsc/utils/aws/metrics.py Outdated

for x in range(0, len(self.batch_metrics), batch_size):
self._push_metric_data(self.batch_metrics[x : x + batch_size])
self.batch_metrics.clear()
ehanson8 (Contributor, Author):

Forgoing unit tests for now, but they will certainly be included when the MetricsClient is added to the template repos.
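The batching loop shown in this thread can be exercised without AWS at all. Below is a minimal standalone sketch (a stand-in class, not the real MetricsClient) that demonstrates the slicing behavior a future unit test could assert on; the 1000-metric chunk size reflects CloudWatch's PutMetricData per-request limit.

```python
# Stand-in illustrating the batching loop from this PR: metrics are
# published in slices of at most `batch_size`, then the batch is cleared.
class FakeMetricsClient:
    def __init__(self) -> None:
        self.batch_metrics: list[dict] = []
        self.published: list[list[dict]] = []  # records each slice "sent"

    def _publish_metrics(self, metrics: list[dict]) -> None:
        # The real client would call boto3's put_metric_data here.
        self.published.append(metrics)

    def publish_metrics_batch(self, batch_size: int = 1000) -> None:
        for x in range(0, len(self.batch_metrics), batch_size):
            self._publish_metrics(self.batch_metrics[x : x + batch_size])
        self.batch_metrics.clear()
```

With 2,500 queued metrics, publish_metrics_batch would make three calls (1000, 1000, 500) and leave batch_metrics empty, which is the shape of assertion a unit test would make.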

ghukill left a comment

Before I jump too far into a review, as I was looking at the overall structure, I had a thought/question that might have some rippling effects.

What if we had a dataclass like Metric?

@dataclass
class Metric:
    metric_name: str
    value: int | float | str # maybe others?
    unit: str
    metric_dimensions: dict[str, str] | None = None

If this was defined in the same module, that dataclass could handle validation during initialization, thereby removing the need to validate anywhere else in the class. I'm not crazy about the idea, but I think you can use @dataclass(frozen=True) so that a dataclass instance can't be modified after creation. If so, then you avoid that double-validation approach that guards against a once-valid-but-maybe-not-anymore situation.

UPDATE: leaving the above thinking about validation, but see also this comment. It feels correct to init the client with allowed metrics, therefore it doesn't make sense for this proposed Metric dataclass to validate itself.

It would also begin to chip away at this repeating block for most methods (not to mention the docstrings):

name: str,
value: int,
unit: str,
metric_dimensions: dict[str, str] | None = None,

If we had something like that, I could imagine method signatures along the lines of:

def publish_metric(self, metric: Metric): ...

def add_metric_to_batch(self, metric: Metric): ...

def publish_metrics_batch(self): ...

To actually use inline in code, could just init as part of the publish:

client.publish_metric(Metric(name="foo", value=42, unit="seconds"))

Whattya think? Even if you don't go quite this far, and only want to pass around a new Metric dataclass internally between methods, I do still think it could be a good place for validation and defining those metric properties only once.

Comment thread dsc/utils/aws/metrics.py Outdated

def __init__(self) -> None:
"""Initialize the MetricsClient."""
self.cloudwatch = boto3.client("cloudwatch")
ghukill:

Feeling like maybe we should make this a private property, like self._cloudwatch? The first thing I do with a new class/client like this is fire it up in a console and see what's available, what's surprising, etc., and this was a little surprising!

Image

ehanson8 (Contributor, Author):

Done!

Comment thread dsc/utils/aws/metrics.py Outdated
Args:
metric_name: The name of the metric to check.
"""
if metric_name not in METRICS:
ghukill:

So first of all, I really like the ability to validate the metric name itself. But the ergonomics feel just a little off to me, given that it's hardcoded to expect a METRICS variable in scope somehow.

I think this tightly couples the client to the nature of the project's config.

What if the class instantiation would support an allowed_metrics: list[str] | None = None?

In code, if None, assume the user doesn't care about metric name validation. But if set, validate against that list.

In code, I think it'd be as trivial as instantiating like this:

from dsc.config import ALLOWED_METRICS

metrics_client = MetricsClient(allowed_metrics=ALLOWED_METRICS)

ghukill:

I may find a place for it elsewhere, but I think the same would apply to NAMESPACE which has the same kind of tight coupling.

ghukill:

To be fair: this is a test implementation that will get ported to the templates... but insofar as that implementation will need to be a bit more application-structure agnostic, so too must this.

ghukill:

Lastly, see my actual PR comment review. I'm proposing the use of a Metric dataclass, but it can't really validate the metric name if we go this route where it's not a magic value in scope, but a value optionally passed during client init.

As such, this comment thread may suggest that Metric self-validation is not a great approach, and that validation should remain here in the client. I will amend that comment before sending.

ehanson8 added a commit that referenced this pull request Apr 10, 2026
* Rename Config.METRICS > ALLOWED_METRICS and change from list to set
* Add Metric dataclass and update MetricsClient class to use it
* Update calls in Workflow class to use Metric dataclass
* Update dependencies
ehanson8 (Contributor, Author):

Pushed a new commit based on @ghukill's comments and here's a new query showing the results of the updated code

ghukill left a comment

Starting with a disclaimer that, as requested, I'm going very hard into nit-picking on this PR, with the hope/expectation that we could drop this metrics.py basically as-is into the template repositories and use it elsewhere.

My inlined comments about method names and removing double validation are somewhat subjective and optional.

But there is a bigger piece around metrics namespace that I was having trouble finding a single place to inline comment.

I think that we should avoid namespace getting set by a METRICS_NAMESPACE imported into the metrics module. That tightly couples the client to the app's config exposing a METRICS_NAMESPACE constant, not to mention requiring a config file at all (yes, most apps have one, but maybe not all do or will).

Additionally, we might find that a single application wants to publish metrics to multiple namespaces.

For these reasons, I might propose an alternate pattern:

  1. Optionally init MetricsClient with a namespace that will get set for metrics during publishing. Consider supporting an env var default like AWS_METRICS_NAMESPACE.
  2. Optionally set namespace in the new Metric dataclass that will get used for that metric
  3. A namespace in a single Metric instance will override a default namespace set for the client.
  4. During metric validation, an error is thrown if neither the metric nor the client sets a namespace.

This would give applications a lot of flexibility in how they construct metrics and the metrics client itself.

For example, in this DSC app, any given file or module could do:

from dsc.config import Config
from dsc.utils.aws.metrics import MetricsClient

CONFIG = Config()

metrics_client = MetricsClient(
    namespace=CONFIG.metrics_namespace,
    allowed_metrics=CONFIG.allowed_metrics
)

Note that this assumes the Config class has properties for these values instead of free-floating constants in the config.py module. But of course, either way works.

Or maybe config.py has a metrics client factory helper, if lots of files need a metrics client:

from dsc.utils.aws.metrics import MetricsClient

def get_metrics_client() -> MetricsClient:
    return MetricsClient(
        namespace=METRICS_NAMESPACE,
        allowed_metrics=ALLOWED_METRICS
    )

Note that in this example, the module-level constants are used as they are currently set. That flexibility is kind of key to me: it works with whatever state/conventions the app uses.

Then could use anywhere in the app:

from dsc.config import get_metrics_client

metrics_client = get_metrics_client()

Lastly, an app may opt to publish to multiple namespaces. For that app, a default namespace wouldn't be set on the client itself; they would always prepare metrics with the namespace baked in:

m1 = Metric(
    namespace="timdex",
    # rest of fields...
)

m2 = Metric(
    namespace="use",
    # rest of fields...
)

metrics_client.add_metrics_to_batch([m1, m2])

Comment thread dsc/utils/aws/metrics.py Outdated
ehanson8 (Contributor, Author):

> For these reasons, I might propose an alternate pattern:
>
>   1. Optionally init MetricsClient with a namespace that will get set for metrics during publishing. Consider supporting an env var default like AWS_METRICS_NAMESPACE.
>   2. Optionally set namespace in the new Metric dataclass that will get used for that metric
>   3. A namespace in a single Metric instance will override a default namespace set for the client.
>   4. During metric validation, an error is thrown if neither the metric nor the client sets a namespace.

A great recommendation, working on this

ehanson8 added a commit that referenced this pull request Apr 15, 2026
* Add namespace param to Metric class and MetricsClient.__init__
* Rename publish_single_metric > publish_metric,  _push_metric_data > _publish_metrics, add_metric_to_batch > add_metrics_to_batch, and publish_batch_metrics > publish_metrics_batch
* Add logic to use Metric.namespace over MetricsClient.namespace
* Remove validation from publish_metrics_batch
* Update calls in Workflow class
ehanson8 (Contributor, Author) commented Apr 15, 2026

@ghukill Pushed a new commit! And the most recent Cloudwatch query

ghukill left a comment

I've requested a change, but approving as well. If you feel the change is not correct, please feel free to re-request review from me.

That little change aside, this is looking great! Thanks for all the discussion on this. Feels like something we could drop right into the templates now.

Comment thread dsc/utils/aws/metrics.py Outdated
Why these changes are being introduced:
* A metrics client class is needed to implement AWS Cloudwatch metrics

How this addresses that need:
* Add MetricsClient class with methods for publishing individual and batch metrics
* Add METRICS_NAMESPACE and METRICS constants to config.py
* Add metrics attributes to Workflow class
* Add publish_metric method calls to submit_items and finalize_items methods
* Update dependencies

Side effects of this change:
* None

Relevant ticket(s):
* NA