@@ -0,0 +1,92 @@
---
title: Choosing a Container Size
nav: Choosing a Container Size
modified_at: 2026-05-18 00:00:00
tags: app scaling containers memory metrics
index: 1
---

Choosing the right container size is a balance between safety, performance, and
cost. Memory is often the resource that most directly constrains this choice
because each running container must stay below its own memory quota. A safe
initial size gives your application enough memory headroom while you collect
real metrics and validate the workload.

For simple applications, the default `M` container size is often a reasonable
starting point. Choose a larger size from the beginning when you already know
that your application has higher memory needs, for example because it uses a
memory-intensive runtime, runs with high concurrency, keeps large datasets or
caches in memory, or processes large payloads in background jobs. Also start
larger when production traffic is still unknown.

See the [container sizes][container-sizes] page for the available sizes, memory
limits, and PID limits.


## Start Safe, Then Adjust

When you are unsure about the right size, start with a size that gives your
application enough headroom. After deployment, use [metrics][metrics],
[alerts][alerts], and realistic load testing to adjust the size.

Avoid choosing a smaller size only because the application starts successfully.
An application can boot with low memory usage and still consume much more
memory under real traffic, scheduled jobs, large requests, or specific user
flows.

Before downsizing, validate that the application keeps enough memory headroom
below the limit over time.
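
As an illustration of this check, here is a minimal Python sketch. It assumes you have exported memory samples (in MB) from your own monitoring; the function names, the 25% headroom threshold, and the quota values are hypothetical and not part of any Scalingo API. Refer to the container sizes page for the real quotas.

```python
def memory_headroom(samples_mb, quota_mb):
    """Return the fraction of the quota still free at the observed peak."""
    if not samples_mb:
        raise ValueError("need at least one memory sample")
    if quota_mb <= 0:
        raise ValueError("quota_mb must be positive")
    peak = max(samples_mb)
    return (quota_mb - peak) / quota_mb


def safe_to_downsize(samples_mb, smaller_quota_mb, min_headroom=0.25):
    """True if the observed peak would leave min_headroom free on the smaller size.

    min_headroom is an assumed safety margin, not a platform rule.
    """
    return memory_headroom(samples_mb, smaller_quota_mb) >= min_headroom
```

For example, with a peak of 410 MB, a hypothetical 512 MB quota would leave less than 25% free, so this sketch would advise against downsizing. The samples should cover real traffic, scheduled jobs, and large requests, not only startup.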


## Validate With Metrics and Load Testing

Before changing the container size, inspect the application charts in the
[Metrics tab][metrics]. Compare memory usage with the memory quota of the
selected container size, and also review CPU usage and application-level
signals.

Pay attention to:

- CPU usage.
- RAM and swap usage.
- Whether memory usage returns to a stable baseline after traffic peaks.
- Response time.
- 5xx errors.
- Restart events.
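
One way to check whether memory returns to a stable baseline is to compare usage before and after a peak. The sketch below is a rough heuristic under stated assumptions: `samples_mb` is a list of memory samples you exported yourself, and the window size and 10% tolerance are arbitrary illustrative values.

```python
from statistics import median


def baseline_drift(samples_mb, window=5):
    """Relative change between the median of the first and last `window` samples."""
    if len(samples_mb) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    before = median(samples_mb[:window])
    after = median(samples_mb[-window:])
    if before <= 0:
        raise ValueError("memory samples must be positive")
    return (after - before) / before


def returns_to_baseline(samples_mb, window=5, tolerance=0.10):
    """True if memory after the peak is within `tolerance` of the earlier baseline."""
    return baseline_drift(samples_mb, window) <= tolerance
```

A drift that stays high after traffic has dropped may point to a leak or to a cache that never shrinks, which is worth investigating before resizing.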

If production metrics are not enough to validate a size, test the application
with realistic load and non-sensitive data.

{% note %}
Before running intensive load tests against an application hosted on Scalingo,
read our [external testing procedures][external-testing].
{% endnote %}


## Match Capacity to Traffic

Once you have chosen the target size for each process type, use
[Scaling Your Application][scaling] to configure the expected capacity for your
traffic and workload.
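
To make the capacity estimate concrete, the following Python sketch derives a container count from a measured per-container throughput. All names and numbers are hypothetical; `rps_per_container` is assumed to come from your own load tests, and the default cap of 10 reflects the default per-process-type limit mentioned in the scaling documentation.

```python
import math


def containers_needed(peak_rps, rps_per_container, spare=1, max_containers=10):
    """Estimate how many containers a process type needs for a given peak.

    spare adds extra containers as a safety margin (an assumed policy).
    """
    if peak_rps < 0 or rps_per_container <= 0:
        raise ValueError("peak_rps must be >= 0 and rps_per_container > 0")
    needed = max(math.ceil(peak_rps / rps_per_container) + spare, 1)
    if needed > max_containers:
        raise ValueError(
            f"{needed} containers exceed the limit of {max_containers}; "
            "consider a larger size or ask support to raise the limit"
        )
    return needed
```

With a peak of 180 requests per second and 50 requests per second per container, this yields 4 containers plus 1 spare.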

If the application is critical or you are unsure about the safest sizing
strategy, contact Scalingo support.


## Monitor Resource Usage

Configure [alerts][alerts] for critical metrics, and keep
[notifiers][notifiers] configured so the right people receive notifications
before resource usage becomes critical.

If the application consumes all its available memory, it can be terminated by
the system. See the [Runtime Issues][oom-diagnosis] page for Out of Memory
crash diagnosis and recovery guidance.


[alerts]: {% post_url platform/app/2000-01-01-alerts %}
[container-sizes]: {% post_url platform/internals/2000-01-01-container-sizes %}
[external-testing]: {% post_url security/procedures/2000-01-01-external-testing %}#can-i-run-a-load-test-on-my-application-that-is-running-on-scalingo
[metrics]: {% post_url platform/app/2000-01-01-metrics %}
[notifiers]: {% post_url platform/app/2000-01-01-notifiers %}
[oom-diagnosis]: {% post_url platform/app/troubleshooting/2000-01-01-runtime-issues %}#out-of-memory-crashes
[scaling]: {% post_url platform/app/scaling/2000-01-01-scaling %}
@@ -0,0 +1,128 @@
---
title: Optimizing Application Architecture
nav: Optimized Architecture
modified_at: 2026-05-18 00:00:00
tags: app scaling architecture containers memory metrics performance concurrency
index: 5
---

An optimized application architecture makes efficient use of the resources
allocated to each container while keeping the application easy to operate. On
Scalingo, this usually means separating workloads into [process types][procfile],
keeping web requests short, tuning concurrency carefully, and using metrics to
decide when to optimize code, split work, or scale the application.

This page focuses on how to structure the application workload. To choose a
container size, see [Choosing a Container Size][choosing-container-size]. To
change the number or size of running containers, see
[Scaling Your Application][scaling].


## Design Around Process Types

Use [process types][procfile] to separate workloads that do not have the same
operational profile. A `web` process should handle HTTP requests and return
responses quickly. Background workers, schedulers, importers, exporters, and
other resource-intensive jobs should run in dedicated process types.

Favor focused processes that can be scaled independently, with a clear role
and a predictable resource profile. This is usually easier to operate than a
single large process that handles every workload.

This separation has several advantages:

- each workload can have its own number of containers;
- each workload can use a container size adapted to its resource profile;
- focused containers can start and scale faster;
- long or heavy tasks do not block request handling;
- worker concurrency can be tuned independently from web concurrency;
- incidents are easier to diagnose from metrics and logs.

Typical process types include:

- `web` for HTTP traffic;
- `worker` for background jobs;
- `clock` or `scheduler` for recurring jobs;

Review comment (PR author): should cron be mentioned here?

- dedicated workers for heavy jobs such as PDF generation, image processing,
imports, exports, or batch tasks.

For long tasks triggered by a user request, return quickly from the web process
and process the work asynchronously. See [Long Running Process][long-process]
for the general pattern.
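
The pattern can be sketched in a few lines of Python. This is only an illustration: the in-process `queue.Queue` stands in for a real job queue shared between separate `web` and `worker` containers, and `handle_request`, `worker_loop`, and `slow_task` are hypothetical names.

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # stand-in for an external job queue
results = {}           # stand-in for a shared job status store


def slow_task(payload):
    """Placeholder for heavy work such as PDF generation."""
    return payload.upper()


def handle_request(payload):
    """Web side: record the job, enqueue it, and return immediately."""
    job_id = str(uuid.uuid4())
    results[job_id] = {"status": "queued"}
    jobs.put((job_id, payload))
    return {"status": 202, "job_id": job_id}


def worker_loop():
    """Worker side: process jobs until a None sentinel is received."""
    while True:
        item = jobs.get()
        if item is None:
            jobs.task_done()
            break
        job_id, payload = item
        results[job_id] = {"status": "done", "result": slow_task(payload)}
        jobs.task_done()
```

The web handler only pays the cost of enqueueing, so its response time stays independent of how long the task takes. The client can then poll the job status with the returned identifier.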


## Tune Concurrency Carefully

Concurrency lets a process handle more work in parallel, but each additional
thread, worker, or child process usually consumes more memory and may increase
database or external service pressure.

Tune concurrency separately for each process type:

- increase web concurrency only if response time and resource usage stay
healthy under realistic traffic;
- reduce worker concurrency if occasional jobs create memory pressure;
- make sure the database accepts enough connections for the configured
  concurrency across all containers;
- use separate process types for jobs that have different resource profiles,
such as CPU-heavy and memory-heavy jobs.
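
The trade-off between concurrency, memory, and database connections can be estimated with simple arithmetic. The sketch below assumes you have measured a base memory footprint and a per-worker cost yourself; the function names, the 20% headroom, and the figures in the example are hypothetical.

```python
def max_workers_for_memory(quota_mb, base_mb, per_worker_mb, headroom=0.2):
    """Largest worker count that keeps `headroom` of the quota free."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable = quota_mb * (1 - headroom) - base_mb
    return max(int(usable // per_worker_mb), 0)


def db_connections_needed(containers, workers_per_container, threads_per_worker=1):
    """Upper bound on simultaneous connections if every thread holds one."""
    return containers * workers_per_container * threads_per_worker
```

For instance, a 1024 MB quota with a 150 MB base and 120 MB per worker leaves room for 5 workers under these assumptions. Three such containers running 2 threads per worker could open up to 30 database connections, which should be compared with the limit of your database plan.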

Some runtimes expose Scalingo-specific or buildpack-provided defaults and
environment variables. See the language pages for details:
[Ruby][ruby], [Python][python], [PHP][php], [Java][java], [Node.js][nodejs],
and [Go][go].


## Handle Memory-Intensive Workloads

Memory pressure is often caused by specific workloads rather than by every
request. Check whether memory usage is high all the time, or only during
occasional tasks such as:

- background jobs;
- PDF generation;
- image or video processing;
- large imports or exports;
- report generation;
- scheduled batch tasks;
- large in-memory caches or datasets.

Depending on what you observe, prefer the smallest change that addresses the
actual cause:

- optimize the code path that allocates too much memory;
- tune runtime-specific memory settings;
- split heavy jobs into a dedicated process type;
- reduce worker or job concurrency;
- split large jobs into smaller chunks;
- isolate workloads that do not have the same resource profile.
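
As an example of splitting large jobs into smaller chunks, the following Python sketch processes an iterable in fixed-size batches so that only one batch is held in memory at a time. The names `chunked`, `process_in_chunks`, and `handle_chunk` are illustrative, and the chunk size is something to tune against your own metrics.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items without loading the whole input."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def process_in_chunks(rows, size, handle_chunk):
    """Apply handle_chunk to each batch and return the number of rows processed."""
    total = 0
    for chunk in chunked(rows, size):
        handle_chunk(chunk)
        total += len(chunk)
    return total
```

This only bounds memory if `rows` is itself lazy, for example a database cursor or a file read line by line, and if `handle_chunk` does not keep references to previous batches.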

If each container still needs more memory after these changes, continue with
[Choosing a Container Size][choosing-container-size].


## Size and Scale Your App

Once your application is optimized for its workload:

- [choose the right size][choosing-container-size] for each process type;
- [scale the application][scaling] to match capacity to traffic;
- read [Application Metrics][metrics] and configure [alerts][alerts] to monitor
the application after changes.

If the application reaches its memory limit and crashes, see [Runtime
Issues][oom-diagnosis] for diagnosis and recovery guidance.


[alerts]: {% post_url platform/app/2000-01-01-alerts %}
[choosing-container-size]: {% post_url platform/app/scaling/2000-01-01-choosing-container-size %}
[go]: {% post_url languages/go/2000-01-01-start %}
[java]: {% post_url languages/java/2000-01-01-start %}
[long-process]: {% post_url platform/app/2000-01-01-long-process %}
[metrics]: {% post_url platform/app/2000-01-01-metrics %}
[nodejs]: {% post_url languages/nodejs/2000-01-01-start %}
[oom-diagnosis]: {% post_url platform/app/troubleshooting/2000-01-01-runtime-issues %}#out-of-memory-crashes
[php]: {% post_url languages/php/2000-01-01-start %}
[procfile]: {% post_url platform/app/2000-01-01-procfile %}
[python]: {% post_url languages/python/2000-01-01-start %}
[ruby]: {% post_url languages/ruby/2000-01-01-start %}
[scaling]: {% post_url platform/app/scaling/2000-01-01-scaling %}
47 changes: 34 additions & 13 deletions src/_posts/platform/app/scaling/2000-01-01-scaling.md
@@ -1,7 +1,7 @@
---
title: Scaling Your Application
nav: Scaling
modified_at: 2026-01-02 12:00:00
modified_at: 2026-05-18 00:00:00
index: 10
---

@@ -62,22 +62,40 @@ application to traffic fluctuations.

Here is a quick comparison table, in the context of a Platform as a Service:

| | Vertical Scaling | Horizontal Scaling |
| --------------- | --------------------------------------- | --------------------------------- |
| **Approach** | Enhancing individual instance capacity | Adding more instances |
| **Cost** | Can become expensive at higher limits | Often more cost-efficient |
| **Resilience** | Low (single point of failure) | High (distributed resources) |
| **Flexibility** | Low, limited by physical/virtual constraints | High, limited by the application architecture |
| **When** | Lack or overuse of CPU or RAM (swap increase or decrease) | Increase or decrease in total application traffic |
| | Vertical Scaling | Horizontal Scaling |
|-----------------|----------------------------------------------|---------------------------------------------------|
| **Approach** | Enhancing individual instance capacity | Adding more instances |
| **Cost** | Can become expensive at higher limits | Often more cost-efficient |
| **Resilience** | Low (single point of failure) | High (distributed resources) |
| **Flexibility** | Low, limited by physical/virtual constraints | High, limited by the application architecture |
| **When** | Lack or overuse of resources (CPU or RAM) | Increase or decrease in total application traffic |

### Memory Usage and Scaling Decisions

## Limitations
If your application is approaching its memory limit, check its
[memory metrics][metrics] and apply the same distinction between vertical and
horizontal scaling to the memory usage pattern.

- Vertical scaling is limited by the platform. The biggest container we can
currently boot is the `2XL` container, with 4GB of RAM. For a comprehensive
list of container sizes and corresponding specifications, please see our
Use vertical scaling when each container needs more memory to run safely. In
that case, choose a larger size by following
[Choosing a Container Size][choosing-container-size]. Adding more containers
can help when memory usage increases because traffic increases and the workload
can be distributed across multiple containers. However, horizontal scaling does
not help when a single container needs more memory than the selected container
size provides.

If memory pressure comes from specific jobs or high concurrency, first review
the application structure with
[Optimizing Application Architecture][optimizing-architecture].
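
The decision described in this section can be summarized as a small heuristic. The Python sketch below is an assumption-laden illustration, not a rule applied by the platform: `idle_mb` and `peak_mb` are per-container measurements you collected yourself, and the 85% threshold is arbitrary.

```python
def scaling_hint(idle_mb, peak_mb, quota_mb, pressure_from_jobs=False, threshold=0.85):
    """Suggest a direction based on per-container memory usage.

    idle_mb: memory used with little or no traffic.
    peak_mb: memory used at the observed peak.
    """
    limit = threshold * quota_mb
    if peak_mb < limit:
        return "keep current size"
    if pressure_from_jobs:
        # Specific jobs or high concurrency: look at the architecture first.
        return "review architecture"
    if idle_mb >= limit:
        # Each container is close to its quota even without traffic.
        return "scale vertically"
    # Memory grows with traffic, so the load can be spread out.
    return "scale horizontally"
```

The result is only a starting point for the analysis; confirm it against the metrics and a realistic load test.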


## Scaling Limits

- Vertical scaling currently goes up to the `2XL` container size, with 4GB of
RAM. For a comprehensive list of container sizes and corresponding
specifications, please see our
[dedicated documentation page]({% post_url platform/internals/2000-01-01-container-sizes %}).
- Horizontal scaling is limited by default to a maximum of 10 containers per
- Horizontal scaling is available by default up to 10 containers per
[process type]({% post_url platform/app/2000-01-01-procfile %}). This limit
can be increased via our support team.

@@ -233,3 +251,6 @@ To learn more about events and notifiers, please visit the page dedicated to

[routing-requests]: {% post_url platform/networking/public/2000-01-01-routing %}#requests-distribution
[Scalingo Autoscaler]: {% post_url platform/app/scaling/2000-01-01-scalingo-autoscaler %}
[choosing-container-size]: {% post_url platform/app/scaling/2000-01-01-choosing-container-size %}
[metrics]: {% post_url platform/app/2000-01-01-metrics %}
[optimizing-architecture]: {% post_url platform/app/scaling/2000-01-01-optimizing-application-architecture %}