Graceful Termination
Cerebrium runs in a shared, multi-tenant environment. To scale efficiently, optimize compute usage, and roll out updates, the platform continuously adjusts its capacity, spinning down nodes and launching new ones as needed. During this process, workloads are migrated to new nodes. In addition, your application has its own metric-based autoscaling criteria that dictate when instances scale or remain active, and instances are also replaced during new app deployments. To prevent requests from ending prematurely when we mark app instances for termination, you need to implement graceful termination.
Understanding Instance Termination
For both application autoscaling and our own internal node scaling, we send your application a SIGTERM signal as a warning that we intend to shut down this instance. For Cortex applications (Cerebrium's default runtime), this is handled for you. On custom runtimes, you need to catch and handle this signal yourself if you want to shut down gracefully. Once at least response_grace_period has elapsed, we send your application a SIGKILL signal, terminating the instance immediately.
When Cerebrium needs to terminate a container, we do the following:
- Stop routing new requests to the container.
- Send a SIGTERM signal to your container.
- Wait for response_grace_period seconds to elapse.
- Send SIGKILL if the container hasn’t stopped.
A container that has not finished handling SIGTERM by then is killed immediately, which can interrupt in-flight requests and cause 502 errors.
Example: FastAPI Implementation
For custom runtimes using FastAPI, implement the lifespan pattern to respond to SIGTERM.
The code below tracks active requests using a counter and prevents new requests during shutdown. When SIGTERM is received, it sets a shutdown flag and waits for all active requests to complete before the application terminates.
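A minimal sketch of that pattern follows. It is not Cerebrium's reference implementation. RequestTracker, tracker, and track_requests are illustrative names, and it assumes your ASGI server (uvicorn, for example) runs the lifespan shutdown phase when it receives SIGTERM. The counter and flag logic uses only the standard library, and the FastAPI wiring is shown in comments because it assumes fastapi is installed.

```python
import asyncio
from contextlib import asynccontextmanager


class RequestTracker:
    """Counts in-flight requests and records that shutdown has begun."""

    def __init__(self):
        self.active = 0
        self.shutting_down = False

    @asynccontextmanager
    async def track(self):
        # The event loop is single-threaded and there is no await between
        # reading and writing the counter, so no lock is needed.
        self.active += 1
        try:
            yield
        finally:
            self.active -= 1

    async def drain(self, poll_interval=0.1):
        """Set the shutdown flag, then wait until no requests remain."""
        self.shutting_down = True
        while self.active > 0:
            await asyncio.sleep(poll_interval)


tracker = RequestTracker()


@asynccontextmanager
async def lifespan(app):
    # Startup work (loading models, opening connections) goes before yield.
    yield
    # Runs when the server begins shutting down, for example after SIGTERM.
    await tracker.drain()


# Wiring into FastAPI (assumes fastapi is installed):
#
#   from fastapi import FastAPI, Request
#   from fastapi.responses import JSONResponse
#
#   app = FastAPI(lifespan=lifespan)
#
#   @app.middleware("http")
#   async def track_requests(request: Request, call_next):
#       if tracker.shutting_down:
#           return JSONResponse({"detail": "Shutting down"}, status_code=503)
#       async with tracker.track():
#           return await call_next(request)
```

In this sketch drain has no timeout of its own, so response_grace_period should be at least as long as your slowest request. SIGKILL ends the wait regardless.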
Test SIGTERM handling locally before deploying: start your app, send it SIGTERM with kill -TERM <pid> (Ctrl+C sends SIGINT, not SIGTERM), and verify you see graceful shutdown logs.