Deploy specialized services from Cerebrium’s partners with simplified configurations
The min_replicas and max_replicas parameters control the number of instances
The replica_concurrency parameter determines how many concurrent requests each instance can handle
The cooldown parameter controls how long instances remain active after processing requests
The hardware section controls the instance type, which affects performance and/or cost
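
To make the interaction between these settings concrete, here is a minimal sketch of how min_replicas, max_replicas, and replica_concurrency could jointly bound the number of instances for a given load. It is an illustrative model only, not Cerebrium's actual autoscaling logic: the function name replicas_needed and the in_flight_requests argument are hypothetical, and the cooldown delay before scaling down is not modeled.

```python
import math


def replicas_needed(in_flight_requests, replica_concurrency, min_replicas, max_replicas):
    """Hypothetical helper: estimate how many instances a load would require.

    Assumes each instance serves up to `replica_concurrency` requests at once
    and that the instance count is clamped to [min_replicas, max_replicas].
    """
    if replica_concurrency < 1:
        raise ValueError("replica_concurrency must be at least 1")
    if min_replicas < 0 or max_replicas < min_replicas:
        raise ValueError("expected 0 <= min_replicas <= max_replicas")

    # Instances required if every instance is filled to its concurrency limit.
    wanted = math.ceil(in_flight_requests / replica_concurrency)

    # Never go below the floor or above the ceiling set in the configuration.
    return max(min_replicas, min(max_replicas, wanted))


# 10 concurrent requests at 4 per instance need 3 instances (within 0..5).
print(replicas_needed(10, 4, min_replicas=0, max_replicas=5))
```

Under this model, raising replica_concurrency lowers the instance count for the same load, while min_replicas keeps instances running even when no requests are in flight.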