While Cerebrium’s default runtime works well for most app needs, teams sometimes need more control over their web server implementation. Using ASGI or WSGI servers through Cerebrium’s custom runtime feature enables capabilities like custom authentication, dynamic batching, frontend dashboards, public endpoints, and websockets.

Setting Up Custom Servers

Here’s a simple FastAPI server implementation that shows how custom servers work in Cerebrium:

from fastapi import FastAPI

app = FastAPI()

@app.post("/hello")
def hello():
    return {"message": "Hello Cerebrium!"}

@app.get("/health")
def health():
    return "Ok"

Configure this server in cerebrium.toml by adding a custom runtime section:

[cerebrium.runtime.custom]
port = 5000
entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "5000"]
healthcheck_endpoint = "/health"

[cerebrium.dependencies.pip]
pydantic = "latest"
numpy = "latest"
loguru = "latest"
fastapi = "latest"

The configuration requires three key parameters:

  • entrypoint: The command that starts your server
  • port: The port your server listens on
  • healthcheck_endpoint: The endpoint that confirms server health. If empty, will default to a TCP ping of your specified port

For ASGI applications like FastAPI, include the appropriate server package (like uvicorn) in your dependencies. After deployment, your endpoints become available at https://api.cortex.cerebrium.ai/v4/{project - id}/{app - name} /your/endpoint.

Our FastAPI Server Example provides a complete implementation.