When you are developing a cortex deployment on Cerebrium, waiting for a build to complete can be time-consuming. To speed up your development process, you can use the cerebrium serve command to rapidly iterate on your deployment.

This allows you to run your deployment on a dedicated server, and see the results of your changes in a few seconds.

This feature is currently in beta and is available to all users. As such, you may encounter bugs or limitations. We are actively working on improving the experience and adding more features. If you have any feedback or suggestions, we’d love to hear from you on Discord or Slack!

Limitations:

  • Build process (packages, apt, etc.) changes require a full restart of the served instance
  • Serve sessions are not cached meaning a full build process is done even when environments have not changed between sessions
  • Reponses are not returned via the Local API server

Usage

To start a served instance, first navigate to the root folder of your cortex deployment.

Then, simply run the following command in your terminal:

cerebrium serve start

After running the cerebrium serve start command, it will start up an instance and create your environment. ie: install all your requested packages and dependancies.

Once completed, it will output a URL that you can use to query your instance locally - mimicing a production endpoint.

Info:   πŸ—οΈ  Starting served session...
πŸ†” Serve ID: p-abcd1234-8-cpu-only-a296
πŸ”„ Syncing files
⬆️  Uploading to Cerebrium...
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2.89k/2.89k [00:00<00:00, 68.2kB/s]
βœ… Resources uploaded successfully.
Info:   Session p-abcd1234-8-cpu-only-a296 started successfully

Info:   Local API server started successfully on port 7900
Send a POST request to: http://localhost:7900/predict

Local API Server

Since we are using a local API server, you don’t need to worry about providing an API key in the Authorization header. Your served instance will be running on port 7900 by default, however, you can change this by using the --port flag when you start a served instance

You can make a request to your local endpoint using:

curl -X POST http://localhost:7900/predict -H "Content-Type: application/json" --data '{"prompt": "this is my input data"}'

For the Beta, we don’t return the reponse back through the local API server but rather send the data to your instance. Please use the logs as a source of reference on output.

File Changes

As you save changes to your main.py or add/delete files in your directory, the instance will automatically update in a few seconds (unless some files you add are in the GB’s) and new changes will be live that you can inference.

If you would like to make changes to the environment, ie: hardware, pip/apt packages etc then you will need to restart the serve instance which you can do by pressing Ctrl+C and running the start command again.

Example changing code with serve

Please note that you are charged for your compute as long as serve is running. ie: If you are running serve for 8 minutes, you will be charged for 8 minutes of compute based on the hardware requirements you specified. It is very important to end your sesssion when done. We will automatically end the session after 10 minutes of inactivity.

How it works

When you run cerebrium serve start, the following happens:

  1. The cerebrium CLI uploads your deployment to a dedicated instance(s).
  2. The server builds your deployment in the same way as cerebrium deploy.
  3. The server starts your deployment and waits for you to send in requests.
  4. If you make changes to your main.py or other code in your deployment, the server reloads your deployment and applies your changes without rebuilding the entire deployment.
  5. When you’re done, you can stop the server by pressing Ctrl+C in the same terminal where you started the server.