You can use the Cerbrium + Vercel integration to access applications deployed on Cerebrium via REST Endpoints from Vercel projects. You’ll find the Cerebrium integration available to install in the Vercel AI marketplace.

What this integration does

This integration allows you to:

  1. Easily synchronize your Cerebrium API keys to one or more Vercel projects
  2. Call Cerebrium Endpoints over HTTP in connected Vercel projects


The integration will set the following environment variables against the user’s selected Vercel projects:


The environment variables will be set in the “preview” and “production” project targets. You can read more about environment variables within Vercel in the documentation.

Installing the integration

  1. Click “Add integration” on the Vercel integrations page
  2. Select the Vercel account you want to connect with
  3. (If logged out) Sign into an existing Cerebrium project, or create a new Cerebrium project
  4. Select the Vercel projects that you wish to connect to your Modal workspace
  5. Click “Continue” 6.Back in your Vercel dashboard, confirm the environment variables were added by going to your Vercel project > “Settings” > “Environment variables”

Uninstalling the integration

The Cerebrium Vercel integration is managed under the user’s Vercel dashboard under the “Integrations” tab. From there they can remove the specific integration installation from their Vercel account.

Important: removing an integration will delete the corresponding API token set by Modal in your Vercel project(s).


You can view our example here on how to deploy Mistral 7B with vLLM to an auto-scaling endpoint.

Once you have followed the example and deployed the application, you should have an output of the endpoint your application is deployed at. You can then deploy this within your vercel project as:

  "<YOUR PROJECT ID>/mistral-vllm/predict",
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.CEREBRIUM_JWT}`,
      "Content-Type": "application/json",
    body: JSON.stringify({
      prompt: "What is the capital city of France?",
  .then((response) => response.json())
  .then((data) => console.log(data))
  .catch((error) => console.error("Error:", error));

In this example, we built our application to take in a prompt as input and to return with the output of the mode.


Requests to applications use usage based pricing, billed at 1ms granularity. The exact cost per millisecond is based on the underlying hardware you specify.

See our pricing page for current GPU prices.