The cost of your deployment is based on the hardware you select and the amount of time it executes for. Essentially, every time we run your code, we charge you. GPU, CPU and Memory usage are all charged per second while persistent storage is charged per GB per month. You can view the pricing of various compute on our pricing page

When you deploy a model, there are two processes we charge you for:

  1. We charge you for the build process in order to setup your model environment. This means we need to download your Python version, download and install Linux packages, Python packages and any model files you require. We only charge you for a model build once until you change your environment ie: add more packages - then we will charge you again since we rebuild the environment. This process happens every time you do ‘cerebrium deploy’.
  2. The model runtime. This is the amount of time it takes your code to run from start to finish. Remember, Cerebrium only charges you for runtime, your cold starts are free!

Let us go through an example of how you can calculate the cost of your deployment:

The model you wish to deploy requires:

  • 24 GB VRam (A5000): $0.000356 per second
  • 2 CPU cores: 2 * $0.0000532 per second
  • 20GB Memory: 20 * $0.00000659 per second
  • 10 GB persistent storage: 10 * $0.3 per month

In our situation, your model works on the first deployment and so you incur only one build process of 2 minutes. Additionally, let’s say that the model has an average runtime of 2 seconds. Let us calculate your expected cost at month end with you expecting to do 100 000 model inferences.

# Your variables
average_inference_time = 2 # seconds
number_of_inferences = 100000 # number of inferences per month

# cost calculation
total_build_compute_cost = build_seconds * (GPU_cost + CPU_cost*num_of_cpu_cores + memory_cost*gb_of_RAM )
total_inference_time = average_inference_time * number_of_inferences
inference_compute_cost = (GPU_cost + CPU_cost*num_of_cpu_cores + memory_cost*gb_of_RAM ) * total_inference_time
storage_cost = gb_of_persistent_storage * persistant_storage_cost

total_cost = inference_compute_cost + storage_cost + total_build_compute_cost

print(f"Build Compute cost: ${compute_cost :.2f}/month",
      f"Inference Compute cost: ${compute_cost :.2f}/month",
      f"\nStorage cost: ${storage_cost :.2f}/month",
      f"\nTotal cost: ${total_cost :.2f}/month")

From this, you can see that the cost of your deployment is:

Build Compute cost: $0.07/month
Inference Compute cost: $118.84/month
Storage cost: $3.00/month
Total cost: $121.21/month

Now you would expect the cost to grow linearly with the number of requests you get however, we want to help you grow your business and so we are happy to discount your computing cost as you grow. To see the discounts you can get for your volume, check out our pricing page or contact us directly.

As you grow, don’t worry about scaling your deployment with demand, sourcing new hardware or managing your infrastructure. You can focus on your code and we’ll handle the rest.