The cost of your deployment depends on the hardware you select and how long it runs. Essentially, every time we run your code, or you specify that a machine should be running, we charge you. GPU, CPU, and memory usage are all charged per second, while persistent storage is charged per GB per month. You can view the pricing of each compute type on our pricing page.

When you deploy a model, there are two processes we charge you for:

  1. The build process, where we set up your app environment. In this step, we set up a Python environment according to your parameters, then download and install the required apt packages, Conda and Python packages, and any model files you require. You are only charged for a build if we need to rebuild your environment, i.e., you have run a build or deploy command after changing your requirements, parameters, or code. Note that we cache each step of a build, so subsequent builds cost substantially less than the first.

  2. The app runtime. This is the amount of time it takes your code to run from start to finish on each request. There are three components to consider here:

  • Cold start: This is the amount of time it takes to spin up one or more servers, load your environment, connect storage, etc. This is part of the Cerebrium service and something we are working on every day to get as low as possible. We do not charge you for this!
  • Model initialization: This code sits outside your function and runs only when your app incurs a cold start. You are charged for the time it takes to run. Typically this means loading a model into GPU RAM, importing Python packages, etc.
  • Function runtime: This is the code inside your function, which runs every time a request hits your endpoint. You are charged for the time it takes to run.
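To make the split between model initialization and function runtime concrete, here is a minimal sketch. The names `load_model` and `predict` and the dummy model are hypothetical stand-ins for illustration, not Cerebrium's API. The sketch only shows which code runs once per cold start and which runs on every request.

```python
import time


def load_model():
    """Hypothetical stand-in for loading model weights into GPU RAM."""
    time.sleep(0.01)  # simulate a slow load
    return lambda text: text.upper()  # dummy "model" for illustration


# Model initialization: module-level code, runs once per cold start.
# The time spent here is billed.
_init_start = time.perf_counter()
model = load_model()
initialization_seconds = time.perf_counter() - _init_start


def predict(text):
    """Function runtime: runs on every request and is billed per call."""
    return model(text)


if __name__ == "__main__":
    print(f"Initialization took {initialization_seconds:.3f}s (once per cold start)")
    for prompt in ["hello", "world"]:
        print(predict(prompt))
```

Under this layout, moving expensive work such as model loading out of the function means you pay for it once per cold start rather than on every request.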

Let us go through an example of how you can calculate the cost of your deployment:

The model you wish to deploy requires:

  • 24 GB VRAM (A10): $0.000306 per second
  • 2 CPU cores: 2 * $0.00000655 per second
  • 20 GB memory: 20 * $0.00000222 per second

In our scenario, your app works on the first deployment, so you incur only one build process of 2 minutes. Additionally, let's say that the app has 10 cold starts a day with an average initialization time of 2 seconds, and an average runtime (predict) of 2 seconds. Let us calculate your expected cost at month-end, assuming 100,000 model inferences in the month.

# Your variables
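average_initialization_time = 2  # seconds
cold_starts_per_month = 300  # 10 a day for 30 days
average_inference_time = 2  # seconds
number_of_inferences = 100000  # number of inferences per month

GPU_cost = 0.000306  # per second
CPU_cost = 0.00000655  # per second per core
memory_cost = 0.00000222  # per second per GB
num_of_cpu_cores = 2
gb_of_RAM = 20
build_seconds = 120  # 2 minutes

# Placeholders (not part of the example above): set these to your own values
gb_of_persistent_storage = 0  # GB of persistent storage you use
persistent_storage_cost = 0.0  # per GB per month; see the pricing page for the actual rate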

# cost calculation
compute_rate = GPU_cost + (CPU_cost * num_of_cpu_cores) + (memory_cost * gb_of_RAM)

total_build_compute_cost = build_seconds * compute_rate
total_initialization_time = average_initialization_time * cold_starts_per_month
total_inference_time = average_inference_time * number_of_inferences

initialization_compute_cost = total_initialization_time * compute_rate
inference_compute_cost = total_inference_time * compute_rate
storage_cost = gb_of_persistent_storage * persistent_storage_cost

total_cost = inference_compute_cost + storage_cost + total_build_compute_cost + initialization_compute_cost

print(f"Build compute cost: ${total_build_compute_cost:.2f}/month",
      f"Initialization compute cost: ${initialization_compute_cost:.2f}/month",
      f"Inference compute cost: ${inference_compute_cost:.2f}/month",
      f"Storage cost: ${storage_cost:.2f}/month",
      f"Total cost: ${total_cost:.2f}/month",
      sep="\n")
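If you want to compare several scenarios, the same arithmetic can be wrapped in a function. The following is a rough sketch rather than an official calculator: the function name `estimate_monthly_cost` and its parameters are made up for illustration, and storage defaults to zero because no storage rate is given in this example.

```python
def estimate_monthly_cost(
    gpu_rate, cpu_rate, memory_rate, cpu_cores, memory_gb,
    build_seconds, cold_starts, init_seconds, inferences, inference_seconds,
    storage_gb=0.0, storage_rate=0.0,
):
    """Return a cost breakdown in dollars for one month.

    Hypothetical helper: rates are per second (storage_rate is per GB per
    month). storage_gb and storage_rate default to 0 as placeholders.
    """
    # Per-second price of the whole machine: GPU + all CPU cores + all memory
    compute_rate = gpu_rate + cpu_rate * cpu_cores + memory_rate * memory_gb

    breakdown = {
        "build": build_seconds * compute_rate,
        "initialization": cold_starts * init_seconds * compute_rate,
        "inference": inferences * inference_seconds * compute_rate,
        "storage": storage_gb * storage_rate,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# The scenario from this page, with storage left out
costs = estimate_monthly_cost(
    gpu_rate=0.000306, cpu_rate=0.00000655, memory_rate=0.00000222,
    cpu_cores=2, memory_gb=20,
    build_seconds=120, cold_starts=300, init_seconds=2,
    inferences=100_000, inference_seconds=2,
)
for name, dollars in costs.items():
    print(f"{name}: ${dollars:.2f}")
# The last line printed is "total: $72.96" (storage excluded)
```

In this scenario, inference time accounts for almost all of the bill ($72.70 of $72.96), so reducing the average inference time has far more effect than reducing cold starts or build time.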