spaCy is an open-source software library for advanced natural language processing and excels at large-scale information extraction tasks. By the end of this guide you’ll have a spaCy API endpoint that can handle any scale of traffic by running inference on serverless CPUs/GPUs.

Project setup

Before building, you need to set up a Cerebrium account. This is as simple as starting a new Project in Cerebrium and copying the API key. This key will be used to authenticate all calls for this project.

Create a project

  1. Go to the Cerebrium dashboard
  2. Sign up or log in
  3. Navigate to the API Keys page
  4. You will need your private API key for deployments. Click the copy button to copy it to your clipboard.


Prepare model

We are going to be creating a custom named entity recognition (NER) model. This code could be replaced by any spaCy model. Make sure you have the required libraries installed - in this case, we are just using spaCy 3. First, let us create our training and evaluation data:

  "Who is Shaka Khan?",
    "entities": [[7, 17, "PERSON"]]
  "Who is Michael?",
    "entities": [[7, 14, "PERSON"]]

spaCy 3 no longer accepts training data in JSON format, so we have to convert it to spaCy's binary .spacy format:

import srsly
import typer
import warnings
from pathlib import Path

import spacy
from spacy.tokens import DocBin

def convert(lang: str, input_path: Path, output_path: Path):
    nlp = spacy.blank(lang)
    db = DocBin()
    for text, annot in srsly.read_json(input_path):
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in annot["entities"]:
            span = doc.char_span(start, end, label=label)
            if span is None:
                msg = f"Skipping entity [{start}, {end}, {label}] in the following text because the character span '{doc.text[start:end]}' does not align with token boundaries:\n\n{repr(text)}\n"
                warnings.warn(msg)
            else:
                ents.append(span)
        doc.ents = ents
        db.add(doc)
    db.to_disk(output_path)

if __name__ == "__main__":
    typer.run(convert)

Now save this script (for example as convert.py) and run it once with the location of your training data and once with your evaluation data (here assumed to be in eval.json):

!python convert.py en train.json train.spacy
!python convert.py en eval.json eval.spacy

When creating a spaCy model, we can start with the standard config template and edit it. You can run the following command to do this:
!python -m spacy init config --lang en --pipeline ner config.cfg --force
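
The generated config.cfg can be edited directly before training. As an illustrative excerpt (the values shown here simply mirror the command-line overrides used in the next step), the relevant sections look like this:

[paths]
train = null
dev = null

[training]
eval_frequency = 10
max_steps = 100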

Lastly, we can train our model:
!python -m spacy train config.cfg --output ./training/ --paths.train train.spacy --paths.dev eval.spacy --training.eval_frequency 10 --training.max_steps 100 --gpu-id -1
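
Since we passed --output ./training/, spaCy writes the best and final checkpoints to training/model-best and training/model-last. Before deploying, it is worth sanity-checking the trained pipeline locally, for example:

import spacy

# Load the best checkpoint produced by spacy train
nlp = spacy.load("training/model-best")

doc = nlp("Who is Shaka Khan?")
# Expect something like [('Shaka Khan', 'PERSON')] once the model has learned the entity
print([(ent.text, ent.label_) for ent in doc.ents])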

Develop model

To start, you should install the Cerebrium framework by running the following command in your notebook or terminal:

pip install --upgrade cerebrium

One of the main differences with a spaCy model deployed on Cerebrium is that it is mandatory to include a post-processing function that uses your trained model to manipulate text. This post-processing function has to return a list or a string to be accepted; this is what will be returned to you via the API. Below, we show a very basic post-processing function and deploy our spaCy model.

When you train a spaCy model, it will create a folder with a tokenizer, meta.json, config.cfg etc. You will need to specify the path of this folder when deploying your model.
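
For reference, a trained pipeline folder (for example training/model-best) typically contains something like the following; the exact contents vary with your pipeline components:

training/model-best/
├── config.cfg
├── meta.json
├── tokenizer
├── ner/
├── tok2vec/
└── vocab/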

from cerebrium import Conduit, model_type

# doc is the output of the loaded spaCy model applied to the input text
def postSpacy(doc):
    # Collect each token's text into a plain list of strings
    test = []
    for token in doc:
        test.append(token.text)
    return test

conduit = Conduit("spacy-model", "<API_KEY>", (model_type.SPACY, "path/to/trained/model/", {"post": postSpacy}))
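
Finally, the conduit is deployed. A minimal sketch, assuming the legacy Cerebrium SDK exposes this as a deploy() method on the Conduit:

# Deploy the conduit to Cerebrium; assumes the legacy Conduit.deploy() method
conduit.deploy()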

Deployed Model

Your model is now deployed and ready for inference, all in under 10 seconds! Navigate to the dashboard and, on the Models page, you will see your model.

You can run inference using curl:

curl --location --request POST '<ENDPOINT>' \
--header 'Authorization: <API_KEY>' \
--header 'Content-Type: application/json' \
--data-raw '["This is us testing the spaCy model"]'

The response will be:

[Screenshot: spaCy response in Postman]
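
You can also call the endpoint from Python. A minimal sketch using the requests library (the endpoint and API key are placeholders, and the exact response shape depends on your post-processing function):

import requests

# Placeholders: replace with your deployed endpoint URL and private API key
ENDPOINT = "<ENDPOINT>"
API_KEY = "<API_KEY>"

response = requests.post(
    ENDPOINT,
    headers={"Authorization": API_KEY, "Content-Type": "application/json"},
    json=["This is us testing the spaCy model"],
)
print(response.status_code)
print(response.json())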

Navigate back to the dashboard and click on the name of the model you just deployed. You will see an API call was made and the inference time. From your dashboard, you can monitor your model, roll back to previous versions and see traffic.

With one line of code, your model was deployed in seconds with automatic versioning, monitoring and the ability to scale based on traffic spikes. Try deploying your own model now or check out our other frameworks.