Falcon is a causal decoder-only model, available in 7B, 13B and 40B parameter sizes, built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. It is released under the Apache 2.0 license. To deploy Falcon, use the identifier below:

  • Falcon 7B: falcon-7b

Request Parameters

Here’s an example of how to call the deployed endpoint:

  curl --location --request POST 'https://run.cerebrium.ai/falcon-7b-webhook/predict' \
      --header 'Authorization: <API_KEY>' \
      --header 'Content-Type: application/json' \
      --data-raw '{
        "prompt": "Hey! How are you?",
        "max_length": 100,
        "temperature": 0.9,
        "top_p": 1.0,
        "top_k": 10,
        "num_return_sequences": 1,
        "repetition_penalty": 2.0
    }'
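The same request can be sent from Python. This is a minimal sketch using only the standard library (urllib); build_request and predict are hypothetical helper names, not part of any Cerebrium SDK, and the JSON body simply mirrors the curl example. Replace <API_KEY> with your own key before calling predict.

```python
import json
import urllib.request

# Endpoint URL taken from the curl example.
ENDPOINT = "https://run.cerebrium.ai/falcon-7b-webhook/predict"


def build_request(api_key: str, prompt: str, **params) -> urllib.request.Request:
    """Build a POST request equivalent to the curl example.

    Hypothetical helper: any extra keyword arguments (max_length, temperature,
    top_p, top_k, num_return_sequences, repetition_penalty) go into the JSON body.
    """
    body = {"prompt": prompt, **params}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def predict(api_key: str, prompt: str, **params) -> dict:
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(build_request(api_key, prompt, **params)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the request; nothing is sent.
    req = build_request(
        "<API_KEY>",
        "Hey! How are you?",
        max_length=100,
        temperature=0.9,
        top_p=1.0,
        top_k=10,
        num_return_sequences=1,
        repetition_penalty=2.0,
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect and test without a network connection.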
Authorization (required)
string

This is the Cerebrium API key used to authenticate your request. You can get it from your Cerebrium dashboard.

prompt (required)
string

The prompt you would like Falcon to process.

max_length
int

The maximum number of tokens the model can generate. The default is 100.

temperature
float

The value used to control the randomness in the model’s predictions. Higher values will result in more random outputs while smaller values make the output more deterministic. The default is 1.0.
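To illustrate the effect of temperature, here is a minimal sketch of temperature-scaled softmax in plain Python. It assumes the endpoint divides the model's logits by temperature before sampling, which is the usual approach (for example in Hugging Face transformers); the function name is hypothetical.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Convert raw logits to probabilities after dividing them by temperature.

    temperature > 1 flattens the distribution (more random output);
    temperature < 1 sharpens it (more deterministic output).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]
    for t in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

Running it shows the most likely token's share shrinking as the temperature rises.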

top_p
float

Also known as nucleus sampling. Controls randomness by restricting the model to the smallest set of highest-probability tokens whose cumulative probability reaches top_p. The default is 0.0.
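A minimal sketch of nucleus (top-p) filtering over a list of probabilities, assuming the standard technique. The endpoint's exact edge-case handling (for example at top_p = 0.0) is not documented here; this sketch always keeps at least one token. The helper name is hypothetical.

```python
def top_p_filter(probs: list[float], top_p: float) -> list[float]:
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, zero out the rest, and renormalise."""
    # Token indices sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set()
    cumulative = 0.0
    for i in order:
        kept.add(i)  # at least one token is always kept
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


if __name__ == "__main__":
    print(top_p_filter([0.2, 0.5, 0.3], 0.7))
```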

top_k
int

The number of highest-probability vocabulary tokens the model considers at each generation step. Lowering top_k limits the possible outputs, making generation more deterministic. The default is 50.
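A matching sketch of top-k filtering, again assuming the standard technique. Treating top_k <= 0 as "no filtering" follows common practice but is an assumption about this endpoint; the helper name is hypothetical.

```python
def top_k_filter(probs: list[float], top_k: int) -> list[float]:
    """Keep only the top_k most likely tokens and renormalise."""
    if top_k <= 0 or top_k >= len(probs):
        return list(probs)  # assumption: nothing to filter in these cases
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(ranked[:top_k])
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


if __name__ == "__main__":
    print(top_k_filter([0.1, 0.6, 0.3], 2))
```

With top_k = 1 this reduces to always picking the single most likely token.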

num_return_sequences
int

The number of independently generated sequences to return. The default is 1.

repetition_penalty
float

Penalizes words or tokens that have already appeared in the generated text, discouraging repetition. The value must be greater than or equal to 1.
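How the endpoint applies the penalty is not documented here. As an assumption, this sketch uses the rule from Hugging Face transformers' repetition penalty: for every token already generated, a positive logit is divided by the penalty and a negative logit is multiplied by it, so the token always becomes less likely. A penalty of 1.0 leaves the logits unchanged. The helper name is hypothetical.

```python
def apply_repetition_penalty(logits: list[float], generated_ids: list[int],
                             penalty: float) -> list[float]:
    """Lower the scores of tokens that already appear in generated_ids."""
    if penalty < 1.0:
        raise ValueError("repetition_penalty must be >= 1")
    out = list(logits)
    for token_id in set(generated_ids):  # each seen token is penalised once
        score = out[token_id]
        # Dividing a positive score or multiplying a negative one both reduce it.
        out[token_id] = score * penalty if score < 0 else score / penalty
    return out


if __name__ == "__main__":
    print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1, 1], 2.0))
```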

Response Parameters

Here’s an example response:

{
  "run_id": "c3b02252-2771-4898-b46c-3eca73a4c346",
  "run_time_ms": 12521.568059921265,
  "message": "Ran successfully",
  "result": [
    {
      "generated_text": "Hey! How are you?\nI'm fine.\nI hope you're doing well."
    }
  ]
}

run_id (required)
string

A unique identifier for the run that you can use to associate prompts with webhook endpoints.

run_time_ms (required)
float

The amount of time in milliseconds it took to run your function. This is what you will be billed for.

message (required)
string

A status message indicating whether the request ran successfully (for example, "Ran successfully").

result (required)
array

The output generated by Falcon: a list of objects, each containing a generated_text field.
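A minimal sketch of reading the generated text out of a response body, using the example response shown earlier. It assumes result holds one object per returned sequence, which is consistent with num_return_sequences but not stated explicitly; the helper name is hypothetical.

```python
import json


def extract_texts(response_body: str) -> list[str]:
    """Return every generated_text from a Falcon endpoint response.

    Assumption: result contains one object per returned sequence.
    """
    payload = json.loads(response_body)
    return [item["generated_text"] for item in payload["result"]]


if __name__ == "__main__":
    example = """{
      "run_id": "c3b02252-2771-4898-b46c-3eca73a4c346",
      "run_time_ms": 12521.568059921265,
      "message": "Ran successfully",
      "result": [
        {"generated_text": "Hey! How are you?\\nI'm fine.\\nI hope you're doing well."}
      ]
    }"""
    for text in extract_texts(example):
        print(text)
```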