Falcon
Falcon is a family of causal decoder-only models with 7B/13B/40B parameters, built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. It is available under the Apache 2.0 license. To deploy Falcon, use the identifier below:
- Falcon 7b:
falcon-7b
Here’s an example of how to call the deployed endpoint:
Request Parameters
curl --location --request POST 'https://run.cerebrium.ai/falcon-7b-webhook/predict' \
--header 'Authorization: <API_KEY>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "prompt": "Hey! How are you?",
  "max_length": 100,
  "temperature": 0.9,
  "top_p": 1.0,
  "top_k": 10,
  "num_return_sequences": 1,
  "repetition_penalty": 2.0
}'
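For readers calling the endpoint from code rather than the shell, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper name build_request is hypothetical, and the URL, headers, and body fields are copied from the curl example above.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
ENDPOINT = "https://run.cerebrium.ai/falcon-7b-webhook/predict"


def build_request(api_key, prompt, **generation_params):
    """Build (but do not send) a POST request for the Falcon endpoint.

    build_request is a hypothetical helper for illustration only.
    """
    payload = {"prompt": prompt, **generation_params}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "<API_KEY>",  # replace with the key from your Cerebrium dashboard
    "Hey! How are you?",
    max_length=100,
    temperature=0.9,
    top_p=1.0,
    top_k=10,
    num_return_sequences=1,
    repetition_penalty=2.0,
)

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     body = json.loads(response.read())
```

Serializing the body with json.dumps avoids hand-written JSON mistakes such as a missing comma between fields.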
- Authorization: The Cerebrium API key used to authenticate your request. You can get it from your Cerebrium dashboard.
- prompt: The prompt you would like Falcon to process.
- max_length: The maximum number of tokens the model may generate. The default is 100.
- temperature: Controls the randomness of the model’s predictions. Higher values produce more random outputs, while lower values make the output more deterministic. The default is 1.0.
- top_p: Also known as nucleus sampling. It limits randomness by letting the model consider only the smallest set of highest-probability tokens whose cumulative probability adds up to the specified top_p. The default is 0.0.
- top_k: The maximum number of highest-probability vocabulary tokens considered at each generation step. Reducing top_k limits the number of output possibilities, resulting in more deterministic outputs. The default is 50.
- num_return_sequences: The number of independently computed sequences to return. If set to more than 1, that many sequences are generated, each computed independently. The default is 1.
- repetition_penalty: Penalizes repeated words or tokens in the generated text. This is a number greater than or equal to 1.
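To make the interaction between temperature, top_k, and top_p more concrete, here is a simplified sketch in plain Python. It is an illustration of the general technique, not the code that runs behind the endpoint: the function name next_token_distribution and the toy logits are made up, and real implementations may apply the filters in a different order or with extra details.

```python
import math


def next_token_distribution(logits, temperature=1.0, top_k=50, top_p=1.0):
    """Return {token: probability} after temperature, top-k and top-p filtering.

    Hypothetical helper for illustration; defaults here are not the API's.
    """
    # Temperature: divide logits before the softmax.
    # Values below 1 sharpen the distribution, values above 1 flatten it.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}

    # Top-k: keep only the k most probable tokens, then renormalize.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    k_total = sum(p for _, p in ranked)
    ranked = [(tok, p / k_total) for tok, p in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so they sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


# Toy logits for four candidate tokens (made-up numbers).
toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}

only_two = next_token_distribution(toy_logits, top_k=2)           # keeps "a" and "b"
nucleus = next_token_distribution(toy_logits, top_p=0.5)          # keeps only "a"
sharper = next_token_distribution(toy_logits, temperature=0.5)    # more mass on "a"
```

In this sketch, lowering any of the three values narrows the set of tokens the model can pick from, which is why each is described as making the output more deterministic.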
Response Parameters
{
  "run_id": "c3b02252-2771-4898-b46c-3eca73a4c346",
  "run_time_ms": 12521.568059921265,
  "message": "Ran successfully",
  "result": [
    {
      "generated_text": "Hey! How are you?\nI'm fine.\nI hope you're doing well."
    }
  ]
}
- run_id: A unique identifier for the run that you can use to associate prompts with webhook endpoints.
- run_time_ms: The time, in milliseconds, that your function took to run. This is what you are billed for.
- message: Whether or not the request was successful.
- result: The output generated by Falcon, returned as a list of objects that each contain a generated_text field.
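The response fields can be read with a few lines of Python. The sketch below parses the sample response shown in this section; the helper name summarize_response is hypothetical, and the field names are taken from that sample rather than from a formal schema.

```python
import json


def summarize_response(raw_body):
    """Extract the commonly used fields from a Falcon endpoint response.

    Hypothetical helper; field names follow the sample response above.
    """
    body = json.loads(raw_body)
    texts = [item["generated_text"] for item in body.get("result", [])]
    return {
        "run_id": body["run_id"],
        "seconds": body["run_time_ms"] / 1000,  # billed time, in seconds
        "message": body["message"],
        "texts": texts,
    }


# Raw string so the JSON escape \n is passed to the parser unchanged.
sample_body = r"""
{
  "run_id": "c3b02252-2771-4898-b46c-3eca73a4c346",
  "run_time_ms": 12521.568059921265,
  "message": "Ran successfully",
  "result": [
    {
      "generated_text": "Hey! How are you?\nI'm fine.\nI hope you're doing well."
    }
  ]
}
"""

summary = summarize_response(sample_body)
for text in summary["texts"]:
    print(text)
```

With num_return_sequences greater than 1, the texts list would be expected to hold one entry per returned sequence.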