Streaming lets you send live output from your model to the client as a stream of server-sent events (SSE). It works with any Python object that implements the iterator protocol; in practice, this usually means a generator function that uses the `yield` keyword. You can return any type of content, as long as each item is yielded as a string.
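Because streaming depends on the iterator protocol rather than on `yield` itself, a class that defines `__iter__` and `__next__` should behave the same as a generator. The sketch below compares a hypothetical `NumberStream` class with an equivalent generator. Both names are illustrative, and the sketch assumes the streaming endpoint accepts any iterator of strings, as described above.

```python
class NumberStream:
    """Hypothetical iterator that produces the same string chunks as a generator."""

    def __init__(self, count: int):
        self.count = count
        self.current = 0

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self) -> str:
        if self.current >= self.count:
            # StopIteration tells the consumer the stream is finished
            raise StopIteration
        value = f"Number {self.current} "
        self.current += 1
        return value


def number_generator(count: int):
    # Generator equivalent: Python builds the iterator object for you
    for i in range(count):
        yield f"Number {i} "


print(list(NumberStream(3)))  # ['Number 0 ', 'Number 1 ', 'Number 2 ']
print(list(number_generator(3)) == list(NumberStream(3)))  # True
```

The generator form is usually shorter. The class form is mainly useful when the stream needs to carry extra state between chunks.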

This feature is currently in beta. To stream output, replace `predict` in your endpoint URL with `stream`.
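If you build endpoint URLs in code, the swap can be done on the URL path rather than with a plain string replace, so that a project or model name containing "predict" is left alone. This is a minimal sketch using only the standard library. It assumes `predict` is the final path segment of your endpoint URL, and the `to_stream_url` helper and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_stream_url(predict_url: str) -> str:
    """Hypothetical helper: swap a trailing 'predict' path segment for 'stream'."""
    parts = urlsplit(predict_url)
    segments = parts.path.split("/")
    if segments[-1] != "predict":
        raise ValueError(f"expected a URL ending in /predict, got {predict_url!r}")
    segments[-1] = "stream"
    # Rebuild the URL, keeping the scheme, host, query string and fragment unchanged
    return urlunsplit(parts._replace(path="/".join(segments)))


# Hypothetical endpoint URL, for illustration only
print(to_stream_url("https://api.example.com/v1/my-project/my-model/predict"))
# https://api.example.com/v1/my-project/my-model/stream
```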

Here is a simple example:

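```python
import time

def predict(item, run_id, logger):
    # Yield one string chunk per second; each chunk is streamed to the client as it is produced
    for i in range(10):
        yield f"Number {i} "
        time.sleep(1)
```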
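To see roughly what the client receives, the sketch below wraps each yielded chunk in the standard SSE `data:` framing from the HTML specification: one `data:` line per line of payload, followed by a blank line that ends the event. This illustrates the SSE wire format only and is not necessarily how the platform serializes your output. `to_sse_events` and `numbers` are hypothetical names.

```python
from typing import Iterable, Iterator


def to_sse_events(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap each string chunk in a minimal SSE 'data' event."""
    for chunk in chunks:
        # SSE treats CR, LF and CRLF as line breaks, so normalise them before splitting
        lines = chunk.replace("\r\n", "\n").replace("\r", "\n").split("\n")
        # A multi-line payload needs one "data:" field per line; a blank line ends the event
        yield "".join(f"data: {line}\n" for line in lines) + "\n"


def numbers(n: int):
    # Same shape as predict() above, without the sleep or the platform arguments
    for i in range(n):
        yield f"Number {i} "


for event in to_sse_events(numbers(2)):
    print(repr(event))
# 'data: Number 0 \n\n'
# 'data: Number 1 \n\n'
```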

Once you deploy this code and call the stream endpoint, you will see the SSE events appear progressively, roughly one per second. Recent versions of Postman display SSE streams well, which makes Postman a convenient way to test this.
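To consume the stream from a script instead of Postman, you can read the response line by line and collect each event's `data:` fields until a blank line ends the event. This is a minimal sketch using only the standard library. The request shape (a JSON `POST` body plus whatever auth headers your deployment needs) is an assumption, and `parse_sse` and `stream_predictions` are hypothetical names. The parser handles only `data:` fields and ignores `event:`, `id:`, `retry:` and comment lines.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event from raw byte lines."""
    data: list[str] = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line == "":
            # A blank line ends the current event
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith("data:"):
            value = line[5:]
            # The spec strips a single leading space after the colon
            data.append(value[1:] if value.startswith(" ") else value)
        # Other fields and comment lines are ignored in this simplified parser
    # Per the spec, an unterminated event at the end of the stream is discarded


def stream_predictions(url: str, payload: dict, headers: dict) -> Iterator[str]:
    # Hypothetical request shape: POST a JSON body; auth headers depend on your deployment
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Iterating the response yields lines as they are read, so events are handled as they arrive
        yield from parse_sse(response)
```

Calling `for text in stream_predictions(stream_url, {...}, {...}): print(text)` prints each chunk as soon as its event is complete, rather than waiting for the whole response.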


To see streaming implemented with Falcon-7B, check out the example here.