Streaming Endpoints
Streaming lets you send live output from your model to the client as a server-sent event (SSE) stream. It works with any Python object that implements the iterator or generator protocol.
Currently, your generator/iterator is required to yield data, as it will be sent downstream via the text/event-stream Content-Type.
You can still send JSON: serialize it before yielding it, then decode it on the receiving side.
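As a rough sketch of the JSON approach, the generator below serializes each record before yielding it, and the receiver decodes each chunk back into a Python object. The names `stream_json` and `decode_chunks` are hypothetical and not part of any platform API; the sketch also assumes string chunks are what the endpoint expects.

```python
import json
from typing import Any, Iterable, Iterator, List


def stream_json(records: Iterable[Any]) -> Iterator[str]:
    """Yield each record as a JSON-encoded string (hypothetical helper)."""
    for record in records:
        # json.dumps turns the object into text suitable for an SSE data field.
        yield json.dumps(record)


def decode_chunks(chunks: Iterable[str]) -> List[Any]:
    """Decode JSON strings received from the stream back into objects."""
    return [json.loads(chunk) for chunk in chunks]


if __name__ == "__main__":
    sent = [{"token": "Hello"}, {"token": "world"}]
    received = decode_chunks(stream_json(sent))
    print(received)
```

Encoding one self-contained JSON document per event keeps decoding simple, since each chunk can be parsed on its own as it arrives.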
Let’s see how we can implement a simple example below:
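A minimal sketch of such a generator is shown here. The function name `stream_numbers` and its parameters are assumptions made for illustration; how the function is registered as the endpoint's handler depends on the platform and is not shown.

```python
import time
from typing import Iterator


def stream_numbers(count: int = 10, delay_s: float = 1.0) -> Iterator[str]:
    """Yield one text chunk, then pause, so chunks arrive `delay_s` apart."""
    for i in range(count):
        # Each yielded value becomes one event in the stream.
        yield f"Number {i} "
        time.sleep(delay_s)


if __name__ == "__main__":
    # Locally, iterating the generator mimics what a streaming client would see.
    for chunk in stream_numbers(count=3, delay_s=0.1):
        print(chunk, flush=True)
```

Because the function is a generator, nothing is computed until the consumer asks for the next value, which is what allows output to be sent before the full result exists.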
Once you deploy this code snippet and hit the stream endpoint, you will see the SSE events progressively appear every second.
You can do this as follows:
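One possible way to consume the stream from Python, using only the standard library, is sketched here. The helper names, the POST method, and the `Authorization` header are assumptions; substitute whatever URL and credentials your deployment actually uses. The parser follows the basic SSE framing rules: `data:` lines accumulate, and a blank line ends the event.

```python
import urllib.request
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event found in `lines`."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)


def stream_endpoint(url: str, payload: bytes, api_key: str) -> None:
    """Print events as they arrive. URL, payload and header are placeholders."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": api_key,  # assumed auth scheme
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        text_lines = (line.decode("utf-8") for line in response)
        for data in iter_sse_data(text_lines):
            print(data, flush=True)
```

Keeping the parser separate from the network call makes it easy to check against canned input before pointing it at a live endpoint.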
This should output:
The rest of the data then streams in, one event per second:
Recent versions of Postman can also display SSE events as they arrive.
To see this implemented with Falcon-7B, please check out the example here.