Streaming Endpoints
Streaming lets you send live output from your model to clients as a server-sent event (SSE) stream. It works for any Python object that implements the iterator or generator protocol.
Currently, your generator/iterator is required to yield data, as it will be sent downstream via the text/event-stream Content-Type.
You can still send structured data by encoding it as JSON and decoding it on the client.
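As an illustration of the JSON approach, here is a minimal sketch, assuming each yielded chunk is a string. The function name stream_json_events and the payload fields index and token are hypothetical and are not part of any platform API.

```python
import json
from typing import Iterator, List


def stream_json_events(tokens: List[str]) -> Iterator[str]:
    """Yield one JSON-encoded string per token (hypothetical payload shape)."""
    for index, token in enumerate(tokens):
        # Serialize each structured payload to text before yielding it.
        yield json.dumps({"index": index, "token": token})


# On the receiving side, each chunk is decoded back into a dict.
decoded = [json.loads(chunk) for chunk in stream_json_events(["Hello", "world"])]
```

Encoding every chunk as its own JSON document keeps each event independently parseable, so a client can act on events as they arrive instead of waiting for the stream to finish.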
Let us look at a simple example below:
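A minimal sketch of such a generator, assuming the serving layer iterates over whatever your code returns and forwards each yielded chunk as one event. The name stream_chunks and its parameters are hypothetical; how the generator is wired into the endpoint depends on the platform and is not shown here.

```python
import time
from typing import Iterator


def stream_chunks(count: int = 5, delay_seconds: float = 1.0) -> Iterator[str]:
    """Yield `count` text chunks, pausing `delay_seconds` after each one."""
    for i in range(count):
        # Each yielded string would become one event in the stream.
        yield f"chunk {i}"
        # Simulate work between chunks, e.g. generating the next token.
        time.sleep(delay_seconds)


if __name__ == "__main__":
    # Local check: print each chunk as soon as it is produced.
    for chunk in stream_chunks(count=3, delay_seconds=0.1):
        print(chunk)
```

Because the function is a generator, nothing is computed until the consumer asks for the next chunk, which is what allows output to reach the client before the whole response is ready.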
Once you deploy this code snippet and call the stream endpoint, the SSE events appear progressively, one every second.
You can do this as follows:
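One way to consume the stream from Python is sketched below, using only the standard library. The helper names iter_sse_data and stream_endpoint are hypothetical, and the sketch assumes the endpoint accepts a plain GET request without authentication; your deployment may require a different method, headers, or a request body.

```python
import urllib.request
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event found in `lines`."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # Comment line, often used as a keep-alive.
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # Drop the single optional space after the colon.
        if field == "data":
            buffer.append(value)
    if buffer:
        # Lenient: emit a trailing event that lacks a final blank line.
        yield "\n".join(buffer)


def stream_endpoint(url: str) -> Iterator[str]:
    """Open `url` (assumed GET, no auth) and yield event payloads as they arrive."""
    request = urllib.request.Request(url, headers={"Accept": "text/event-stream"})
    with urllib.request.urlopen(request) as response:
        yield from iter_sse_data(raw.decode("utf-8") for raw in response)
```

Keeping the parser separate from the network call means it can be exercised against a plain list of lines, without a deployed endpoint.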
This should output:
The rest of the data then streams in progressively, one event per second:
Recent versions of Postman can also display SSE streams as the events arrive.
For an example of implementing this with Falcon-7b, please check out the example here.