Streaming allows users to stream live output from their models using a server-sent event (SSE) stream.
This works for Python objects that implement the iterator or generator protocol. Currently, your generator or iterator must yield data, because each yielded value is sent downstream with the text/event-stream Content-Type.
You can still send data in JSON format and decode it appropriately on the receiving end. Let’s see how to implement a simple example below:
    import time

    def run(upper_range: int):
        for i in range(upper_range):
            yield f"Number {i} "
            time.sleep(1)
Once you deploy this code snippet and call the stream endpoint, the SSE events appear progressively, one per second. The response begins as follows:
    HTTP/1.1 200 OK
    cache-control: no-cache
    content-encoding: gzip
    content-type: text/event-stream; charset=utf-8
    date: Tue, 28 May 2024 21:12:46 GMT
    server: envoy
    transfer-encoding: chunked
    vary: Accept-Encoding
    x-envoy-upstream-service-time: 198995
    x-request-id: e6b55132-32af-96d7-a064-8915c4a42452

    data: Number 0
    ...
Progressively, you will see the rest of the data stream every second:
    ...
    data: Number 1
    data: Number 2
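To read the stream from code, a client only needs to pick out the `data:` lines as they arrive. The following is a minimal sketch, not an official client: `iter_sse_data` is a hypothetical helper name, and the `sample` list stands in for lines read from a real HTTP response. It treats each `data:` line as one event, which matches the output shown above but does not implement the full SSE specification (multi-line events, `event:` and `id:` fields).

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each `data:` line (hypothetical helper)."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            # Skip blank separator lines and any other fields.
            continue
        payload = line[len("data:"):]
        # The SSE format drops a single leading space after the colon.
        if payload.startswith(" "):
            payload = payload[1:]
        yield payload


# Stand-in for lines read incrementally from the stream endpoint.
sample = ["data: Number 0\n", "\n", "data: Number 1\n", "\n", "data: Number 2\n", "\n"]
events = list(iter_sse_data(sample))
print(events)  # ['Number 0', 'Number 1', 'Number 2']
```

Because the helper accepts any iterable of text lines, it can be fed from whichever HTTP client you use, as long as that client exposes the response body line by line.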
The latest version of Postman can also display these events as they arrive. If you want to see an example of implementing this with Falcon-7B, please check out the example here.
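As mentioned earlier, streamed data can be JSON-encoded and decoded by the consumer. The sketch below illustrates one way to do that, under stated assumptions: `run_json`, its `delay` parameter, and the payload shape (`index`, `message`) are hypothetical and chosen only for illustration.

```python
import json
import time
from typing import Iterator


def run_json(upper_range: int, delay: float = 1.0) -> Iterator[str]:
    """Yield one JSON-encoded string per step (hypothetical payload shape)."""
    for i in range(upper_range):
        # Serialize the structured payload to text before yielding it.
        yield json.dumps({"index": i, "message": f"Number {i}"})
        time.sleep(delay)


# Consumer side: decode each streamed chunk back into a dict.
# delay=0 keeps this demonstration instant.
decoded = [json.loads(chunk) for chunk in run_json(3, delay=0)]
print(decoded)
```

Keeping each yielded value a complete JSON document means the consumer can parse every event on its own, without buffering partial objects across events.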