How to stream LLM responses using AWS API Gateway WebSocket and Lambda
LLMs are everywhere these days, and many tasks are being automated with them. Most of these use cases are chat-based: you send the LLM a prompt and it responds with an answer. Because LLMs generate output incrementally, it is useful to stream the response back to the user as it is produced rather than making them wait for the complete answer. This is where WebSockets come into play: a WebSocket provides a full-duplex communication channel over a single TCP connection, so the server can push response chunks to the client as soon as they are ready.

In this post, I will explain how to stream LLM responses using AWS API Gateway WebSockets and Lambda. We will use API Gateway to create a WebSocket API that streams responses from a backend LLM inference service to the client, and AWS Lambda to process the LLM responses and send them to the client over the WebSocket connection. Finally, we will automate the deployment of the infrastructure using Terraform.
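To make the idea concrete before diving into the infrastructure, here is a minimal sketch of the Lambda-side logic. The helper names (`chunk_messages`, `stream_to_connection`) and the chunk size are my own illustrative choices, not part of the repo; the one real API assumed is the API Gateway Management API's `post_to_connection` call, which is how a Lambda pushes data to a connected WebSocket client. The send function is injected so the core logic can be exercised without AWS:

```python
import json

def chunk_messages(stream, max_len=128):
    """Re-chunk an LLM token stream into WebSocket frames of bounded size."""
    buf = ""
    for token in stream:
        buf += token
        if len(buf) >= max_len:
            yield buf
            buf = ""
    if buf:
        yield buf  # flush whatever remains after the stream ends

def stream_to_connection(stream, connection_id, post_to_connection):
    """Push each chunk to the client as a JSON frame.

    post_to_connection is injected for testability; in a real Lambda you
    would pass boto3.client("apigatewaymanagementapi",
    endpoint_url=<your WebSocket callback URL>).post_to_connection.
    Returns the number of frames sent.
    """
    count = 0
    for chunk in chunk_messages(stream):
        post_to_connection(
            ConnectionId=connection_id,
            Data=json.dumps({"delta": chunk}).encode(),
        )
        count += 1
    return count
```

In the actual Lambda handler, the connection ID would come from `event["requestContext"]["connectionId"]`, which API Gateway includes in every WebSocket route invocation.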
The GitHub repo for this post can be found here.