Langchain Q&A YouTube

In this tutorial we will recreate a question-answering bot that answers questions about the content of a YouTube video. It is based on the application built by @m_morzywolek.

To see the final implementation you can view it here

Create Cerebrium Account

Before building you need to set up a Cerebrium account. This is as simple as starting a new Project in Cerebrium and copying the API key. This will be used to authenticate all calls for this project.

Create a project

  1. Go to dashboard.cerebrium.ai
  2. Sign up or log in
  3. Navigate to the API Keys page
  4. You should see an API key with the source “Cerebrium”. Click the eye icon to display it. It will be in the format: c_api_key-xxx


Basic Setup

The way you develop models using Cerebrium should be identical to developing on a virtual machine or Google Colab, so converting existing code should be very easy!

Let us create our requirements.txt file and add the following packages:

pytube # For audio downloading

To use Whisper we also have to install ffmpeg and a few other Linux packages. These are defined in a separate file, pkglist.txt, which lists all the Linux packages to install.


To start we need to create a main.py file which will contain our main Python code. This is a relatively simple implementation, so we can do everything in one file. We would like a user to send in a link to a YouTube video along with a question, and we return the answer as well as the time segment the answer came from. So let us define our request object.

from pydantic import BaseModel

class Item(BaseModel):
    url: str
    question: str

Above we use Pydantic as our data validation library, and BaseModel is where Cerebrium keeps some default parameters like “webhook_url” that you can use for long-running tasks. You do not need that functionality for this tutorial. Because of the way the model is defined, “url” and “question” are required parameters, so if either is missing from the request, the user automatically receives an error message.
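To see the required-field behaviour in isolation, the following is a minimal sketch that validates payloads against the same Item model outside of Cerebrium. The helper name missing_fields and the payload values are made up for illustration.

```python
from pydantic import BaseModel, ValidationError


class Item(BaseModel):
    url: str
    question: str


def missing_fields(payload: dict) -> list:
    """Return the names of fields that fail validation (illustrative helper)."""
    try:
        Item(**payload)
    except ValidationError as err:
        # Each error entry carries the offending field path in its "loc" tuple
        return [str(e["loc"][0]) for e in err.errors()]
    return []


# Illustrative payloads, not real requests
complete = {"url": "https://www.youtube.com/watch?v=UF8uR6Z6KLc", "question": "Who is speaking?"}
incomplete = {"url": "https://www.youtube.com/watch?v=UF8uR6Z6KLc"}

print(missing_fields(complete))
print(missing_fields(incomplete))
```

The first payload passes validation, while the second is rejected because “question” is absent.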

Convert Video to text

Below we will use the Whisper model from OpenAI in order to convert the video audio to text. We will then split the text into its phrase segments with its respective timings, so we know the exact source of where our model got the answer from.

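import pytube
import whisper
from datetime import datetime, timezone

model = whisper.load_model("small")

def store_segments(segments):
    texts = []
    start_times = []

    for segment in segments:
        text = segment["text"]
        start = segment["start"]

        # Convert the starting offset (in seconds) to a UTC datetime so the
        # result does not depend on the machine's local time zone
        start_datetime = datetime.fromtimestamp(start, tz=timezone.utc)

        # Format the starting time as a string in the format "00:00:00"
        formatted_start_time = start_datetime.strftime("%H:%M:%S")

        # Collect the segment text and its formatted start time
        texts.append(text)
        start_times.append(formatted_start_time)

    return texts, start_times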

def predict(item, run_id, logger):
    item = Item(**item)

    video = pytube.YouTube(item.url)
    audio = video.streams.get_audio_only()
    fn = audio.download(output_path="/models/content/", filename=f"{video.title}.mp4")

    # Transcribe using the file path returned by download()
    transcription = model.transcribe(fn)
    res = transcription["segments"]

    texts, start_times = store_segments(res)
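The "00:00:00" formatting can also be sketched with plain integer arithmetic, which makes no reference to time zones. This is a minimal illustration rather than part of the deployed code: format_timestamp is a hypothetical helper, and the sample segments are invented dictionaries shaped like the entries of transcription["segments"].

```python
def format_timestamp(seconds: float) -> str:
    """Format an offset in seconds as HH:MM:SS."""
    total = int(seconds)  # drop the fractional part
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Hypothetical segments; only the "text" and "start" keys are used
sample_segments = [
    {"text": " first phrase", "start": 0.0},
    {"text": " second phrase", "start": 75.4},
    {"text": " third phrase", "start": 3725.9},
]

for segment in sample_segments:
    print(format_timestamp(segment["start"]), segment["text"].strip())
```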

Langchain Implementation

Below we will use Langchain to combine a vector store, where we will store all the video segments from above, with an LLM, Flan-T5, hosted on Cerebrium in order to generate answers.

import os

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.faiss import FAISS
from langchain.chains import VectorDBQAWithSourcesChain
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import CerebriumAI
from sentence_transformers import SentenceTransformer
import openai
import faiss

sentenceTransformer = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
os.environ["CEREBRIUMAI_API_KEY"] = "c_api_key-XXXXXXXXXXXX"

def create_embeddings(texts, start_times):
    text_splitter = CharacterTextSplitter(chunk_size=1500, separator="\n")
    docs = []
    metadatas = []
    for i, d in enumerate(texts):
        splits = text_splitter.split_text(d)
        # Keep the chunks and their metadata aligned, one entry per chunk
        docs.extend(splits)
        metadatas.extend([{"source": start_times[i]}] * len(splits))
    return metadatas, docs

# Add the following to your predict function, after texts, start_times = store_segments(res)
    metadatas, docs = create_embeddings(texts, start_times)
    embeddings = HuggingFaceEmbeddings()
    store = FAISS.from_texts(docs, embeddings, metadatas=metadatas)
    faiss.write_index(store.index, "docs.index")
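    # Replace the placeholder with the URL of your deployed Flan-T5 endpoint
    llm = CerebriumAI(endpoint_url="YOUR_FLAN_T5_ENDPOINT_URL")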
    chain = VectorDBQAWithSourcesChain.from_llm(llm=llm, vectorstore=store)

    result = chain({"question": item.question})

    return {"result": result}

Above we chunk our text segments and store them in a FAISS vector store. To create the embeddings, we use an open-source model from Hugging Face. Running Whisper and the embeddings model on the same machine is better because it avoids two network round trips, which saves more than 600ms on average.
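The important property of the chunking step is that every chunk keeps a pointer back to the start time of the segment it came from. The sketch below shows that alignment using only the standard library. The fixed-width split_text function is a hypothetical stand-in for a real text splitter (it does not split on a separator the way CharacterTextSplitter does), and the inputs are invented.

```python
def split_text(text: str, chunk_size: int) -> list:
    """Naive fixed-width splitter, a hypothetical stand-in for a real text splitter."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def chunk_with_sources(texts, start_times, chunk_size=1500):
    """Split each segment and record its start time once per resulting chunk."""
    docs, metadatas = [], []
    for text, start in zip(texts, start_times):
        splits = split_text(text, chunk_size)
        docs.extend(splits)
        # One metadata dict per chunk, all pointing at the segment's start time
        metadatas.extend({"source": start} for _ in splits)
    return docs, metadatas


# Invented inputs with a tiny chunk size so the splitting is visible
docs, metadatas = chunk_with_sources(
    ["short phrase", "x" * 25], ["00:00:00", "00:01:15"], chunk_size=10
)
for doc, meta in zip(docs, metadatas):
    print(meta["source"], repr(doc))
```

Because docs and metadatas always have the same length, the vector store can report the start time of whichever chunk it retrieves as the source of an answer.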

We then integrate Langchain with a Cerebrium-deployed endpoint in order to answer questions. Lastly, we return the results.

To deploy the model to an A10 we use the following command:

cerebrium-deploy -n langchain -h A10 -k c_api_key-XXXXXXXXXXXXX

Once deployed, we can make the following request:

curl --location --request POST 'https://run.cerebrium.ai/v1/dev-p-xxxxxx/langchain/predict' \
--header 'Authorization: dev-c_api_key-XXXXXXXXXXXX' \
--header 'Content-Type: application/json' \
--data-raw '{
    "url": "https://www.youtube.com/watch?v=UF8uR6Z6KLc&ab_channel=Stanford",
    "question": "How old was Steve Jobs when started Apple?"
}'

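The same request can be built from Python. The following is a minimal sketch using only the standard library. The endpoint and key are the placeholders from the curl example, not working values, so the code constructs the request without sending it. The build_request helper is a name introduced here for illustration.

```python
import json
import urllib.request

# Placeholders copied from the curl example; substitute your own project ID and key
ENDPOINT = "https://run.cerebrium.ai/v1/dev-p-xxxxxx/langchain/predict"
API_KEY = "dev-c_api_key-XXXXXXXXXXXX"


def build_request(url: str, question: str) -> urllib.request.Request:
    """Build a POST request carrying the video URL and the question as JSON."""
    body = json.dumps({"url": url, "question": question}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Authorization": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    "https://www.youtube.com/watch?v=UF8uR6Z6KLc&ab_channel=Stanford",
    "How old was Steve Jobs when started Apple?",
)

# Sending is left commented out because the placeholders are not valid credentials:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```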
We then get the following results:

{
  "run_id": "8959bfaa-f6c1-4445-980c-1ab469a4b878",
  "message": "success",
  "result": {
    "result": {
      "question": "How old was Steve Jobs when started Apple?",
      "answer": "20",
      "sources": ""
    }
  },
  "run_time_ms": 72109.55119132996
}