Since LangChain seems like a fairly powerful way to recursively call OpenAI LLMs, I wanted to understand how this dark magic worked. I came across this gist by @virattt where he creates a simple chatbot to chat with a Facebook earnings PDF. This seemed like a good place to start.

I created my own adaptation that reproduces the simple chatbot, but this time talking with the Tesla Q1 2023 earnings deck. I then extend his example in the last section by showing what’s happening under the hood inside LangChain. Hopefully this will provide some intuition for how LangChain works and how powerful simple recursive prompting with a language model can be, particularly when combined with some outside tools (like a vector embedding database).

But before we dive in, let’s see what it can do.

We can ask it any question we suspect will be answerable in the Tesla 2023 Q1 earnings deck. For example:

bot.ask("What was Tesla's revenue in the latest quarter?")

To which we get this reply:

Tesla’s revenue in the latest quarter was $23.3 billion.

Or we can ask it:

bot.ask("What was net income in the latest quarter?")

And the bot tells us this:

Tesla’s net income in the latest quarter was $2.5B GAAP net income.

Or we can ask more open-ended questions, like this one:

bot.ask("What progress was there on full self driving during the quarter?")

And we get this short statement:

Tesla enabled the latest FSD Beta software stack for highway driving in the latest quarter.

And if we want to know more about FSD progress:

bot.ask("How many miles have been driven on FSD to date?")

To which the bot responds:

Over 150 million miles.

Sometimes we get longer answers:

bot.ask("Can you describe how the energy business has been growing?")

Such as this response:

Energy storage deployments increased by 360% year-over-year in Q1 to 3.9 GWh, the highest level of deployments achieved due to the ongoing Megafactory ramp. Solar deployments increased by 40% year-over-year in Q1 to 67 MW.

All of these responses are simply rephrasings and syntheses of information that’s directly in the PDF. What’s useful is that the LLM can generate a natural-language answer to the specific question asked, using the information we provide it.

Building the bot


The code leading up to the point where we have our bot object is only ~30 lines. I encourage you to open it in Colab and play around.

First we import what we’ll need:

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain.llms import OpenAI

Next we load the PDF and split it into manageable chunks:

# Load the 2023 Q1 Tesla Quarterly update PDF
financial_report_pdf = "https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q1-2023-Update"
loader = PyPDFLoader(financial_report_pdf)
documents = loader.load()

# Chunk the financial report
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
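
If you want a quick sanity check on the chunking, you can inspect the resulting list (the exact counts will depend on how the PDF parses):

# Quick sanity check on the chunking (counts will vary with how the PDF parses)
print(f"{len(documents)} pages split into {len(docs)} chunks")
print(docs[0].page_content[:200])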

We’ll then need to set up our credentials. Here I use getpass to prompt you for the value, but if you’re running this locally, this could also come from an environment variable:

# You'll need to provide your OpenAI API key for computing the embeddings.
from getpass import getpass
OPENAI_API_KEY = getpass('Enter your OpenAI API key: ')
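
If you’re running locally and prefer an environment variable, something like this should work instead (assuming OPENAI_API_KEY is set in your shell):

import os
# Fall back to prompting only if the environment variable isn't set
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") or getpass('Enter your OpenAI API key: ')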

We can then embed all of the documents using OpenAI embeddings, and store these in an in-memory vector database. OpenAI offers an API that uses one of their LLMs to encode text into a high-dimensional embedding vector space. Text that discusses similar material will sit nearby (as measured by cosine distance) in that space. The vector database handles finding text that is semantically related to our question, hopefully including text that has enough information to answer it. Then we’ll use OpenAI’s LLM again to generate an answer.

embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
# Save the financial report into the Chroma vector database
vectorstore = Chroma.from_documents(docs, embeddings)
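
Before wiring the vector store into a chain, we can sanity-check the semantic retrieval by querying it directly. This is just a sketch; the chunks you get back will depend on how the PDF was split:

# The top hits should be chunks discussing energy storage deployments
hits = vectorstore.similarity_search("energy storage deployments", k=2)
for hit in hits:
  print(hit.page_content[:200], "\n---")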

We can then create the final chain:

# Create the chain
qa = ConversationalRetrievalChain.from_llm(
    llm=OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY), 
    retriever=vectorstore.as_retriever(),
    return_source_documents=True)

We can interact with it using this little Chatbot class:

# Create a little chat bot class and an instance
class Chatbot:
  def __init__(self):
    self.chat_history = []
  def ask(self, question):
    result = qa({"question": question, "chat_history": self.chat_history})
    answer = result["answer"].strip()
    self.chat_history.append((question, answer))
    print(answer)
bot = Chatbot()
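
Because the class accumulates chat_history, follow-up questions get resolved against earlier turns. For example (the exact wording of the answers will vary from run to run):

bot.ask("Where is the Cybertruck going to be manufactured?")
# A follow-up that only makes sense given the previous turn
bot.ask("How's that going?")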

What’s going on under the hood?

Essentially all the functionality of the bot is handled by the ConversationalRetrievalChain instance from LangChain. But I feel it’s instructive to know what’s really happening.

We start with a question and a chat history. I purposely make the question a follow-up that can only be answered in the context of the previous question. This way we can see how the model uses the chat history:

inputs = {
    "question": "How's that going?", 
    "chat_history": [("Where is the Cybertruck going to be manufactured?", "The Cybertruck will be manufactured at Gigafactory Texas.")]
}

We turn the history into a single piece of text:

hist_text = ""
for turn in inputs["chat_history"]:
  hist_text += f"\nHuman: {turn[0]}\nAssistant: {turn[1]}"
print(hist_text)

This gives us the following text:

Human: Where is the Cybertruck going to be manufactured?
Assistant: The Cybertruck will be manufactured at Gigafactory Texas.

First we want to combine our question and history so that follow-up questions can be rephrased into standalone questions. LangChain contains many internal prompts that facilitate this sort of action. Our chatbot uses this prompt internally:

_template = """
Given the following conversation and a follow up question, rephrase the
follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

from langchain.prompts.prompt import PromptTemplate
condense_question_prompt = PromptTemplate.from_template(_template)

As above, we’ll use OpenAI’s LLM for the various tasks:

llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)

We can then create a chain that will combine our question and history. Using our prompt from above, the LLM is asked to generate a new question based on the input question and our history text.

from langchain.chains.llm import LLMChain
question_generator = LLMChain(llm=llm, prompt=condense_question_prompt)

question = inputs["question"]
new_question = question_generator.run(question=question, chat_history=hist_text)
print(new_question)

You’ll notice the LLM uses the chat history to turn our vague follow-up into a standalone question:

How is the manufacturing of the Cybertruck at Gigafactory Texas progressing?
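
If you’re curious about exactly what the LLM sees for this step, you can render the prompt yourself:

# Render the condense-question prompt exactly as it's sent to the LLM
print(condense_question_prompt.format(chat_history=hist_text, question=question))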

Next we identify relevant documents for answering this question using Chroma:

retriever = vectorstore.as_retriever()
retrieved_docs = retriever.get_relevant_documents(new_question)
print(len(retrieved_docs))

We received four relevant documents:

4
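
We can peek at what came back; each item is a Document carrying the chunk text in page_content and (for PyPDFLoader) the source page in metadata:

# Inspect the retrieved chunks (output depends on how the PDF was chunked)
for doc in retrieved_docs:
  print(doc.metadata)
  print(doc.page_content[:120], "\n---")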

We then render each document with a very simple template that just returns its page_content:

doc_prompt = PromptTemplate(input_variables=["page_content"], template="{page_content}")
new_inputs = {"question": new_question, "chat_history": hist_text}
def format_doc(doc):
  return doc_prompt.format(page_content=doc.page_content)
doc_strings = [format_doc(doc) for doc in retrieved_docs]
context_inputs = {'context': "\n\n".join(doc_strings), 'question': new_inputs["question"]}

Now we’re ready to answer the question. We ask for an answer that incorporates the context.

prompt_template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""
context_prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
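
As before, we can render this final prompt to see exactly what the LLM will be asked to complete, the retrieved chunks followed by our standalone question:

# Render the final question-answering prompt (truncated for readability)
print(context_prompt.format(**context_inputs)[:500])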

We make a chain using the OpenAI LLM and the prompt we just made, then ask the LLM for a continuation, hopefully containing our answer.

llm_chain = LLMChain(llm=llm, prompt=context_prompt)
answer = llm_chain.predict(**context_inputs)
print(answer.strip())

And we can see that, even though we only asked “How’s that going?”, the LLM was able to use the rephrased question to find relevant documents and turn those into a pretty good answer:

The factory in Germany is currently producing over 5,000 vehicles per week.

Conclusion

I hope this illustrates that building a chatbot that can answer questions about a set of documents is really quite straightforward: it mainly comes down to breaking the task into a succession of prompts for which the LLM will generate useful continuations.