Tag: Retrieval Augmented Generation

  • Build Fully Local RAG Application with LLaMA 3: A Step-by-Step Guide

    Meta just launched Llama 3, and it's the best open-source LLM you can use. So why not build a RAG application with it? You can use the model for text generation through either HuggingFace or Ollama; we will use Ollama to create a RAG application that runs entirely locally.

    In this tutorial, we will build a Retrieval Augmented Generation (RAG) application using Ollama and Langchain. For the vector store, we will use Chroma, but you are free to use any vector store of your choice.

    In case you just want the Colab notebook, it's available here.

    There are four key steps to building your RAG application –

    1. Load your documents.
    2. Add them to the vector store using the embedding function of your choice.
    3. Define your prompt template.
    4. Define your retrieval chatbot using the LLM of your choice.

    First, we load the required libraries.

    # Loading required libraries
    import os
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.vectorstores import Chroma
    from langchain.chains import RetrievalQA
    from langchain.memory import ConversationSummaryMemory
    from langchain_openai import OpenAIEmbeddings
    from langchain.prompts import PromptTemplate
    from langchain_community.llms import Ollama

    Then comes step 1, which is to load our documents. Here I'll be using the Elden Ring Wikipedia PDF; you can just visit the Wikipedia page and download it as a PDF file.

    data_path = "./data/Elden_Ring.pdf"
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=2000,
        chunk_overlap=30,
        length_function=len,
    )
    documents = PyPDFLoader(data_path).load_and_split(text_splitter=text_splitter)
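
    To sanity-check the split before moving on, you can inspect the resulting chunks (a quick sketch; the exact count will depend on your PDF):

    # Quick look at how the PDF was chunked
    print(f"Loaded {len(documents)} chunks")
    print(documents[0].page_content[:200])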

    In case you want to learn about ChromaDB in detail, you can visit our detailed guide to using ChromaDB. The next step is to pick an embedding function that will convert our text into embeddings. I prefer OpenAI embeddings, but you can use any embedding function. Using this embedding function, we will add our documents to the Chroma vector database.

    embedding_func = OpenAIEmbeddings(api_key=os.environ.get("OPENAI_API_KEY"))
    vectordb = Chroma.from_documents(documents, embedding=embedding_func)
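
    Note that OpenAI embeddings call an external API, so the line above is the one non-local piece of this pipeline. If you want to keep everything on your machine, one option is to let Ollama generate the embeddings as well. A minimal sketch, assuming you have already pulled the llama3 model in Ollama:

    from langchain_community.embeddings import OllamaEmbeddings

    # Fully local alternative: embeddings served by Ollama instead of OpenAI
    local_embedding_func = OllamaEmbeddings(model="llama3")
    local_vectordb = Chroma.from_documents(documents, embedding=local_embedding_func)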

    Moving on, we have to define a prompt template. I'll be using the Mistral model, so it's the very basic prompt template that Mistral provides.

    template = """<s>[INST] Given the context - {context} Answer the following question - {question} [/INST]"""
    pt = PromptTemplate(template=template, input_variables=["context", "question"])

    All that is left to do is define our memory and retrieval chatbot using Ollama as the LLM. To use Llama 3 as the LLM, all you have to do is pass “llama3” as the model name.

    rag = RetrievalQA.from_chain_type(
        llm=Ollama(model="mistral"),
        retriever=vectordb.as_retriever(),
        memory=ConversationSummaryMemory(llm=Ollama(model="mistral")),
        chain_type_kwargs={"prompt": pt, "verbose": True},
    )
    rag.invoke("What is Elden Ring ?")
    >>> {'query': 'What is Elden Ring ?',
     'history': '',
     'result': ' Elden Ring is a 2022 action role-playing game developed by FromSoftware. It was published for PlayStation 4, PlayStation 5, Windows, Xbox One, and Xbox Series X/S. In the game, players control a customizable character on a quest to repair the Elden Ring and become the new Elden Lord. The game is set in an open world, presented through a third-person perspective, and includes several types of weapons and magic spells. Players can traverse the six main areas using their steed Torrent and discover linear hidden dungeons and checkpoints that enable fast travel and attribute improvements. Elden Ring features online multiplayer mode for cooperative play or player-versus-player combat. The game was developed with inspirations from Dark Souls series, and contributions from George R.R. Martin on the narrative and Tsukasa Saitoh, Shoi Miyazawa, Tai Tomisawa, Yuka Kitamura, and Yoshimi Kudo for the original soundtrack. Elden Ring received critical acclaim for its open world, gameplay systems, and setting, with some criticism for technical performance. It sold over 20 million copies and a downloadable content expansion, Shadow of the Erdtree, is planned to be released in June 2024.'}
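
    If you are curious what the ConversationSummaryMemory has stored after a query, you can inspect its running summary (a quick sketch for debugging):

    # The summary memory keeps a rolling summary of the conversation so far
    print(rag.memory.buffer)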

    In sum, building a Retrieval Augmented Generation (RAG) application using the newly released Llama 3, Ollama, and Langchain enables robust local solutions for natural-language queries. This tutorial walked you through loading documents, embedding them into a vector store like Chroma, and setting up a RAG application that retrieves and generates responses efficiently. By harnessing Meta's Llama 3 as the LLM and Langchain to create the chatbot, you can build intelligent systems that significantly enhance user interaction and information retrieval. The capabilities demonstrated here illustrate just a fraction of the potential applications. Let me know in the comments if you want me to cover something else.

  • Ultimate Guide to Chroma Vector Database: Everything You Need to Know – Part 1

    In this tutorial, we will walk through how to use Chromadb as your vector database for all your Retrieval-Augmented Generation (RAG) tasks.

    But before that, you need to install ChromaDB. If you're using Python, all you need to do is –

    pip install chromadb

    Now that you’ve installed Chromadb, let’s begin. We will use a PDF file as an example. For the PDF we will be using this research paper, but feel free to use the PDF of your choice.

    The first step is to create a persistent client, i.e., storage that lives on disk and can be reused across sessions. While creating the client, remember to add the setting that allows resetting the client, should you require that functionality.

    import chromadb
    
    client = chromadb.PersistentClient(
        path="<path of persistent storage>",
        settings=chromadb.config.Settings(allow_reset=True),
    )
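
    With allow_reset enabled, you can wipe the client's storage later, which is handy when re-running experiments:

    # Empties the database; this only works because allow_reset=True was set above
    client.reset()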

    Once you have a client set up, you can define collections within it. If you use BigQuery or any SQL product, imagine the client being the project and a collection being a dataset. Within a collection, you store the documents as embeddings. In this example, we will call our collection “research”.

    One very important thing to remember is that each collection has its own embedding function, which has to stay fixed. The query is also embedded when you search for the most similar documents, so if you use embedding function X to add the documents and embedding function Y to query them, the similarity scores will not be meaningful. We will be using the OpenAI text-embedding-3-small model.

    Another point to remember is that a single document should contain no more tokens than the embedding function can handle. If your embedding function is all-MiniLM-L6-v2 from HuggingFace, the maximum sequence length it can handle is 256 tokens, so if you try to vectorise a document with a longer context, it will simply clip the document to 256 tokens and embed that. OpenAI's text-embedding-3-small supports much longer inputs, up to 8,191 tokens.

    import os
    from chromadb.utils import embedding_functions

    # Defining the embedding function
    embedding_func = embedding_functions.OpenAIEmbeddingFunction(
        api_key=os.environ.get("OPENAI_API_KEY"),
        model_name="text-embedding-3-small",
    )
    

    Now we create the collection. It's best practice to specify the embedding function while creating the collection; otherwise, ChromaDB falls back to its default embedding function, a sentence-transformers model.
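
    Here is the corresponding call, using the “research” name from earlier (get_or_create_collection works too if you plan to re-run the script):

    # Create the collection with a fixed embedding function
    collection = client.create_collection(
        name="research", embedding_function=embedding_func
    )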

    Next, we need to add documents to this collection. For this, we will use some helper functions from langchain.

    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import PyPDFLoader
    
    data_path = "./data/2311.04635v1.pdf"
    
    pdf_loader = PyPDFLoader(data_path)
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=50,
        separators=["\n\n", "\n", ". ", " ", ""],
    )
    documents = pdf_loader.load_and_split(text_splitter=text_splitter)

    The PyPDFLoader helps load the PDF file, and the RecursiveCharacterTextSplitter splits the PDF into chunks. We are using a chunk size of 1000 with an overlap of 50; since the splitter measures length in characters by default, the chunks will be roughly 1000 characters long with about 50 characters of overlapping text. You can learn more about how the text splitter works here.
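
    You can verify this by checking the chunk lengths directly (measured in characters, since len is the default length function):

    # Chunk lengths in characters; none should greatly exceed chunk_size
    print(len(documents), max(len(d.page_content) for d in documents))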

    Now that we have our documents loaded, it's time to add them to the ChromaDB collection. Since we've already specified the embedding function on the collection, we can simply add the text and it will be stored as embeddings. You have to specify “ids”; think of them as primary keys in a SQL table. You can also attach metadata to each document. Both are useful when you want to upsert or delete documents in the collection.

    from datetime import datetime, timezone

    collection.add(
        documents=[i.page_content for i in documents],
        ids=[f"pdf_chunk_{i}" for i in range(len(documents))],
        metadatas=[
            {
                "file_name": "research_paper",
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
            for _ in documents
        ],
    )

    Here we pass the page contents of the loaded documents as documents; since they are text, the collection will automatically convert them to embeddings using its embedding function. In case you already have embeddings, you can add those directly instead. I've also specified some ids, which are very rudimentary here for illustration purposes, and some metadata for each document. Both of these can be used to query, upsert, or delete individual documents from the vector database.
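
    A quick way to confirm the documents actually landed in the collection is to check its count, which should equal the number of chunks we added:

    # Number of items stored in the collection
    print(collection.count())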

    Read more about how you can upsert documents, query a collection, and delete individual documents in Part II of this Ultimate Guide to ChromaDB.