Tag: AI

  • Q-Learning in Python: Reinforcement Learning on Frozen Lake

    Ever seen an AI agent go from stumbling around cluelessly to mastering its environment, making perfect moves every single time? In this blog post, we’ll explore how to train an agent to do just that, transforming random, chaotic actions into smooth, optimal choices. We’ll dive into the fascinating world of Q-learning and discover how it empowers AI agents to learn and adapt. In case you want to follow along, here is the link to the Colab notebook.

    What Is Q-Learning?

    Q-learning is a type of reinforcement learning where an agent learns to make optimal decisions by interacting with its environment. The agent explores its surroundings, tries different actions, and observes the outcomes. It uses a Q-table to store Q-values, which represent the expected reward for taking a specific action in a given state. Over time, the agent updates its Q-values based on its experiences, gradually learning the best actions to take in each situation.

    (Image: the Q-value update formula; source: HuggingFace)
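
    Written out, the standard tabular Q-learning update behind that image is:

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \big]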

    The update formula takes our former estimate of the Q-value and adds the temporal difference (TD) error, which is what adjusts our predictions in light of new information. We multiply the TD error by a learning rate so that we take small, manageable steps, akin to the incremental updates we see in other machine learning algorithms, allowing gradual refinement of our estimates. The TD error itself compares our old estimate against a better-informed target: the immediate reward received for the action, plus the discounted estimate of the optimal Q-value in the next state that the action leads to. Because that target folds in one step of real experience, repeatedly applying the update corrects biases in the initial Q-value estimates and steadily improves the agent’s ability to predict and maximize long-term rewards in a dynamic setting.

    The Frozen Lake Environment

    Enough theory, now it’s time to train our agent on the Frozen Lake environment. Imagine a frozen lake with slippery patches. Our agent’s goal is to navigate across the lake without falling into any holes. The agent can move up, down, left, or right, but the slippery surface makes its actions unpredictable. This simple environment provides a great starting point for understanding Q-learning. We will go over training on the non-slippery environment; to see how the agent performs in the slippery environment, check out the accompanying YouTube video.

    The first thing we will have to do is to initialize the environment.

    # Importing libraries
    import gymnasium as gym
    import numpy as np
    from matplotlib import pyplot as plt
    
    np.set_printoptions(precision=3)
    
    env = gym.make('FrozenLake-v1', desc=None, map_name="4x4", is_slippery=False, render_mode="rgb_array")
    print(f"There are {env.action_space.n} possible actions")
    print(f"There are {env.observation_space.n} states")
    >>>There are 4 possible actions
    >>>There are 16 states
    

    We can see that our world is 4×4 in size and thus has 16 possible states, with 4 possible actions: up, down, left and right. We can take a look at the world.

    The goal of our agent is to reach the prize at the bottom-right. We can clearly see that it can do so by either going right->right->down->down->down->right or by following down->down->right->right->down->right. But how do we train the agent to come up with either of these paths on its own?

    We do so by initially letting the agent explore the environment randomly, trying different actions without any predefined strategy guiding its decisions. This exploration phase is crucial: it lets the agent gather diverse experiences and build a foundational understanding of the environment’s dynamics. As it gains experience, it starts exploiting its learned knowledge, choosing the actions with higher Q-values that previous trials have identified as beneficial. Throughout training, the agent balances exploration and exploitation, so that it both discovers new strategies and makes effective use of what it already knows.

    To do so let’s establish some helper functions first –

    # Epsilon-greedy: with probability epsilon pick a random action, otherwise the best known one
    def get_action(epsilon, state, q_table):
        if np.random.rand() < epsilon:
            return np.random.randint(0, env.action_space.n)
        else:
            return np.argmax(q_table[state])
    
    def get_td_error(state, next_state, action, reward, q_table):
        # gamma (the discount factor) is defined with the other hyper-parameters below
        former_q_est = q_table[state, action]
        td_target = reward + gamma * np.max(q_table[next_state])
        td_error = td_target - former_q_est
        return td_error
    
    # We first define the Q-table filled with zeros; during the training epochs we update its values.
    q_table = np.zeros((env.observation_space.n, env.action_space.n))
    

    We created two functions. The first, get_action, chooses an action based on epsilon, which controls the randomness of our actions: initially during training we keep epsilon very high and lower it as the agent learns. The second, get_td_error, calculates the temporal difference error after each step. We also created our Q-table, with shape n_states × n_actions = 16 × 4.

    We also have to establish training hyper-parameters.

    num_epochs = 1000
    gamma = 0.99
    lr = 0.1
    decay_rate=0.99
    epsilon = 1
    

    During training, in each epoch we update our Q-table after every action. The episode ends when we either fall into a hole or reach the prize. After the episode is done, we decay epsilon a bit and repeat the process. Once training finishes, our Q-table should have converged to the optimal Q-values for each state-action pair.

    for i in range(num_epochs):
        state, _ = env.reset()
        done = False
        while not done:
            action = get_action(epsilon, state, q_table)
            # gymnasium returns both `terminated` (hole or goal) and `truncated` (step limit)
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            td_error = get_td_error(state, next_state, action, reward, q_table)
            q_table[state, action] = q_table[state, action] + lr*td_error
            state = next_state
        epsilon *= decay_rate
    

    Now that we’ve trained our agent, let’s see what its actions look like. The code for creating the animation is in the Colab notebook.

    We can see that the agent now always follows the optimal path.
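
    As a quick sanity check (not in the original notebook), you can also print the greedy action for each state; in FrozenLake the action indices 0-3 correspond to left, down, right and up:

    arrows = np.array(['<', 'v', '>', '^'])  # FrozenLake action order: left, down, right, up
    print(arrows[np.argmax(q_table, axis=1)].reshape(4, 4))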

    Conclusion

    Q-learning is a powerful technique for training AI agents to make optimal decisions. By interacting with their environment and learning from their experiences, agents can master even complex tasks. As we’ve seen, the environment plays a crucial role in shaping the agent’s behavior.

    However, in complex environments with a vast number of states, traditional Q-learning becomes impractical. That’s where deep Q-learning comes in. By using deep neural networks, we can approximate Q-values without relying on an enormous Q-table. Stay tuned for our next blog post, where we’ll explore the intricacies of deep Q-learning.

  • How Does a Mouse Find Cheese? | Bellman Equation Made Simple

    In this video, we explain how the Bellman equation works in a deterministic world.
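
    In a deterministic world the Bellman optimality equation reduces to the immediate reward plus the discounted value of the best reachable next state, which is exactly what the code below iterates:

    V(s) = R(s) + \gamma \max_{s'} V(s')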

    Here is a code snippet you can run to verify the state values in the 3×3 grid world.

    import numpy as np
    
    def value_iteration(rewards, gamma=0.9, tolerance=1e-4, max_iterations=1000):
        # Initialize value matrix
        V = np.zeros_like(rewards, dtype=float)
        # Set terminal state values
        V[0, 2] = -1  # Cat state
        V[2, 2] = 1   # Cheese state
        
        for iteration in range(max_iterations):
            delta = 0  # Track maximum change
            V_prev = V.copy()  # Store previous values
            
            for i in range(3):
                for j in range(3):
                    # Skip terminal states
                    if (i == 0 and j == 2) or (i == 2 and j == 2):
                        continue
                        
                    # Get values of possible next states
                    possible_values = []
                    
                    # Check all possible moves (up, down, left, right)
                    # Up
                    if i > 0:
                        possible_values.append(V_prev[i-1, j])
                    # Down
                    if i < 2:
                        possible_values.append(V_prev[i+1, j])
                    # Left
                    if j > 0:
                        possible_values.append(V_prev[i, j-1])
                    # Right
                    if j < 2:
                        possible_values.append(V_prev[i, j+1])
                    
                    # Update value using Bellman equation
                    best_next_value = max(possible_values)
                    V[i, j] = rewards[i, j] + gamma * best_next_value
                    
                    # Update delta
                    delta = max(delta, abs(V[i, j] - V_prev[i, j]))
            
            # Check for convergence
            if delta < tolerance:
                print(f"Converged after {iteration + 1} iterations")
                break
        
        return V
    
    # Initialize rewards matrix
    rewards = np.zeros((3, 3))
    rewards[0, 2] = -1  # Cat state
    rewards[2, 2] = 1   # Cheese state
    
    # Run value iteration
    V = value_iteration(rewards, gamma=0.9)
    
    # Round the values for better readability
    np.set_printoptions(precision=3, suppress=True)
    print("\nFinal Value Function:")
    print(V)
    

  • Exploring Data Distribution Differences in Machine Learning: An Adversarial Approach

    First, a shout-out to Santiago, whose tweet inspired this post.

    In the realm of machine learning, ensuring that models perform well not only on training data but also on unseen test data is crucial. A common challenge that arises is the difference in data distribution between training and testing datasets, known as dataset shift. This discrepancy can significantly degrade the performance of a model when deployed in real-world scenarios. To tackle this issue, researchers and practitioners have developed various methods to detect and quantify differences in data distribution. One innovative approach is the adversarial method, which leverages concepts from adversarial training to assess and address these differences.

    Understanding Dataset Shift

    Before diving into the adversarial methods, it is essential to understand what dataset shift entails. Dataset shift occurs when the joint distribution of inputs and outputs differs between the training and testing phases. This shift can be categorised into several types, including covariate shift, prior probability shift, and concept shift, each affecting the model in different ways.

    • Covariate Shift: The distribution of input features changes between the training and testing datasets.
    • Prior Probability Shift: The distribution of the output variable changes.
    • Concept Shift: The relationship between the input features and the output variable changes.

    Detecting and correcting for these shifts is crucial for developing robust machine learning models.

    Adversarial Methods for Detecting Dataset Shift

    Adversarial methods for dataset shift detection are inspired by adversarial training in neural networks, where models are trained to be robust against intentionally crafted malicious input. Similarly, in dataset shift detection, these methods involve creating a scenario where a model tries to distinguish between training and testing data based on their data distributions.

    The way to do this is –

    1. Combine your train and test data.
    2. Create a new column, where you label training data as 1 and test data as 0.
    3. Train a classifier on this using your new column as the target.

    If the data in both train and test comes from the same distribution, the AUC will be close to 0.5, but if they are from different distributions, then the model will learn to differentiate the data points and the AUC will be close to 1.

    Example

    In this example, the training data will be height and weight in metres and kilograms, and the test data will be the same data but in centimetres and grams. If we then train a simple logistic regression on the dummy target, which is 1 on the training set and 0 on the test set, and given that we are not scaling the variables, the model should achieve an AUC close to 1.

    #Loading required libraries
    import numpy as np 
    import pandas as pd
    import seaborn as sns
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from matplotlib import pyplot as plt
    

    Then we define our features for train and test

    # Set random seed for reproducibility
    np.random.seed(42)
    
    # Generate synthetic data
    # Training data (height in meters, weight in kilograms)
    train_height = np.random.normal(1.75, 0.1, 1000)  # Average height 1.75 meters
    train_weight = np.random.normal(70, 10, 1000)    # Average weight 70 kg
    
    # Test data (height in centimeters, weight in grams)
    test_height = train_height * 100  # Convert meters to centimeters
    test_weight = train_weight * 1000  # Convert kilograms to grams
    

    Once we have our features defined, all we need to do is create a training dataset, train our classifier and check the AUC score.

    # Combine data into feature matrices
    X_train = np.column_stack((train_height, train_weight))
    X_test = np.column_stack((test_height, test_weight))
    
    # Create labels: 1 for training data, 0 for test data
    y_train = np.ones(X_train.shape[0])
    y_test = np.zeros(X_test.shape[0])
    
    # Combine into a single dataset
    X = np.vstack((X_train, X_test))
    y = np.concatenate((y_train, y_test))
    
    # Train logistic regression model
    model = LogisticRegression()
    model.fit(X, y)
    
    # Predict probabilities for ROC AUC calculation
    y_pred_proba = model.predict_proba(X)[:, 1]
    
    # Calculate AUC
    auc = roc_auc_score(y, y_pred_proba)
    print(f"The AUC is: {auc:.2f}")
    
    

    The AUC here comes out to be 1.0, as expected. Since the train and test data come from different distributions, the model easily learned to tell the two apart.

    Using this approach you can also easily test whether the train and test data come from the same distribution.
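
    As a slightly more rigorous variant (not in the original example), you can cross-validate the AUC instead of scoring the classifier on the same data it was fit on. A minimal sketch, reusing X and y from above:

    from sklearn.model_selection import cross_val_score
    
    # Estimate the adversarial AUC with 5-fold cross-validation
    cv_auc = cross_val_score(LogisticRegression(), X, y, cv=5, scoring="roc_auc").mean()
    print(f"Cross-validated AUC: {cv_auc:.2f}")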

  • Build Fully Local RAG Application with LLaMA 3: A Step-by-Step Guide

    Meta just launched Llama 3, and it’s the best open-source LLM you can use, so why not build a RAG application with it? You can use the model for text generation through either HuggingFace or Ollama; we will be using Ollama to create a RAG application that runs locally.

    In this tutorial, we will build a Retrieval Augmented Generation(RAG) Application using Ollama and Langchain. For the vector store, we will be using Chroma, but you are free to use any vector store of your choice.

    In case you just want the Colab notebook, it’s available here.

    There are 4 key steps to building your RAG application –

    1. Load your documents
    2. Add them to the vector store using the embedding function of your choice.
    3. Define your prompt template.
    4. Define your Retrieval Chatbot using the LLM of your choice.

    First we load the required libraries.

    # Loading required libraries
    import os
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.vectorstores import Chroma
    from langchain.chains import RetrievalQA
    from langchain.memory import ConversationSummaryMemory
    from langchain_openai import OpenAIEmbeddings
    from langchain.prompts import PromptTemplate
    from langchain.llms import Ollama

    Then comes step 1, which is to load our documents. Here I’ll be using the Elden Ring Wikipedia article as a PDF; you can just visit the Wikipedia page and download it as a PDF file.

    data_path = "./data/Elden_Ring.pdf"
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=2000,
        chunk_overlap=30,
        length_function=len,)
    documents = PyPDFLoader(data_path).load_and_split(text_splitter=text_splitter)

    In case you want to learn in detail about ChromaDB, you can visit our detailed guide to using ChromaDB. The next step is to use an embedding function that will convert our text into embeddings. I prefer using OpenAI embeddings, but you can use any embedding function. Using this embedding function we will add our documents to the Chroma vector database.

    embedding_func = OpenAIEmbeddings(api_key=os.environ.get("OPENAI_API_KEY"))
    vectordb = Chroma.from_documents(documents, embedding=embedding_func)

    Moving on, we have to define a prompt template. I’ll be using the mistral model, so it’s the very basic prompt template that mistral provides.

    template = """<s>[INST] Given the context - {context} </s>[INST] [INST] Answer the following question - {question}[/INST]"""
    pt = PromptTemplate(
                template=template, input_variables=["context", "question"]
            )

    All that is left to do is to define our memory and Retrieval Chatbot using Ollama as the LLM. To use Llama 3 as the LLM, all you have to do is define “llama3” as the model name.

    rag = RetrievalQA.from_chain_type(
                llm=Ollama(model="mistral"),
                retriever=vectordb.as_retriever(),
                memory=ConversationSummaryMemory(llm = Ollama(model="mistral")),
                chain_type_kwargs={"prompt": pt, "verbose": True},
            )
    rag.invoke("What is Elden Ring ?")
    >>> {'query': 'What is Elden Ring ?',
     'history': '',
     'result': ' Elden Ring is a 2022 action role-playing game developed by FromSoftware. It was published for PlayStation 4, PlayStation 5, Windows, Xbox One, and Xbox Series X/S. In the game, players control a customizable character on a quest to repair the Elden Ring and become the new Elden Lord. The game is set in an open world, presented through a third-person perspective, and includes several types of weapons and magic spells. Players can traverse the six main areas using their steed Torrent and discover linear hidden dungeons and checkpoints that enable fast travel and attribute improvements. Elden Ring features online multiplayer mode for cooperative play or player-versus-player combat. The game was developed with inspirations from Dark Souls series, and contributions from George R.R. Martin on the narrative and Tsukasa Saitoh, Shoi Miyazawa, Tai Tomisawa, Yuka Kitamura, and Yoshimi Kudo for the original soundtrack. Elden Ring received critical acclaim for its open world, gameplay systems, and setting, with some criticism for technical performance. It sold over 20 million copies and a downloadable content expansion, Shadow of the Erdtree, is planned to be released in June 2024.'}
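
    As mentioned, to run the exact same chain with Llama 3 instead of mistral, just swap the model name (this assumes you’ve already pulled the model with ollama pull llama3):

    llm = Ollama(model="llama3")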

    In sum, building a Retrieval Augmented Generation (RAG) application with the newly released LLaMA 3 model, Ollama, and Langchain enables robust, fully local solutions for natural language queries. This tutorial walked through loading documents, embedding them into a vector store like Chroma, and setting up a RAG application that retrieves and generates responses efficiently. By combining LLaMA 3 as the LLM with Langchain for the chatbot, you can create intelligent systems that significantly enhance user interaction and information retrieval. The capabilities demonstrated here illustrate just a fraction of the potential applications. Let me know in the comments if you want me to cover something else.

  • Ultimate Guide to Chroma Vector Database: Everything You Need to Know – Part 2

    In Part 1, we learned how to create the vector database and add documents to a collection. In this tutorial, we will learn how you can query the collection, upsert documents, delete individual documents and also the collection.

    Querying

    You can peek at the collection, which returns the first 10 documents (you can also specify the number of documents to peek at), or you can retrieve documents by the metadata or the IDs you want.

    collection.peek(5) # Returns the top 5 documents
    collection.get(ids=['pdf_chunk_0', 'pdf_chunk_1']) # Returns the documents corresponding to ids mentioned in the list

    You can also filter a collection using the where parameter, where you specify metadata. For example, in Part 1 we added metadata to each document with the file name research_paper, so we can fetch all documents carrying that metadata.

    collection.get(where={'file_name': 'research_paper'})

    Another thing you can do is query for the documents most similar to an input query. For example, if I want to know who the authors of the research paper are, I can get the documents that may contain this information by running –

    collection.query(query_texts=["Who are the authors of the paper ?"], n_results=3)

    Here query_texts holds my queries and n_results is the number of similar documents I want per query. You can pass multiple queries at the same time; in that case, it will return results for each query, as sketched below.
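
    For instance, passing two queries at once returns a result set for each; a quick sketch (the second question is just an illustrative example):

    collection.query(
        query_texts=["Who are the authors of the paper ?", "What datasets does the paper use ?"],
        n_results=3,
    )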

    Upserting

    Similar to querying, you can upsert by providing the IDs. So for example, if I want to upsert the data at ID pdf_chunk_0, I’ll run the following –

    collection.upsert(ids=['pdf_chunk_0'], documents=['This is an example of upsertion'])

    Now if I query the document, I should see the above document text instead of the original document. Note that if you provide an ID which is not present, ChromaDB will consider it as an add operation.

    Deleting

    Again, you can delete individual documents either by specifying the IDs or by using the where filter. So in case I want to delete pdf_chunk_0, I can run collection.delete(ids = ['pdf_chunk_0']), or if I want to delete all documents carrying some metadata, I can run collection.delete(where={"file_name": "research_paper"})

    You can also delete the entire collection by client.delete_collection('research')

    In case you want to reset the client, and you’ve enabled this in the settings when creating the persistent client, you can run client.reset(). This empties and completely resets the database. ⚠️ It is destructive and not reversible.

    Let me know in case you want to learn more about ChromaDB, and I’ll create a guide for advanced users.

  • Ultimate Guide to Chroma Vector Database: Everything You Need to Know – Part 1

    In this tutorial, we will walk through how to use Chromadb as your vector database for all your Retrieval-Augmented Generation (RAG) tasks.

    But before that, you need to install Chromadb. If you’re using Python, all you need to do is –

    pip install chromadb

    Now that you’ve installed Chromadb, let’s begin. We will use a PDF file as an example. For the PDF we will be using this research paper, but feel free to use the PDF of your choice.

    The first step is to create a persistent client, i.e., storage that can be reused in multiple places. While creating the client, remember to add the setting that allows resetting the client, should you require this functionality.

    import chromadb
    
    client = chromadb.PersistentClient(
                path="<path of persistent storage>", settings=chromadb.config.Settings(allow_reset=True)
            )

    Once you’ve set up a client, you can define collections within it. If you use BigQuery or other SQL products, think of the client as the project and a collection as a dataset. Within the collection, you store the documents as embeddings. In this example, we will call our collection “research”.

    One very important thing to remember is that each collection has its own embedding function, and it has to stay fixed. The query is also embedded when you search for the most similar documents, so if you add documents with embedding function X and query with embedding function Y, the similarity scores will not be correct. We will be using the OpenAI text-embedding-3-small model.

    Another point to remember is that a single document should contain no more tokens than the embedding function can handle. If your embedding function is all-MiniLM-L6-v2 from HuggingFace, its maximum sequence length is 256 tokens, so if you try to vectorise a longer file, it will just clip the document to 256 tokens and embed that. The OpenAI embedding models accept much longer inputs (up to 8,191 tokens).

    # Defining the embedding function (imports added for completeness)
    import os
    from chromadb.utils import embedding_functions
    
    embedding_func = embedding_functions.OpenAIEmbeddingFunction(api_key=os.environ.get("OPENAI_API_KEY"),
                                                                 model_name="text-embedding-3-small")
    

    Next we create the collection. It’s best practice to specify the embedding function while creating the collection; otherwise, Chromadb falls back to its default embedding function, a sentence-transformer model.
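
    A minimal sketch of that step, using the “research” collection name and the embedding function from above:

    collection = client.create_collection(
        name="research",
        embedding_function=embedding_func,
    )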

    Now we will need to add a document to this collection, for this, we will use some helper functions from langchain.

    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import PyPDFLoader
    
    data_path = "./data/2311.04635v1.pdf"
    
    pdf_loader = PyPDFLoader(data_path)
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=50,
        separators=["\n\n", "\n", ". ", " ", ""],
    )
    documents = pdf_loader.load_and_split(text_splitter=text_splitter)

    The PyPDFLoader helps load the PDF file, and the RecursiveCharacterTextSplitter splits it into chunks. We are using a chunk size of 1000 with an overlap of 50, meaning each chunk will be roughly 1000 characters with 50 characters of overlapping text (length_function=len counts characters, not tokens). You can learn more about how the text splitter works here.

    Now that we have our documents loaded, it’s time to add them to the Chromadb collection. Since we’ve already specified the embedding function on the collection, we can simply add the text files and they will be stored as embeddings. You have to specify “ids” (think of them as primary keys for the documents), and you can also attach metadata to each document; both are useful when you want to upsert or delete documents in the collection.

    from datetime import datetime, timezone
    
    collection.add(
        documents=[doc.page_content for doc in documents],
        ids=[f"pdf_chunk_{i}" for i in range(len(documents))],
        metadatas=[
            {
                "file_name": "research_paper",
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
            for _ in documents
        ],
    )

    Here we pass the page contents of the loaded documents as documents; since they are text, the collection will automatically convert them into embeddings using its embedding function. In case you already have embeddings, you can add them directly as embeddings. I’ve also specified some IDs (very rudimentary here, for illustration purposes) and some metadata for each document. Both can be used to query, upsert or delete individual documents from the vector database.

    Read more about how you can query a collection, upsert documents and delete individual documents in Part 2 of this Ultimate Guide to ChromaDB.

  • Deep Cross Networks Explained – An Evolution of Feed Forward Networks

    Deep Learning has revolutionized various sectors lately. One critical component of this revolution is the emergence of the Deep Cross Network (DCN). DCN is a novel type of neural network that significantly deviates from the traditional feed-forward networks to offer more robust and efficient solutions. This article aims to provide an in-depth understanding of the Deep Cross Network, its differences from the traditional feed-forward networks, and the areas of its application.

    Understanding Deep Cross Network (DCN)

    The Deep Cross Network (DCN) is a sophisticated hybrid model that combines the strengths of deep neural networks (DNNs) and feature crossing. It was introduced to handle high-dimensional sparse data more efficiently. It’s a mix of deep learning for non-linear input-output mappings and feature crossing for capturing some form of interaction between the feature dimensions.

    The core idea behind DCN is to apply explicit and efficient feature crossing in an input space. This is done by using a cross network that applies a cross operation on the input features to learn explicit bounded-degree feature interactions, which is then combined with a deep network that models arbitrary interactions.
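
    Concretely, a single cross layer in the original DCN paper computes x_{l+1} = x_0 (x_l^T w_l) + b_l + x_l, reusing the original input x_0 at every layer. Here is a minimal numpy sketch of that operation (the variable names are ours, not from the paper):

    import numpy as np
    
    def cross_layer(x0, xl, w, b):
        # x_{l+1} = x0 * (xl . w) + b + xl
        # x0: original input vector (d,); xl: current layer output (d,)
        # w, b: learnable weight and bias vectors (d,)
        # xl . w is a scalar, so each layer adds only O(d) parameters
        return x0 * np.dot(xl, w) + b + xl

    Stacking l such layers lets the cross network represent feature interactions up to polynomial degree l + 1 while keeping the parameter count linear in the input dimension.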

    How DCN Differs From Feed-Forward Networks

    Feed-forward networks or Multilayer Perceptrons (MLPs) are the simplest type of artificial neural network. In these networks, data moves in one direction—from the input layer, through the hidden layers, and finally to the output layer. There is no looping or cycling back of data.

    DCN is still a feed-forward architecture, but its cross network reuses the original input at every layer, explicitly constructing bounded-degree feature interactions rather than relying on the hidden layers to learn them implicitly. This combination of explicit, bounded-degree feature crossing with a deep network gives DCN its unique strength.

    Another notable difference lies in the complexity and efficiency of the two models. While feed-forward networks can become computationally expensive and complex as they grow in size and depth, DCN handles high-dimensional sparse input effectively and efficiently, thanks to its unique architecture.

    Another distinguishing feature is the ability of DCN to model feature interactions. While standard feed-forward networks can struggle to capture intricate feature interactions without substantial depth, DCN excels at learning both low- and high-order feature interactions effectively.

    Applications of Deep Cross Network

    The Deep Cross Network has numerous applications across various domains. Some of the most prevalent are:

    1. Recommendation Systems: DCN can effectively handle high-dimensional data and capture complex feature interactions, making it suitable for recommendation systems. It can model the interactions between users and items efficiently to provide accurate recommendations.
    2. Advertisement Click Prediction: DCN’s ability to capture high-order feature interactions makes it a perfect fit for predicting advertisement clicks. By understanding the intricate relationships between user behavior, ad characteristics, and context, it can predict the likelihood of a user clicking on an ad.
    3. Fraud Detection: In banking and finance, DCN can be used for fraud detection by effectively modeling the complex relationships between various transactions.
    4. Natural Language Processing: DCN can also be applied to various NLP tasks, such as sentiment analysis or text classification, where it can learn effective feature interactions from high-dimensional text data.

    Conclusion

    The Deep Cross Network is a significant breakthrough in the field of deep learning. Its unique combination of deep networks and feature crossing distinguishes it from traditional feed-forward networks and makes it a powerful tool for handling high-dimensional sparse data.

    Let me know in the comments if you want to go over an application of Deep Cross Networks using an example dataset.

  • How does ChatGPT remember? LLM Memory Explained.

    In the fascinating world of conversational AI, the ability of systems like ChatGPT to remember and refer back to earlier parts of a conversation is nothing short of magic. But how does this seemingly simple act of recollection work under the hood? Let’s dive into the concept of memory in large language models (LLMs) and uncover the mechanisms that enable these digital conversationalists to keep track of our chats.

    The Essence of Memory in Conversational AI

    Memory in conversational AI systems is about the ability to store and recall information from earlier interactions. This capability is crucial for maintaining the context and coherence of a conversation, allowing the LLM to reference past exchanges and build upon them meaningfully. It also gives the appearance that the LLM remembers, when in reality LLMs are stateless and have no built-in memory.

    LangChain, a framework for building conversational AI applications, highlights the importance of memory in these systems. It distinguishes between two fundamental actions that a memory system needs to support: reading and writing.

    What happens is that the stored memory is passed to the LLM as additional context alongside your input prompt, so it can process the information as if it had had the full context from the get-go.

    Building Memory into Conversational Systems

    The development of an effective memory system involves two key design decisions: how the state is stored and how it is queried.

    Storing: The Backbone of Memory

    Underneath any memory system lies a history of all chat interactions, which can range from simple in-memory lists to sophisticated persistent databases. Storage is simple: you can keep all past conversations in a database, either as plain text documents or as embeddings in a vector database.

    Querying: The Brain of Memory

    Storing chat messages is only one part of the equation. The real magic happens in the querying phase, where data structures and algorithms work together to present a view of the message history that is most useful for the current context. This might involve returning the most recent messages, summarizing past interactions, or extracting and focusing on specific entities mentioned in the conversation.

    Practical Implementation with LangChain

    Here we will take a look at one way to store memory using LangChain.

    from langchain.memory import ConversationBufferMemory

    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

    Now you can attach this memory to any LLM chain and it will add the entire previous conversation as context to the LLM on each chain invocation. The advantage of this kind of memory is that it’s simple to implement. The disadvantage is that in longer conversations you pass more and more tokens and the input prompt size explodes, meaning slower responses and, if you’re using paid models like GPT-4, higher costs.
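
    To see the write and read actions directly, here is a quick sketch using the buffer defined above:

    # Write one conversational turn into memory, then read the history back
    memory.save_context({"input": "Hi, my name is Sam."}, {"output": "Nice to meet you, Sam!"})
    print(memory.load_memory_variables({}))
    # -> {'chat_history': [HumanMessage(content='Hi, my name is Sam.'), AIMessage(content='Nice to meet you, Sam!')]}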

    Conclusion

    The ability of systems like ChatGPT to remember past interactions is a cornerstone of effective chatbots. By leveraging sophisticated memory systems, developers can create applications that not only understand the current context but can also draw on previous exchanges to provide more coherent and engaging responses. As we continue to push the boundaries of what conversational AI can achieve, the exploration and enhancement of memory mechanisms will remain a critical area of focus.

  • Build RAG Application Using Ollama

    In this tutorial, we will build a Retrieval Augmented Generation(RAG) Application using Ollama and Langchain. For the vector store, we will be using Chroma, but you are free to use any vector store of your choice.

    There are 4 key steps to building your RAG application –

    1. Load your documents
    2. Add them to the vector store using the embedding function of your choice.
    3. Define your prompt template.
    4. Define your Retrieval Chatbot using the LLM of your choice.

    In case you want the Colab notebook, you can click here.

    First we load the required libraries.

    # Loading required libraries
    import os

    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.vectorstores import Chroma
    from langchain.chains import RetrievalQA
    from langchain.memory import ConversationSummaryMemory
    from langchain_openai import OpenAIEmbeddings
    from langchain.prompts import PromptTemplate
    from langchain.llms import Ollama

    Then comes step 1, which is to load our documents. Here I’ll be using the Elden Ring Wikipedia article as a PDF; you can just visit the Wikipedia page and download it as a PDF file.

    data_path = "./data/Elden_Ring.pdf"
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=2000,
        chunk_overlap=30,
        length_function=len,
    )

    documents = PyPDFLoader(data_path).load_and_split(text_splitter=text_splitter)

    The next step is to use an embedding function that will convert our text into embeddings. I prefer using OpenAI embeddings, but you can use any embedding function. Using this embedding function we will add our documents to the Chroma vector database.

    embedding_func = OpenAIEmbeddings(api_key=os.environ.get("OPENAI_API_KEY"))
    vectordb = Chroma.from_documents(documents, embedding=embedding_func)

    Moving on, we have to define a prompt template. I’ll be using the mistral model, so it’s the very basic prompt template that mistral provides.

    template = """<s>[INST] Given the context - {context} </s>[INST] [INST] Answer the following question - {question}[/INST]"""
    pt = PromptTemplate(
        template=template, input_variables=["context", "question"]
    )

    All that is left to do is to define our memory and Retrieval Chatbot using Ollama as the LLM.

    rag = RetrievalQA.from_chain_type(
        llm=Ollama(model="mistral"),
        retriever=vectordb.as_retriever(),
        memory=ConversationSummaryMemory(llm=Ollama(model="mistral")),
        chain_type_kwargs={"prompt": pt, "verbose": True},
    )
    rag.invoke("What is Elden Ring ?")
    >>> {'query': 'What is Elden Ring ?',
    'history': '',
    'result': ' Elden Ring is a 2022 action role-playing game developed by FromSoftware. It was published for PlayStation 4, PlayStation 5, Windows, Xbox One, and Xbox Series X/S. In the game, players control a customizable character on a quest to repair the Elden Ring and become the new Elden Lord. The game is set in an open world, presented through a third-person perspective, and includes several types of weapons and magic spells. Players can traverse the six main areas using their steed Torrent and discover linear hidden dungeons and checkpoints that enable fast travel and attribute improvements. Elden Ring features online multiplayer mode for cooperative play or player-versus-player combat. The game was developed with inspirations from Dark Souls series, and contributions from George R.R. Martin on the narrative and Tsukasa Saitoh, Shoi Miyazawa, Tai Tomisawa, Yuka Kitamura, and Yoshimi Kudo for the original soundtrack. Elden Ring received critical acclaim for its open world, gameplay systems, and setting, with some criticism for technical performance. It sold over 20 million copies and a downloadable content expansion, Shadow of the Erdtree, is planned to be released in June 2024.'}

    We see that it was even able to tell us when Shadow of the Erdtree is planned to release, which I’m really excited about. Let me know in the comments if you want me to cover anything else.

  • Create Your Own Vector Database

    In this tutorial, we will walk through how you can create your own vector database using Chroma and Langchain. With this, you will be able to easily store PDF files and use the Chroma db as a retriever in your Retrieval Augmented Generation (RAG) systems. In another part, I’ll walk through how you can take this vector database and build a RAG system.

    # Importing Libraries

    import chromadb
    import os
    from chromadb.utils import embedding_functions
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import PyPDFLoader
    from typing import Optional
    from pathlib import Path
    from glob import glob
    from uuid import uuid4

    Now we will define some variables –

    db_path = "<path where you want to store the db>"
    collection_name = "<name of the Chroma collection; it's similar to a dataset>"
    document_dir_path = "<path where the PDFs are stored>"

    Now you also need to create an embedding function. I will use an OpenAI model in the embedding function, as it’s very cheap and good, but you can use open-source embedding functions as well. You’ll need to pass this embedding function every time you access the collection.

    embedding_func = embedding_functions.OpenAIEmbeddingFunction(
        api_key="<openai_api_key>",
        model_name="text-embedding-3-small",
    )

    Now we need to initialise the client; we will be using a persistent client, and then create our collection.

    client = chromadb.PersistentClient(path=db_path)
    client.create_collection(
        name=collection_name,
        embedding_function=embedding_func,
    )

    Now let’s load our PDFs. To do this, first, we will create a text splitter and then for each PDF, load it and split it into documents, which will then be stored in the collection. You can use any chunk size you want, we will use 1000 here.

    chunk_size = 1000
    
    # Load the collection
    collection = client.get_collection(
        collection_name, embedding_function=embedding_func
    )
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,  # 1000-character chunks
        chunk_overlap=20,
        length_function=len,
    )

    for pdf_file in glob(f"{document_dir_path}*.pdf"):
        pdf_loader = PyPDFLoader(pdf_file)
        documents = [
            doc.page_content
            for doc in pdf_loader.load_and_split(text_splitter=text_splitter)
        ]
        collection.add(
            documents=documents,
            ids=[str(uuid4()) for _ in range(len(documents))],
        )

    The collection requires an ID for each document. You can pass any string value; here we are passing random strings, but you can, for example, build the ID from the file name, as sketched below.
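
    For example, a sketch of deriving IDs from the file name instead of random UUIDs (Path is already imported above):

    # e.g. "my_paper_chunk_0", "my_paper_chunk_1", ...
    ids = [f"{Path(pdf_file).stem}_chunk_{i}" for i in range(len(documents))]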

    Let me know in case you’ve any questions.