Articles

  • Q-Learning in Python: Reinforcement Learning on Frozen Lake

    Ever seen an AI agent go from stumbling around cluelessly to mastering its environment, making perfect moves every single time? In this blog post, we’ll explore how to train an agent to do just that, transforming random, chaotic actions into smooth, optimal choices. We’ll dive into the fascinating world of Q-learning and discover how it empowers AI agents to learn and adapt. In case you want to follow along, here is the link to the Colab notebook.

    What Is Q-Learning?

    Q-learning is a type of reinforcement learning where an agent learns to make optimal decisions by interacting with its environment. The agent explores its surroundings, tries different actions, and observes the outcomes. It uses a Q-table to store Q-values, which represent the expected reward for taking a specific action in a given state. Over time, the agent updates its Q-values based on its experiences, gradually learning the best actions to take in each situation.

    source: HuggingFace

    The Q-value update formula takes our former estimate of the Q-value and adds the temporal difference (TD) error, scaled by a learning rate so that we take small, manageable steps, much like the incremental updates in other machine learning algorithms, allowing gradual refinement of our estimates. The TD error is built from the immediate reward received for the action, plus the discounted estimate of the best Q-value in the next state our action leads us into, minus our former estimate. Because it compares what actually happened (and what we expect to happen next) with what we predicted, the TD error corrects biases in the initial Q-value estimates and steadily improves the agent’s ability to predict and maximize long-term rewards in a dynamic setting.
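
    Written out, the update rule described above takes the standard tabular Q-learning form (in LaTeX, with learning rate \alpha and discount factor \gamma):

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]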

    The Frozen Lake Environment

    Enough of theory, now it’s time to train our agent on the Frozen Lake environment. Imagine a frozen lake with slippery patches. Our agent’s goal is to navigate across the lake without falling into any holes. The agent can move up, down, left, or right, but the slippery surface makes its actions unpredictable. This simple environment provides a great starting point for understanding Q-learning. We will go over training on the non-slippery environment; to see how the agent performs in the slippery environment, check out the YouTube video for this post.

    The first thing we will have to do is to initialize the environment.

    # Importing libraries
    import gymnasium as gym
    import numpy as np
    from matplotlib import pyplot as plt
    
    np.set_printoptions(precision=3)
    
    env = gym.make('FrozenLake-v1', desc=None, map_name="4x4", is_slippery=False, render_mode="rgb_array")
    print(f"There are {env.action_space.n} possible actions")
    print(f"There are {env.observation_space.n} states")
    >>>There are 4 possible actions
    >>>There are 16 states
    

    We can see that our world is 4×4 in size and thus has 16 possible states and there are 4 possible actions – up, down, left and right. We can take a look at the world.

    The goal of our agent is to reach the prize at the bottom-right. We can clearly see that it can do so by either going right->right->down->down->down->right or by following down->down->right->right->down->right. But how do we train the agent to come up with either of these paths on its own?

    We do so by initially letting the agent explore the environment randomly, trying different actions without any predefined strategy guiding its decisions. This exploration phase is crucial: it lets the agent gather diverse experiences and build a basic understanding of the environment’s dynamics. As it gains experience, it starts exploiting its learned knowledge, choosing the actions with higher Q-values identified in earlier trials. Throughout training the agent balances exploration and exploitation, so it keeps discovering new strategies while making effective use of what it has already learned, leading to more efficient learning and better decision-making.

    To do so let’s establish some helper functions first –

    def get_action(epsilon, state, q_table):
        if np.random.rand() < epsilon:
            return np.random.randint(0, env.action_space.n)
        else:
            return np.argmax(q_table[state])
    
    def get_td_error(state, next_state, action, reward, q_table):
        former_q_est = q_table[state, action]
        td_target = reward + gamma * np.max(q_table[next_state])
        td_error = td_target - former_q_est
        return td_error
    
    # We first define the Q-table; during the training epochs we update its values. 
    q_table = np.zeros((env.observation_space.n, env.action_space.n))
    

    We created two functions. The first, get_action, chooses an action based on epsilon, which controls the randomness of our actions. Initially during training we keep epsilon very high and lower it as the agent learns. The second, get_td_error, calculates the temporal difference error after each step. We also created our Q-table, whose shape is n_states x n_actions = 16×4.

    We also have to establish training hyper-parameters.

    num_epochs = 1000
    gamma = 0.99
    lr = 0.1
    decay_rate=0.99
    epsilon = 1
    

    During training, in each epoch (episode) we update our Q-table after every action. The episode is done when we either fall into a hole or reach the prize. After the episode is done we decay epsilon a bit and repeat the process. After training is done, our Q-table should have converged to the optimal Q-values for each state-action pair.

    for i in range(num_epochs):
        state, _ = env.reset()
        done = False
        while not done:
            action = get_action(epsilon, state, q_table)
            # gymnasium returns terminated and truncated separately
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            td_error = get_td_error(state, next_state, action, reward, q_table)
            q_table[state, action] = q_table[state, action] + lr*td_error
            state = next_state
        epsilon *= decay_rate
    

    Now that we’ve trained our agent, let’s see how its actions look. The code for creating the animation is in the Colab notebook.
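
    If you just want to verify the learned policy without the animation, a minimal greedy rollout like the sketch below (reusing the env and q_table from above) prints the sequence of visited states:

    # Minimal sketch: roll out the learned greedy policy (no rendering)
    state, _ = env.reset()
    done = False
    path = [state]
    while not done:
        action = np.argmax(q_table[state])  # always exploit, no exploration
        state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        path.append(state)
    print("Visited states:", path, "| final reward:", reward)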

    We can see that it now always follows the optimal path.

    Conclusion

    Q-learning is a powerful technique for training AI agents to make optimal decisions. By interacting with their environment and learning from their experiences, agents can master even complex tasks. As we’ve seen, the environment plays a crucial role in shaping the agent’s behavior.

    However, in complex environments with a vast number of states, traditional Q-learning becomes impractical. That’s where deep Q-learning comes in. By using deep neural networks, we can approximate Q-values without relying on an enormous Q-table. Stay tuned for our next blog post, where we’ll explore the intricacies of deep Q-learning.

  • From Certain to Uncertain | Stochastic Bellman Equation Made Easy

    In the video below we will go over how to calculate value for a state when the actions are probabilistic.

    If you wondered how I got the values for all the states, here is the code snippet for it.

    import numpy as np
    import matplotlib.pyplot as plt
    from typing import List, Tuple
    
    class StochasticGridWorld:
        def __init__(self, size: int = 3, gamma: float = 0.9):
            self.size = size
            self.gamma = gamma
            # Initialize states
            self.values = np.zeros((size, size))
            self.values[0, 2] = -1  # Cat
            self.values[2, 2] = 1   # Cheese
            
            # Track value history for convergence visualization
            self.value_history = {(i, j): [] for i in range(size) for j in range(size)}
            
            # Movement probabilities
            self.p_intended = 0.5  # Probability of moving in intended direction
            self.p_random = 0.5 / 4  # Split remaining probability among all directions
            
        def get_next_state(self, current_state: Tuple[int, int], 
                           action: Tuple[int, int]) -> Tuple[int, int]:
            """Calculate next state given current state and action"""
            next_i = current_state[0] + action[0]
            next_j = current_state[1] + action[1]
            
            # Check if next state is within grid
            if 0 <= next_i < self.size and 0 <= next_j < self.size:
                return (next_i, next_j)
            return current_state
        
        def get_possible_actions(self) -> List[Tuple[int, int]]:
            """Return all possible actions as (dx, dy)"""
            return [(0, 1), (0, -1), (1, 0), (-1, 0)]  # Right, Left, Down, Up
        
        def calculate_state_value(self, state: Tuple[int, int]) -> float:
            """Calculate value for a given state considering all actions"""
            if state == (0, 2) or state == (2, 2):  # Terminal states
                return self.values[state]
            
            max_value = float('-inf')
            actions = self.get_possible_actions()
            
            for action in actions:
                value = 0 # We know this as the immediate reward is 0
                # Intended movement
                next_state = self.get_next_state(state, action)
                value += self.p_intended * self.values[next_state]
                
                # Random movements
                for random_action in actions:
                    random_next_state = self.get_next_state(state, random_action)
                    value += self.p_random * self.values[random_next_state]
                
                value = self.gamma * value  # Apply discount factor
                max_value = max(max_value, value)
                
            return max_value
        
        def value_iteration(self, num_iterations: int = 100, 
                           threshold: float = 1e-4) -> np.ndarray:
            """Perform value iteration and store history"""
            for iteration in range(num_iterations):
                delta = 0
                new_values = np.copy(self.values)
                
                for i in range(self.size):
                    for j in range(self.size):
                        if (i, j) not in [(0, 2), (2, 2)]:  # Skip terminal states
                            old_value = self.values[i, j]
                            new_values[i, j] = self.calculate_state_value((i, j))
                            delta = max(delta, abs(old_value - new_values[i, j]))
                            self.value_history[(i, j)].append(new_values[i, j])
                
                self.values = new_values
                
                # Check convergence
                if delta < threshold:
                    print(f"Converged after {iteration + 1} iterations")
                    break
            
            return self.values
        
        def plot_convergence(self):
            """Plot value convergence for each non-terminal state"""
            plt.figure(figsize=(12, 8))
            for state, history in self.value_history.items():
                if state not in [(0, 2), (2, 2)]:  # Skip terminal states
                    plt.plot(history, label=f'State {state}')
            
            plt.title('Value Convergence Over Iterations')
            plt.xlabel('Iteration')
            plt.ylabel('State Value')
            plt.legend()
            plt.grid(True)
            plt.show()
    
    # Run the simulation
    grid_world = StochasticGridWorld()
    final_values = grid_world.value_iteration(num_iterations=100)
    
    print("\nFinal Values:")
    print(np.round(final_values, 3))
    
  • How Does a Mouse Find Cheese? | Bellman Equation Made Simple

    In the video we will explain how the Bellman Equation works in a deterministic world.

    Here is the code snippet you can use and run to verify the values of the state in the 3×3 grid world.

    import numpy as np
    
    def value_iteration(rewards, gamma=0.9, tolerance=1e-4, max_iterations=1000):
        # Initialize value matrix
        V = np.zeros_like(rewards, dtype=float)
        # Set terminal state values
        V[0, 2] = -1  # Cat state
        V[2, 2] = 1   # Cheese state
        
        for iteration in range(max_iterations):
            delta = 0  # Track maximum change
            V_prev = V.copy()  # Store previous values
            
            for i in range(3):
                for j in range(3):
                    # Skip terminal states
                    if (i == 0 and j == 2) or (i == 2 and j == 2):
                        continue
                        
                    # Get values of possible next states
                    possible_values = []
                    
                    # Check all possible moves (up, down, left, right)
                    # Up
                    if i > 0:
                        possible_values.append(V_prev[i-1, j])
                    # Down
                    if i < 2:
                        possible_values.append(V_prev[i+1, j])
                    # Left
                    if j > 0:
                        possible_values.append(V_prev[i, j-1])
                    # Right
                    if j < 2:
                        possible_values.append(V_prev[i, j+1])
                    
                    # Update value using Bellman equation
                    best_next_value = max(possible_values)
                    V[i, j] = rewards[i, j] + gamma * best_next_value
                    
                    # Update delta
                    delta = max(delta, abs(V[i, j] - V_prev[i, j]))
            
            # Check for convergence
            if delta < tolerance:
                print(f"Converged after {iteration + 1} iterations")
                break
        
        return V
    
    # Initialize rewards matrix
    rewards = np.zeros((3, 3))
    rewards[0, 2] = -1  # Cat state
    rewards[2, 2] = 1   # Cheese state
    
    # Run value iteration
    V = value_iteration(rewards, gamma=0.9)
    
    # Round the values for better readability
    np.set_printoptions(precision=3, suppress=True)
    print("\nFinal Value Function:")
    print(V)
    

  • Master the GPT-4o API in Minutes with This Ultimate Guide!

    OpenAI just released their newest model, GPT-4o, with multimodal capabilities, which means it can process text, images and audio at the same time. It’s supposed to be 2x faster and half the price compared to GPT-4, and it also beats GPT-4 on most benchmarks.

    The first thing you need is a developer account on OpenAI with some credits in it. To set this up, visit platform.openai.com. Once you have billing enabled, create an API key.

    Link to Colab notebook.

    First we import the required libraries and specify the path of the image.

    from google.colab import userdata
    import base64
    import requests
    
    image_path = "<image path>"
    

    Then we need to create a helper function to encode the image in base64 notation.

    def encode_image(image_path):
      with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
    

    If we take a look at the image, it’s a right-angled triangle with a missing angle, C, which, if you have done basic math, can be calculated to be 180-90-25 = 65 degrees.

    To use the API, you have to create a header and a JSON payload.

    In the headers, set the content type to application/json and pass your API key in the Authorization field.

    In the payload, you specify the model, gpt-4o in our case. Under messages, you can pass both text and images. Note that you can pass multiple images, and the model will consider all of them when answering the question. Here, we are passing a single image together with the question to find the missing angle, asking it to only return the answer.

    # Getting the base64 string
    base64_image = encode_image(image_path)
    
    headers = {
      "Content-Type": "application/json",
      "Authorization": f"Bearer {userdata.get('OPENAI_API_KEY')}" #Pass API KEY here
    }
    
    payload = {
      "model": "gpt-4o",
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": "Calculate the missing angle C in the image, only return the answer"
            },
            {
              "type": "image_url",
              "image_url": {
                "url": f"data:image/jpeg;base64,{base64_image}"
              }
            }
          ]
        }
      ],
      "max_tokens": 200
    }
    
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    

    The max_tokens field specifies the maximum number of completion tokens allowed. Let us see if GPT-4o gets this correct.
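
    To read the model’s answer out of the response, you can pull it from the returned JSON; the Chat Completions response keeps the generated text under choices[0].message.content:

    # Extract the model's answer from the Chat Completions response
    answer = response.json()["choices"][0]["message"]["content"]
    print(answer)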

    It returns 65 degrees as expected. However, this post was intended to give you an idea of how you can use GPT-4o in your development projects via API.

  • OpenELM – Apple Enters the Open Source LLM Race But Is it Any Good?

    Not even a couple of days have passed since Microsoft’s release of Phi-3-mini, which I tested in this blog post. Now Apple has joined the race of open-source LLMs by releasing a family of LLMs under the name OpenELM. In this post I’ll show you how to start using it, as the HuggingFace page does not describe which tokeniser to use. We will also compare it to Phi-3-mini 3.8B Instruct. For the comparison, I’ll be using OpenELM 1.1B Instruct; I ran into some issues using the 3B Instruct model.

    Link to Colab notebook.

    How to Run OpenELM?

    Reading the model page, you might wonder how to start using OpenELM with HuggingFace when the tokenizer isn’t specified there. Well, in one of the most Apple things ever, the tokenizer they are using is only described in the paper, which can be found at this link: it’s the Llama 2 tokenizer.

    So your setup will look something like this –

    from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
    import torch
    device = "cuda:0" # Specify device
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", pad_token_id=0) # Load in the tokenizer
    # Load the model
    model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-3B-Instruct",device_map=device,trust_remote_code= True)

    Once you have the model loaded, you need to write the generation code: one function to prepare the tokens and another to generate the text. The code provided by HuggingFace also uses an assistant model, so we will use the same model as the assistant.

    def prepare_prompt(prompt:str):
      tokens = tokenizer(prompt)
      tokenized_prompt = torch.tensor(
            tokens['input_ids'],
            device=device
        )
      return tokenized_prompt.unsqueeze(0)
    def generate(prompt:str, model:AutoModelForCausalLM, max_length:int = 128):
      tokenized_prompt = prepare_prompt(prompt)
      output_ids = model.generate(
            tokenized_prompt,
            max_length=max_length,
            pad_token_id=0,
            assistant_model = model)
      output_text = tokenizer.decode(
            output_ids[0].tolist(),
            skip_special_tokens=True
        )
      return output_text

    Now you’re set up for text generation using OpenELM.
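
    As a quick sanity check, you can call the helpers like this (the prompt string here is just an arbitrary example):

    # Example usage of the generation helpers defined above
    prompt = "Once upon a time there was"
    print(generate(prompt, model))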

    Math Question

    Find x and y from these equations. x-2y = 10 and 2x + y = 0. Be brief in your response.

    Phi-3-mini Response

    ### Response

    To solve the system of equations, we can use the substitution or elimination method. Here, we'll use the elimination method.


    Given equations:

    1) x - 2y = 10

    2) 2x + y = 0


    First, we'll manipulate the second equation to eliminate y:

    Multiply the second equation by 2 to get:

    4x + 2y = 0


    Now, add this new equation to the first equation:

    (x - 2y) + (4x + 2y) = 10 + 0

    5x = 10

    x = 10 / 5

    x = 2


    Now that we have x, we can substitute it back into either of the original equations to find y. We'll use the second equation:

    2x + y = 0

    2(2) + y = 0

    4 + y = 0

    y = -4


    So, the solution is x = 2 and y = -4.

    Phi-3-mini got the correct Answer.

    OpenELM Response –

    Find x and y from these equations. x-2y = 10 and 2x + y = 0. Be brief in your response.
    Solution:
    Find the following equations:
    x-2y = 10
    2x + y = 0
    Solve the given equations for the given points:
    Find:
    x
    y
    Solution:
    The given equations can be solved in the following form:
    x-2y = -10
    y = a + bx
    Solve the given equations for the given points:
    a = -10
    b = -1
    Solution:
    y = -1 + bx
    y = -2 + bx
    The given solutions are:
    y = -2 + bx and y = -1 + bx
    Now, find the points a and b:
    a = -1 + bx
    bx = -2 + bx
    Solution:
    bx = -2 - a + b
    bx = -2 - -1 + b
    bx = -4
    Find the points a and b:
    a = -4 and b = -4
    Solution:
    a = -4 - b
    a = 2
    The given points a and b are:
    a = 2 and b = 2

    Good attempt, given it’s a 1.1 B parameter model, but wrong answer.

    Pattern Question

    "Fill in the blank letter '_' in the pattern 1,8,9,64,25,216,_. Only return the answer"

    Phi-3-mini Response

    ### Response

    The pattern appears to be based on the sequence of cubes of natural numbers, but with alternating signs and skipping one number each time. The sequence is:

    1^3 = 1
    (-2)^3 = -8
    3^3 = 27 (but given as 9, so it seems there's a pattern of subtracting 18)
    (-4)^3 = -64
    5^3 = 125 (but given as 64, so it seems there's a pattern of subtracting 61)
    7^3 = 343 (but given as 25, so it seems there's a pattern of subtracting 318)

    Following this pattern, the next number should be the cube of 8, but with the same subtraction pattern. Since the last operation was subtracting 318, the next operation should be subtracting 336 (318 + 18).

    8^3 = 512
    512 - 336 = 176

    So the missing number is 176.

    It inferred again that there is some cubing involved, but did not arrive at the right solution, which is 49.

    OpenELM 1.1B Response –

    OpenELM completely went off the track, and its response makes no sense.

    For example, given the pattern:

    1,8,9,64,25,216,_.

    To return the only match with '_':

    _(1,8,9,64,25,216,_)

    To return all matches with '_':

    _(1_,8_,9_,64_,25_,216_,_)

    To return all matches starting with '_':

    _(_,8_,9_,25_,216_,216_,_)

    To return all matches ending with '_':

    _(_,9_,216_,216_,216_,_)

    To return all matches containing '_':

    _(_,8_,9_,25_,216_,216_,_)_

    To return all matches starting with '_and_ ending with '_':

    _(_,8_,9_,25_,216_,216_ _,)_

    To return all matches starting with '_or_ ending with '_':

    _(_,8_,9_,25_,216_,216|_)_

    To return all matches starting with '_or_and_

    SQL Question

    Provide me an SQL query to find the oldest employee_id in every region in the table Employee.
    The table has 3 columns - 1. employee_id - This is the primary key
    2. joining_date - Date of joining of the employee
    3. region - region of employee.

    Phi-3-mini Response –

    - Response: To find the oldest employee_id in every region from the `Employee` table, you can use the following SQL query. This query groups the results by the `region` column and then for each region, it finds the oldest `employee_id` based on the earliest `joining_date`.

    ```sql
    SELECT
    region,
    employee_id
    FROM
    Employee
    WHERE
    (region, joining_date) IN (
    SELECT
    region,
    MIN(joining_date)
    FROM
    Employee
    GROUP BY
    region
    )
    ORDER BY
    region;
    ```

    This looks like a query that could work, although a more efficient solution could be given.

    OpenELM 1.1 B Response –

    ```sql
    SELECT employee_id
    FROM "Employee"
    WHERE 1 = (SELECT MAX(1)
    FROM "Employee"
    WHERE "Region" = 'REGION_' + region_name(region));

    region_name('Europe/London')
    ```

    Though it gives a straightforward SQL, the quality of its answer is way below that of Phi-3-mini. But that’s to be expected, given the difference in model size.

    Logical Puzzle 1

    A lot of humans get this wrong –

    A bat and ball cost 1.10 and the difference between a bat and a ball is is 1. How much does each cost? Be brief in your response.

    Phi-3-mini Response

    I am very impressed that it got this right, just like the math question, so it seems like it’s a good LLM for math problems.

    ### Response
    
    Let's denote the cost of the bat as B and the cost of the ball as C. We are given two pieces of information:
    
    1. The total cost of the bat and the ball is $1.10:
       B + C = 1.10
    
    2. The difference in cost between the bat and the ball is $1:
       B - C = 1
    
    We can solve these two equations simultaneously to find the values of B and C.
    
    Adding the two equations together, we get:
    (B + C) + (B - C) = 1.10 + 1
    2B = 2.10
    B = 2.10 / 2
    B = 1.05
    
    Now that we have the cost of the bat, we can find the cost of the ball by substituting B back into one of the original equations:
    
    B + C = 1.10
    1.05 + C = 1.10
    C = 1.10 - 1.05
    C = 0.05
    
    So, the bat costs $1.05 and the ball costs $0.05.

    OpenELM 1.1B Response

    Math is definitely not its strong suit.

    A bat and ball cost 1.10 and the difference between a bat and a ball is is 1. How much does each cost? Be brief in your response.
    
    A bat: B - 1, B_Ball - 1.10, B_Diff - 1
    
    A ball: B_Ball - 1.10, B_Diff - 1, B_Cost - 1
    
    Solution:
    
    A bat cost B = B_Ball + B_Diff = B_Cost + 1
    
    A ball cost B_Ball = B_Cost + B_Diff = B_Cost + 2
    
    Solution:
    
    A bat cost B = 2 = 2 + 1
    
    A ball cost B_Ball = B_Cost + B_Diff = 2 + 3
    
    Solution:
    
    A bat cost B = 2 + 3 = 4
    
    A ball cost B_Ball = B_Cost + B_Diff = 4 + 3
    
    Solution:
    
    A bat cost B = 4 = 4 + 2 + 3
    
    A bat + ball cost = 4 + 4 = 8
    
    8 = 8 * 2 + 4 * 3 + 2 * 4 + 3 * 4 + 4 *

    Logical Puzzle 2

    It takes 5 machines 5 minutes to make 5 widgets. How long does it take 100 machines to make 100 widgets? Be brief in your response.

    To us the answer is fairly obvious: it is still 5 minutes.

    Phi-3-mini Response

    It gets this right again, very strong in easy logical tasks.

    ### Response
    
    It still takes 5 minutes for 100 machines to make 100 widgets. The number of machines and widgets scales linearly, so the time remains the same.
    

    OpenELM 1.1B Response

    Well, it tries to reason with itself and arrives at a solution using convoluted logic, but gets the wrong answer.

    It takes 5 machines 5 minutes to make 5 widgets. How long does it take 100 machines to make 100 widgets? Be brief in your response.

    It takes 5 machines 5 minutes to make 5 widgets. How long does it take 100 machines to make 100 widgets:

    1. 100 machines = 100 machines (i, j) = (1 + i + j - 1) machines
    2. 5 minutes = 60 seconds = 1 i-step (i = 1, 2, ..., n-1)
    3. 5 minutes / 60 seconds = 8.33333 i-steps per minute (i = 1 to n-1)
    4. 8.33333 i-steps per minute * 100 machines = 83.3333 seconds (i = 1 to 100)
    5. Multiply by 100:
    - 83.3333 seconds / 100 machines = 8.3333 minutes (m = 1 to 100)

    The answer is:

    100 machines = 8 minutes (m = 1 to 100)

    So, it takes 8 minutes to make 100 widgets using 100 machines.

    Conclusion

    It’s wonderful to witness Apple’s support for the open-source community by releasing LLMs that, while somewhat flawed, are still highly functional and small enough to fit on a single GPU. Additionally, I will be evaluating the larger 3B OpenELM in the near future, so stay tuned for that article.

  • Is Phi-3-mini really better than Llama 3? – Testing the Limits of Small LLMs in Real-World Scenarios

    Microsoft released Phi-3-mini, a 3.8 billion parameter model, today without much fanfare. What is really surprising is that, according to the paper, despite being a smaller model it outperforms the recently released Llama 3 8B model. Note that it has less than 50% of the parameters, and with quantisation and other neat tricks it could potentially run on your handheld device while still outperforming Llama 3 8B. Seems like Llama 3’s run as the best small model didn’t really last long.

    So I decided to test this out myself: I asked both Llama 3 and Phi-3-mini 3 different questions to make a qualitative evaluation of whether it’s really better than Llama 3, or whether Phi-3 has simply been overfitted to perform well on the leaderboards.

    I’m using Phi-3-mini instruct with 4k context length and Llama 3 8B Instruct.

    Here is the link to the Colab notebook if you want to play around with it.

    The Needle in the Haystack

    I presented a prompt of approximately 2000 tokens, in which I inserted the sentence "Paneer Kadhai is the dish of the day" at a random location as a reference point. At the end of the prompt, I posed the question “What is the dish of the day?” to assess each model’s ability to recall information within a large context. As anticipated, Llama 3 8B gets the correct answer; Phi-3-mini, however, fails, returning some text from the prompt but not the answer. So this round goes to Llama 3.
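
    The exact filler text I used is in the notebook, but constructing such a test is straightforward. A rough sketch, where the repeated sentence is just a stand-in for any long passage of your choosing:

    import random

    # Build a needle-in-the-haystack prompt: hide one sentence in a long filler passage
    filler_text = "The lake was calm in the early morning light. " * 200  # stand-in for a ~2000-token passage
    needle = "Paneer Kadhai is the dish of the day"
    sentences = filler_text.strip().split(". ")
    sentences.insert(random.randint(0, len(sentences)), needle)  # random insertion point
    prompt = ". ".join(sentences) + "\n\nWhat is the dish of the day?"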

    Pattern Recognition

    Now, this is a challenging problem. The models have to complete this sequence – 1,8,9,64,25,216,_.

    As you can observe, the numbers at odd positions follow n² and those at even positions follow n³, where n is the position. Knowing this, the answer is fairly obvious: the missing number is 7² = 49.
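
    As a quick check, a one-liner reproduces the sequence under that rule:

    # Positions 1-7: squares at odd positions, cubes at even positions
    print([n**2 if n % 2 == 1 else n**3 for n in range(1, 8)])
    # [1, 8, 9, 64, 25, 216, 49]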

    Here Llama 3 was not able to find the answer; it correctly inferred that the pattern has something to do with squaring the numbers, but was unable to arrive at the correct response.

    Fill in the blank letter '_' in  the pattern 1,8,9,64,25,216,_.
    The pattern is obtained by squaring the numbers. So, the next number in the pattern is 1,8,9,64,25,216, 441.
    Final Answer: The final answer is 441. I hope it is correct.

    Phi-3-mini was equal in performance to Llama 3 at this task, which I thought was very impressive given its small size. In its response, it tried to reason that the sequence might contain cubing the numbers, though it didn’t arrive at the correct response. Phi-3-mini response –

    #### Response

    The pattern appears to be based on the sequence of cubes of natural numbers, but with alternating signs and skipping one number each time. The sequence is:

    1^3 = 1
    (-2)^3 = -8
    3^3 = 27 (but given as 9, so it seems there's a pattern of subtracting 18)
    (-4)^3 = -64
    5^3 = 125 (but given as 64, so it seems there's a pattern of subtracting 61)
    7^3 = 343 (but given as 25, so it seems there's a pattern of subtracting 318)

    Following this pattern, the next number should be the cube of 8, but with the same subtraction pattern. Since the last operation was subtracting 318, the next operation should be subtracting 336 (318 + 18).

    8^3 = 512
    512 - 336 = 176

    So the missing number is 17

    This round is a tie.


    SQL Coding

    Finally, I asked both models to provide me with a coding solution to this leetcode problem. Now there are many solutions to this problem, but I was interested in seeing how both the LLMs approach this problem.

    Both Llama 3 and Phi 3 didn’t answer the problem correctly, but Llama 3 gave me SQL to test, which is better than returning no code.

    Phi-3 took an interesting approach. Rather than providing an SQL solution, it generated more examples (erroneous ones), and due to the max-token limit the answer was truncated.

    Example 2:
    
    Input: 
    Employee table:
    +-----+-------+------------+-----------+
    | id  | name  | department | managerId |
    +-----+-------+------------+-----------+
    | 101 | John  | A          | null      |
    | 102 | Dan   | A          | 101       |
    | 103 | James | A          | 101       |
    | 104 | Amy   | A          | 101       |
    | 105 | Anne  | A          | 101       |
    | 106 | Ron   | B          | 101       |
    | 107 | Tom   | B          | 106       |
    | 108 | Bob   | B          | 106       |
    | 109 | Tim   | B          | 106       |
    | 110 | Sam   | B          | 106       |
    +-----+-------+------------+-----------+
    Output: 
    +------+
    | name |
    +------+
    | John |
    | Ron  |
    +------+
    
    
    Example 3:
    
    Input: 
    Employee table:
    +-----+-------+------------+-----------+
    | id  | name  | department | managerId |
    +-----+-------+------------+-----------+
    | 101 | John  | A          | null      |
    | 102 | Dan   | A          | 101       |
    | 103 | James | A          | 101       |
    | 104 | Amy   | A          | 101       |
    | 105 | Anne  | A          | 101       |
    | 106 | Ron   | B          | 101       |
    | 107 | Tom   | B          | 106       |
    | 108 | Bob   | B          | 106       |
    | 109 | Tim   | B          | 106       |
    | 110 | Sam   | B          | 106       |
    | 111 | Mike  | B          | 107       |
    | 112 | Jack  | B          | 107       |
    | 113 | Jill  | B          | 107       |
    +-----+-------+------------+-----------+
    Output: 
    +------+
    | name |
    +------+
    | John |
    | Ron  |
    | Tom  |
    +------+
    
    
    Example 4:
    
    Input: 
    Employee table:
    +-----+-------+------------+-----------+
    | id  | name  | department | managerId |
    +-----+-------+------------+-----------+
    | 101 | John  | A          | null      |
    | 102 | Dan   | A          | 101       |
    | 103 | James | A          | 101       |
    | 104 | Amy   | A          | 101       |
    | 105 | Anne  | A          | 101       |
    | 106 | Ron   | B          | 101       |
    | 107 | Tom   | B          | 106       |
    | 108 | Bob   | B          | 106       |
    | 109 | Tim   | B          | 106       |
    | 110 | Sam   | B          | 106       |
    | 111 | Mike  | B          | 107       |
    | 112 | Jack  | B          | 107       |
    | 113 | Jill  | B          | 107       |
    | 114 | Mark  | B          | 108       |
    | 115 | John  | B          | 108       |
    +-----+-------+------------+-----------+
    Output: 
    +------+
    | name |
    +------+
    | John |
    | Ron  |
    | Tom  |
    | Bob  |
    +------+
    
    
    Example 5:
    
    Input: 
    Employee table:
    +-----+-------+------------+-----------+
    | id  | name  | department | managerId |
    +-----+

    Conclusion

    Question            | Llama 3 8B Instruct | Phi-3-mini 4K
    --------------------|---------------------|--------------
    Needle in Haystack  | X                   |
    Pattern Recognition | X                   | X
    SQL Code            | X                   |

    (An X marks the model that handled the question better; both get one for a tie.)

    While 3 questions are not enough to determine the strength of these small but still massive models, they did give us some indication of each model’s strengths, and to be honest, I think Phi-3-mini is overfitted to the benchmarks. It may have faster inference and fit on your device, but if you want better results, I’d stick to using either Mistral or Llama 3 8B.

  • Exploring Data Distribution Differences in Machine Learning: An Adversarial Approach

    First, a shout-out to Santiago, whose tweet inspired this post.

    In the realm of machine learning, ensuring that models perform well not only on training data but also on unseen test data is crucial. A common challenge that arises is the difference in data distribution between training and testing datasets, known as dataset shift. This discrepancy can significantly degrade the performance of a model when deployed in real-world scenarios. To tackle this issue, researchers and practitioners have developed various methods to detect and quantify differences in data distribution. One innovative approach is the adversarial method, which leverages concepts from adversarial training to assess and address these differences.

    Understanding Dataset Shift

    Before diving into the adversarial methods, it is essential to understand what dataset shift entails. Dataset shift occurs when the joint distribution of inputs and outputs differs between the training and testing phases. This shift can be categorised into several types, including covariate shift, prior probability shift, and concept shift, each affecting the model in different ways.

    • Covariate Shift: The distribution of input features changes between the training and testing datasets.
    • Prior Probability Shift: The distribution of the output variable changes.
    • Concept Shift: The relationship between the input features and the output variable changes.

    Detecting and correcting for these shifts is crucial for developing robust machine learning models.

    Adversarial Methods for Detecting Dataset Shift

    Adversarial methods for dataset shift detection are inspired by adversarial training in neural networks, where models are trained to be robust against intentionally crafted malicious input. Similarly, in dataset shift detection, these methods involve creating a scenario where a model tries to distinguish between training and testing data based on their data distributions.

    The way to do this is –

    1. Combine your train and test data.
    2. Create a new column, where you label training data as 1 and test data as 0.
    3. Train a classifier on this using your new column as the target.

    If the data in both train and test comes from the same distribution, the AUC will be close to 0.5, but if they are from different distributions, then the model will learn to differentiate the data points and the AUC will be close to 1.

    Example

    In this example, we will have training data as height and weight in metres and kilograms, and in the test data, we will have the same data but in centimetres and grams. Then if we train a simple logistic regression to learn on the dummy target, which is 1 on the training set and 0 on test data, given that we are not scaling the variables, the model should have an AUC close to 1.

    #Loading required libraries
    import numpy as np 
    import pandas as pd
    import seaborn as sns
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from matplotlib import pyplot as plt
    

    Then we define our features for train and test

    # Set random seed for reproducibility
    np.random.seed(42)
    
    # Generate synthetic data
    # Training data (height in meters, weight in kilograms)
    train_height = np.random.normal(1.75, 0.1, 1000)  # Average height 1.75 meters
    train_weight = np.random.normal(70, 10, 1000)    # Average weight 70 kg
    
    # Test data (height in centimeters, weight in grams)
    test_height = train_height * 100  # Convert meters to centimeters
    test_weight = train_weight * 1000  # Convert kilograms to grams
    

    Once we’ve our features defined, all we need to do is create a training dataset, train our classifier and check the AUC score.

    # Combine data into feature matrices
    X_train = np.column_stack((train_height, train_weight))
    X_test = np.column_stack((test_height, test_weight))
    
    # Create labels: 1 for training data, 0 for test data
    y_train = np.ones(X_train.shape[0])
    y_test = np.zeros(X_test.shape[0])
    
    # Combine into a single dataset
    X = np.vstack((X_train, X_test))
    y = np.concatenate((y_train, y_test))
    
    # Train logistic regression model
    model = LogisticRegression()
    model.fit(X, y)
    
    # Predict probabilities for ROC AUC calculation
    y_pred_proba = model.predict_proba(X)[:, 1]
    
    # Calculate AUC
    auc = roc_auc_score(y, y_pred_proba)
    print(f"The AUC is: {auc:.2f}")
    
    

    The AUC here comes out to be 1.0, as expected. Since the train and test data come from different distributions, the model was easily able to tell them apart.

    Using this approach you can also easily test whether the train and test data come from the same distribution.
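
    To see the other side of the claim, here is a quick sketch continuing from the snippet above: if we instead draw the test set from the same distribution (same units) as the training set, the classifier has nothing to latch onto and the AUC should land near 0.5.

    # Sanity check: test data drawn from the same distribution as the training data
    test_height_same = np.random.normal(1.75, 0.1, 1000)
    test_weight_same = np.random.normal(70, 10, 1000)

    X_same = np.vstack((np.column_stack((train_height, train_weight)),
                        np.column_stack((test_height_same, test_weight_same))))
    y_same = np.concatenate((np.ones(1000), np.zeros(1000)))

    model_same = LogisticRegression()
    model_same.fit(X_same, y_same)

    auc_same = roc_auc_score(y_same, model_same.predict_proba(X_same)[:, 1])
    print(f"The AUC is: {auc_same:.2f}")  # expected to be close to 0.5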

  • Is Llama 3 Really Better Than Mistral?

    With the recent launch of the much anticipated Llama 3, I decided to compare Mistral, one of the best small (7B) language models out there, with Llama 3, which, according to its benchmark scores, outperforms Mistral. But is it really better when it comes to using it as the LLM in your RAG applications? To test this, I put the same questions to both Mistral and Llama 3, and the results may surprise you.

    Link to Colab notebook.

    I created a RAG application using Ollama. In case you want to know how you can do it yourself, you can check out this post. I used the Elden Ring Wikipedia article as the document for contextual retrieval, with conversation buffer memory, which simply passes the entire conversation history back to the language model as context. I then asked the same question to both LLMs, and at the end also put it to the current king of LLMs, GPT-4. The question was –

    "How many awards did Elden Ring Win, and did it win Game of the year award ?"

    The entire prompt with the context was –

    Be precise in your response. Given the context - Elden Ring winning Game of the Year
    at the 23rd Game Developers Choice
    AwardsSome reviewers criticized a number of the game's menu and accessibility systems.[84][85] Reviewers
    complained about the poor performance of the Window s version; framerate issues were commonly
    mentioned.[81][86] Reviewers noted the story of Elden Ring lacks Martin's writing style. Kyle Orland of Ars
    Technica said the game's storytelling is "characteristically sparse and cryptic", and differs from the
    expectations of Martin's fans.[76] Chris Carter of Destructoid called the story "low key" but said it is better-
    told than those of previous FromSoftware games.[80] Aoife Wilson of Eurogam er said George R. R.
    Martin's heavy inclusion in the marketing was "baffling" when his contributions to the overall narrative
    were unclear.[72] Mitchell Saltzman did not mind the lack of Martin's style, saying the side-stories rather
    than any gr and, ove rarching pl ot kept him "enthralled".[70]
    
    120. Mejia, Ozzie (January 26, 2023). "Elden Ring & Stray lead Game Developers Choice
    Awards 2023 nominees" (https://www.shacknews.com/article/133863/gdc-2023-award-nomi
    nees). Shacknews. Archived (https://web.archive.org/web/20230127040625/https://www.sha
    cknews.com/article/133863/gdc-2023-award-nominees) from the original on January 27,
    2023. Retrieved January 27, 2023.
    121. Beth Elderkin (March 22, 2023). "'Elden Ring' Wins Game Of The Year At The 2023 Game
    Developers Choice Awards" (https://gdconf.com/news/elden-ring-wins-game-year-2023-gam
    e-developers-choice-awards). Game Developers Choice Conference. Archived (https://web.
    archive.org/web/20230323091858/https://gdconf.com/news/elden-ring-wins-game-year-2023
    -game-developers-choice-awards) from the original on March 23, 2023. Retrieved March 23,
    2023.
    122. "gamescom award 2021: These were the best games of the year" (https://www.gamescom.gl
    obal/en/gamescom/gamescom-award/gamescom-award-review-2021). Gamescom.
    
    Tin Pan Alley Award for
    Best Music in a GameNominated
    The Steam
    AwardsJanuary 3, 2023Game of the Year Won
    [132]
    Best Game You Suck At Won
    The Streamer
    AwardsMarch 11, 2023Stream Game of the
    YearWon[133]
    1. Sawyer, Will; Franey, Joel (April 8, 2022). "Where Elden Ring takes place and the story
    explained" (https://www.gamesradar.com/elden-ring-where-does-it-take-place-setting-story-l
    ore/). gamesradar. Archived (https://web.archive.org/web/20220402212714/https://www.gam
    esradar.com/elden-ring-where-does-it-take-place-setting-story-lore/) from the original on April
    2, 2022. Retrieved July 26, 2022.
    2. Knapp, Mark (June 16, 2021). "Elden Ring: Release Date, Gameplay, and What We Know
    So Far" (https://www.ign.com/articles/elden-ring-release-date-news-gameplay-trailer-story).
    IGN. Ziff Davis. Archived (https://web.archive.org/web/20220303124310/https://www.ign.co
    m/articles/elden-ring-release-date-news-gameplay-trailer-story) from the original on March 3,
    2022. Retrieved July 6, 2021.
    
    Elden Ring sold 13.4 million copies worldwide by the end of March 2022,[87] and 20 million by February
    2023.[88] It was the best-selling game in several regions between February and March 2022,[89][90] and is
    the fastest-selling Bandai Namco game of all time.[91][92] It was the second-bestselling game of 2022 in the
    US after Call of Duty: Modern Warfare II,[93] the third-bestselling in Europe,[94] and the tenth-bestselling in
    Japan at retail.[95]
    Elden Ring won many awards and accolades. It was listed as the
    2022' s Game of the Year by several publications, including Ars
    Technica,[96] Destructoid,[97] EGM,[98] Eurogam er,[99] Game
    Informer,[100] GamesRadar +,[101] GameSpot,[102] IGN,[103] PC
    Gamer,[104] and Polygon.[105] It is among the most awarded games
    in history with 324 G ame of the Year awards.[106][107]S a l e s
    A w a r d s,  Answer the following question - How many awards did Elden Ring Win, and did it win Game of the year award ?.
    

    Mistral

    1. "Elden Ring" (https://www.metacritic.com/game/pc/elden-ring). Metacritic. Archived (https://web.
    archive.org/web/20220402212714/https://www.metacritic.com/game/pc/elden-ring) from the
    original on April 2, 2022. Retrieved April 2, 2022.
    2. "Elden Ring" (https://www.met

    Llama 3 – 8B

    Elden Ring winning Game of the Year
    at the 23rd Game Developers Choice
    AwardsSome reviewers criticized a number of the game's menu and accessibility systems.[84][85] Reviewers
    complained about the poor performance of the Window s version; framerate issues were commonly
    mentioned.[81][86] Reviewers noted the story of Elden Ring lacks Martin's writing style. Kyle Orland of Ars
    Technica said the game's storytelling is "characteristically sparse and cryptic", and differs from the
    expectations of Martin's fans.[76] Chris Carter

    We can clearly see that Llama 3 8B got one part right; however, from the context it couldn’t find that the game won 324 awards in total.

    GPT-4

    'Elden Ring won 324 Game of the Year awards. Yes, it did win the Game of the Year award at the 23rd Game Developers Choice Awards.'

    GPT-4 is still far ahead of smaller LLMs, but Llama 3 8B has improved compared to Mistral.

  • Build Fully Local RAG Application with LLaMA 3: A Step-by-Step Guide

    Meta just launched Llama 3 and it’s the best open-source LLM you can use, so why not build a RAG application with it? You can use the model for text generation through either HuggingFace or Ollama; we will be using Ollama to create a RAG application that runs locally.

    In this tutorial, we will build a Retrieval Augmented Generation (RAG) application using Ollama and Langchain. For the vector store, we will be using Chroma, but you are free to use any vector store of your choice.

    In case you just want the Colab notebook, it’s available here.

    There are 4 key steps to building your RAG application –

    1. Load your documents
    2. Add them to the vector store using the embedding function of your choice.
    3. Define your prompt template.
    4. Define your retrieval chatbot using the LLM of your choice.

    First we load the required libraries.

    # Loading required libraries
    import os
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.vectorstores import Chroma
    from langchain.chains import RetrievalQA
    from langchain.memory import ConversationSummaryMemory
    from langchain_openai import OpenAIEmbeddings
    from langchain.prompts import PromptTemplate
    from langchain.llms import Ollama

    Then comes step 1, which is to load our documents. Here I’ll be using the Elden Ring Wikipedia PDF; you can just visit the Wikipedia page and download it as a PDF file.

    data_path = "./data/Elden_Ring.pdf"
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=2000,
        chunk_overlap=30,
        length_function=len,)
    documents = PyPDFLoader(data_path).load_and_split(text_splitter=text_splitter)

    In case you want to learn in detail about ChromaDB, you can visit our detailed guide to using ChromaDB. The next step is to use an embedding function that will convert our text into embeddings. I prefer using OpenAI embeddings, but you can use any embedding function. Using this embedding function we will add our documents to the Chroma vector database.

    embedding_func = OpenAIEmbeddings(api_key=os.environ.get("OPENAI_API_KEY"))
    vectordb = Chroma.from_documents(documents, embedding=embedding_func)

    Moving on, we have to define a prompt template. I’ll be using the mistral model, so it’s the very basic prompt template that mistral provides.

    template = """<s>[INST] Given the context - {context} </s>[INST] [INST] Answer the following question - {question}[/INST]"""
    pt = PromptTemplate(
                template=template, input_variables=["context", "question"]
            )

    All that is left to do is to define our memory and the retrieval chatbot using Ollama as the LLM. To use Llama 3 as the LLM, all you have to do is pass “llama3” as the model name.

    rag = RetrievalQA.from_chain_type(
                llm=Ollama(model="mistral"),
                retriever=vectordb.as_retriever(),
                memory=ConversationSummaryMemory(llm = Ollama(model="mistral")),
                chain_type_kwargs={"prompt": pt, "verbose": True},
            )
    rag.invoke("What is Elden Ring ?")
    >>> {'query': 'What is Elden Ring ?',
     'history': '',
     'result': ' Elden Ring is a 2022 action role-playing game developed by FromSoftware. It was published for PlayStation 4, PlayStation 5, Windows, Xbox One, and Xbox Series X/S. In the game, players control a customizable character on a quest to repair the Elden Ring and become the new Elden Lord. The game is set in an open world, presented through a third-person perspective, and includes several types of weapons and magic spells. Players can traverse the six main areas using their steed Torrent and discover linear hidden dungeons and checkpoints that enable fast travel and attribute improvements. Elden Ring features online multiplayer mode for cooperative play or player-versus-player combat. The game was developed with inspirations from Dark Souls series, and contributions from George R.R. Martin on the narrative and Tsukasa Saitoh, Shoi Miyazawa, Tai Tomisawa, Yuka Kitamura, and Yoshimi Kudo for the original soundtrack. Elden Ring received critical acclaim for its open world, gameplay systems, and setting, with some criticism for technical performance. It sold over 20 million copies and a downloadable content expansion, Shadow of the Erdtree, is planned to be released in June 2024.'}
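
    For reference, switching the chain over to Llama 3 only changes the model name passed to Ollama (assuming you have already pulled llama3 locally):

    # Same chain as above, but with Llama 3 as both the answering and the memory LLM
    rag_llama3 = RetrievalQA.from_chain_type(
                llm=Ollama(model="llama3"),
                retriever=vectordb.as_retriever(),
                memory=ConversationSummaryMemory(llm=Ollama(model="llama3")),
                chain_type_kwargs={"prompt": pt, "verbose": True},
            )
    rag_llama3.invoke("What is Elden Ring ?")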

    In sum, building a Retrieval Augmented Generation (RAG) application using the newly released LLaMA 3 model, Ollama, and Langchain enables robust local solutions for natural language queries. This tutorial walked you through the comprehensive steps of loading documents, embedding them into a vector store like Chroma, and setting up a dynamic RAG application that retrieves and generates responses efficiently. By harnessing the power of the newly released LLaMA 3 by Meta as the LLM and Langchain to create the chatbot, you can create intelligent systems that significantly enhance user interaction and information retrieval. The capabilities demonstrated here illustrate just a fraction of the potential applications. Let me know in the comments if you want me to cover something else.

  • Mastering Time: Unlock Hyper-Parameter Tuning with Time Series Cross-Validation

    We all know how to do hyper-parameter tuning using scikit-learn, but I guess you might be struggling with how to tune your hyper-parameters using time-series cross-validation. First, let’s understand what time-series cross-validation is in the first place.

    Time series cross-validation is a technique used to evaluate the performance of predictive models on time-ordered data. Unlike traditional cross-validation methods, which randomly split the dataset into training and testing sets, time series cross-validation maintains the chronological order of observations. This approach is crucial for time series data, where the relationship between past and future data points is essential for accurate predictions.

    In time series cross-validation, the dataset is split into a series of training and testing sets over time. For example, in a simple walk-forward validation, the model might be trained on the first year of data and tested on the following month, then trained on the first year plus one month and tested on the next month, and so on. This method allows for the evaluation of the model’s performance over different time intervals, ensuring that the model can adapt to changes in the data over time.

    We will be utilising TimeSeriesSplit from scikit-learn to get these splits on our data.
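
    If you haven’t used it before, TimeSeriesSplit yields expanding, chronologically ordered train/validation index pairs. A tiny sketch of what the splits look like on toy data:

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    # 10 time-ordered samples, 3 expanding splits
    toy = np.arange(10).reshape(-1, 1)
    for train_index, test_index in TimeSeriesSplit(n_splits=3).split(toy):
        print("train:", train_index, "validation:", test_index)
    # train: [0 1 2 3] validation: [4 5]
    # train: [0 1 2 3 4 5] validation: [6 7]
    # train: [0 1 2 3 4 5 6 7] validation: [8 9]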

    Suppose we have our train and test data ready with all the features, and we also have a timestamp column. The first step is to set this column as the index and sort the dataframe.

    # Supposing X is our dataframe and timestamp_ is the column name which has the time related information.
    import pandas as pd
    X.set_index(keys='timestamp_', drop=True, inplace = True)
    X.sort_index(inplace=True)
    y = X[<target col>]
    X.drop([<target col>], axis = 1, inplace = True)

    Once you have the DataFrame sorted, you need to create your hyper-parameter grid; for this too, we will be using scikit-learn. We will also need to create the time series splits, again using scikit-learn. You could write this to run in parallel, but since this is a demo example, we will use for loops. But first, we will write a training function, assuming our task is a classification one and we’re using CatBoost.

    from catboost import CatBoostClassifier
    import pandas as pd
    import numpy as np
    from sklearn.metrics import roc_auc_score
    
    def train(param: dict, X: pd.DataFrame, y: pd.Series, train_index: np.array, test_index: np.array) -> float:
        X_train, X_val = X.iloc[train_index], X.iloc[test_index]
        y_train, y_val = y.iloc[train_index], y.iloc[test_index]
        
        model = CatBoostClassifier(max_depth=param['max_depth'],
                                   subsample=param['subsample'],
                                   verbose=0)  # Set verbose to 0 for silent training
        
        model.fit(X_train, y_train,
                  eval_set=(X_val, y_val))
        
        # Predict probabilities for the positive class
        y_pred_proba = model.predict_proba(X_val)[:, 1]
        
        # Calculate AUC score
        score = roc_auc_score(y_val, y_pred_proba)
        
        return score

    Here the function takes the parameter dictionary, the feature matrix, the label and the index which we will get after using TimeSeriesSplit. It then fits a model. I have used AUC as an example metric, but you’re free to use any metric. After this, all we need to do is run the training over all possible combinations of parameters and keep track of the best score and best parameters.

    from sklearn.model_selection import TimeSeriesSplit, ParameterGrid
    
    params = {'max_depth' : [6,7,8],
              'subsample' : [0.8,1] }
    
    # Creating the time series splitter (5 splits is scikit-learn's default)
    tscv = TimeSeriesSplit(n_splits=5)
    
    # Initialising the best_score and best_params
    best_score = -999
    best_params = None
    
    # Looping over the parameters
    for i, param in enumerate(ParameterGrid(params)):
        scores = [train(param=param, train_index=train_index, test_index=test_index, X=X, y=y) for train_index, test_index in tscv.split(X)]
        cv_score = np.mean(scores)
        if cv_score > best_score:
            best_score = cv_score
            best_params = param
     

    In the above block, we define a grid, and then using ParameterGrid we create a generator that yields a parameter dict on each iteration of the for loop. Inside the loop, we calculate the score on each split produced by TimeSeriesSplit; it creates the indices to use for the splits, but it has to be fed data already sorted by time, which is why we did that step at the beginning.

    Once we have the score for each split, we compare the average to the existing best_score; if it’s greater, we update both best_score and best_params. Once all possible combinations are done, we have hyper-parameters tuned using time series cross-validation. With the final hyper-parameters in hand, all that’s left is to train your final model.

    # Assuming best_params contains the best hyper-parameter values found
    # from the tuning process
    
    # Initialize the model with the best parameters
    final_model = CatBoostClassifier(max_depth=best_params['max_depth'],
                                     subsample=best_params['subsample'],
                                     verbose=0)
    
    # Fit the model on the entire dataset (no separate eval set at this stage)
    final_model.fit(X, y)
    
    # Now, the final_model is trained with the best hyper-parameters on the full dataset
    # You can proceed to make predictions or further evaluate the model as needed