Category: LLM

  • I Tested GPT-5 API So You Don’t Have To

    I Tested GPT-5 API So You Don’t Have To

    DISCLAIMER: The article has been written by a human, so you will find grammatical errors and simple language.

    So OpenAI just launched GPT-5, and like you, I’m hyped. It’s faster, better, and cheaper than 4o, 4.1, and o4, as their demo videos state, and I’m fully on board the hype train.

    It’s also cheaper than Claude Opus and Sonnet on API pricing: not just cheaper, but 10x cheaper than Opus and ~60% cheaper on input tokens compared to Sonnet.

    Pricing as of 2025-08-07

    But the real question is: is it better than Sonnet or Opus? I tested both models on specific tasks and compared their performance, both on speed and accuracy. These tasks are not your typical coding challenges; we know these models are much better at coding than the average human, so we test them with some bizarre requirements. We also test the API endpoints, since the ChatGPT interface is not a single LLM call but rather an agent. That way we get a sense of what we can build with these models.

    TASK – Draw a Squirrel Riding a Cycle

    This is not your typical image generation: the text endpoint has to create the image in SVG format. The thing about SVG is that it’s based on XML, so it can be output as plain text.

    I’m using Pydantic-AI to invoke the APIs; a minimal sketch of the calls is below. In case you want me to cover a detailed guide on using Pydantic-AI to build agents, write in the comments below.
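
    Here’s roughly how the calls are made with Pydantic-AI. This is not the exact benchmarking script: the "openai:gpt-5" model string and the output file name are my assumptions, and older pydantic-ai versions expose the text under result.data instead of result.output.

    import time
    from pydantic_ai import Agent

    # Point the agent at the model under test; swap in the Anthropic model string to test Sonnet.
    agent = Agent("openai:gpt-5")  # assumed model identifier

    prompt = "Draw a squirrel riding a cycle in SVG format, just return the xml."

    start = time.time()
    result = agent.run_sync(prompt)
    print(f"took {time.time() - start:.1f}s")
    print(result.usage())  # token counts, like the Usage(...) dumps below

    # Save the returned XML so it can be opened in a browser to view the drawing.
    with open("squirrel.svg", "w") as f:
        f.write(result.output)  # result.data on older pydantic-ai versions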

    Prompt – “Draw a squirrel riding a cycle in SVG format, just return the xml.”

    GPT-5

    So GPT-5 consumed around 5k tokens and took 111.37 seconds. The image it generated looks good: it looks somewhat like a squirrel, and the cycle was, I guess, easy for it to draw.

    There are some issues: the spokes are missing, the attempt at a shadow is off, and the handlebars are missing. Still, it’s pretty impressive.

    Usage(requests=1, request_tokens=21, response_tokens=4777, total_tokens=4798, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 2368, 'rejected_prediction_tokens': 0, 'cached_tokens': 0})
    <svg xmlns="http://www.w3.org/2000/svg" width="800" height="600" viewBox="0 0 800 600">\n  <defs>\n    <line id="spoke" x1="0" y1="0" x2="0" y2="-65" stroke="#cfcfcf" stroke-width="2" stroke-linecap="round"/>\n  </defs>\n\n  <!-- Ground shadow -->\n  <ellipse cx="260" cy="455" rx="95" ry="14" fill="rgba(0,0,0,0.12)"/>\n  <ellipse cx="540" cy="455" rx="95" ry="14" fill="rgba(0,0,0,0.12)"/>\n\n  <!-- Bicycle -->\n  <g id="bicycle">\n    <!-- Rear wheel -->\n    <g transform="translate(260 420)">\n      <circle r="80" fill="none" stroke="#222" stroke-width="10"/>\n      <circle r="65" fill="none" stroke="#8f8f8f" stroke-width="3"/>\n      <circle r="6" fill="#777"/>\n      <use href="#spoke" transform="rotate(0)"/>\n      <use href="#spoke" transform="rotate(15)"/>\n      <use href="#spoke" transform="rotate(30)"/>\n      <use href="#spoke" transform="rotate(45)"/>\n      <use href="#spoke" transform="rotate(60)"/>\n      <use href="#spoke" transform="rotate(75)"/>\n      <use href="#spoke" transform="rotate(90)"/>\n      <use href="#spoke" transform="rotate(105)"/>\n      <use href="#spoke" transform="rotate(120)"/>\n      <use href="#spoke" transform="rotate(135)"/>\n      <use href="#spoke" transform="rotate(150)"/>\n      <use href="#spoke" transform="rotate(165)"/>\n    </g>\n\n    <!-- Front wheel -->\n    <g transform="translate(540 420)">\n      <circle r="80" fill="none" stroke="#222" stroke-width="10"/>\n      <circle r="65" fill="none" stroke="#8f8f8f" stroke-width="3"/>\n      <circle r="6" fill="#777"/>\n      <use href="#spoke" transform="rotate(0)"/>\n      <use href="#spoke" transform="rotate(15)"/>\n      <use href="#spoke" transform="rotate(30)"/>\n      <use href="#spoke" transform="rotate(45)"/>\n      <use href="#spoke" transform="rotate(60)"/>\n      <use href="#spoke" transform="rotate(75)"/>\n      <use href="#spoke" transform="rotate(90)"/>\n      <use href="#spoke" transform="rotate(105)"/>\n      <use href="#spoke" transform="rotate(120)"/>\n      <use href="#spoke" transform="rotate(135)"/>\n      <use href="#spoke" transform="rotate(150)"/>\n      <use href="#spoke" transform="rotate(165)"/>\n    </g>\n\n    <!-- Frame geometry points:\n         Rear hub R(260,420), Bottom bracket B(365,420),\n         Seat S(385,350), Head H(520,355), Front hub F(540,420)\n    -->\n    <!-- Chainstay -->\n    <line x1="260" y1="420" x2="365" y2="420" stroke="#2c6b8a" stroke-width="8" stroke-linecap="round"/>\n    <!-- Seat stay -->\n    <line x1="260" y1="420" x2="385" y2="350" stroke="#2c6b8a" stroke-width="6" stroke-linecap="round"/>\n    <!-- Seat tube -->\n    <line x1="365" y1="420" x2="385" y2="350" stroke="#2c6b8a" stroke-width="8" stroke-linecap="round"/>\n    <!-- Top tube -->\n    <line x1="385" y1="350" x2="520" y2="355" stroke="#2c6b8a" stroke-width="7" stroke-linecap="round"/>\n    <!-- Down tube -->\n    <line x1="365" y1="420" x2="520" y2="355" stroke="#2c6b8a" stroke-width="8" stroke-linecap="round"/>\n    <!-- Fork -->\n    <line x1="520" y1="355" x2="540" y2="420" stroke="#2c6b8a" stroke-width="7" stroke-linecap="round"/>\n\n    <!-- Seat -->\n    <rect x="360" y="335" width="55" height="14" rx="7" ry="7" fill="#333"/>\n    <rect x="372" y="345" width="8" height="15" fill="#333"/>\n\n    <!-- Handlebars -->\n    <line x1="520" y1="355" x2="530" y2="350" stroke="#333" stroke-width="6" stroke-linecap="round"/>\n    <path d="M 530 350 C 546 344, 565 344, 580 352" fill="none" stroke="#333" stroke-width="7" stroke-linecap="round"/>\n    <line x1="579" y1="351" 
x2="586" y2="341" stroke="#555" stroke-width="7" stroke-linecap="round"/>\n\n    <!-- Crankset and chain -->\n    <circle cx="365" cy="420" r="28" fill="none" stroke="#555" stroke-width="6"/>\n    <circle cx="365" cy="420" r="14" fill="#777"/>\n    <!-- Crank arms -->\n    <line x1="365" y1="420" x2="398" y2="401" stroke="#444" stroke-width="6" stroke-linecap="round"/>\n    <line x1="365" y1="420" x2="332" y2="439" stroke="#444" stroke-width="6" stroke-linecap="round"/>\n    <!-- Pedals -->\n    <rect x="394" y="395" width="16" height="8" rx="3" ry="3" fill="#666" transform="rotate(-30 402 399)"/>\n    <rect x="316" y="435" width="16" height="8" rx="3" ry="3" fill="#666" transform="rotate(150 324 439)"/>\n    <!-- Chain to rear cog -->\n    <path d="M 393 420 C 412 420, 240 421, 240 421\n             M 393 412 C 412 412, 240 413, 240 413"\n          fill="none" stroke="#5e5e5e" stroke-width="3" stroke-linecap="round"/>\n    <!-- Rear cog -->\n    <circle cx="260" cy="420" r="12" fill="none" stroke="#666" stroke-width="4"/>\n  </g>\n\n  <!-- Squirrel -->\n  <g id="squirrel" stroke-linejoin="round">\n    <!-- Tail -->\n    <path d="M 380 360\n             C 330 340, 320 290, 355 265\n             C 395 235, 460 280, 445 320\n             C 437 341, 408 346, 396 338\n             C 408 358, 395 372, 380 360 Z"\n          fill="#C97E3A" stroke="#8A4B20" stroke-width="4"/>\n    <path d="M 392 332\n             C 410 343, 433 333, 437 316"\n          fill="none" stroke="#B8682F" stroke-width="3" stroke-linecap="round"/>\n\n    <!-- Body -->\n    <ellipse cx="400" cy="365" rx="48" ry="36" fill="#C97E3A" stroke="#8A4B20" stroke-width="4"/>\n    <!-- Belly -->\n    <ellipse cx="395" cy="372" rx="26" ry="20" fill="#F5C78A" stroke="none" opacity="0.9"/>\n\n    <!-- Head -->\n    <circle cx="432" cy="330" r="24" fill="#C97E3A" stroke="#8A4B20" stroke-width="4"/>\n    <!-- Ear back -->\n    <path d="M 420 308 L 408 292 L 424 298 Z" fill="#B8682F" stroke="#8A4B20" stroke-width="3"/>\n    <!-- Ear front -->\n    <path d="M 442 307 L 455 292 L 448 311 Z" fill="#B8682F" stroke="#8A4B20" stroke-width="3"/>\n    <!-- Eye -->\n    <ellipse cx="438" cy="326" rx="7" ry="9" fill="#fff"/>\n    <circle cx="440" cy="328" r="4" fill="#222"/>\n    <circle cx="441.5" cy="326.5" r="1.5" fill="#fff" opacity="0.8"/>\n    <!-- Nose and mouth -->\n    <circle cx="454" cy="336" r="3" fill="#3c1f12"/>\n    <path d="M 450 340 Q 446 345, 440 342" fill="none" stroke="#3c1f12" stroke-width="2" stroke-linecap="round"/>\n\n    <!-- Arm to handlebar -->\n    <path d="M 410 355\n             Q 450 352, 520 355" fill="none" stroke="#8A4B20" stroke-width="8" stroke-linecap="round"/>\n    <!-- Paw on grip -->\n    <ellipse cx="582" cy="352" rx="10" ry="7" fill="#C97E3A" stroke="#8A4B20" stroke-width="4"/>\n\n    <!-- Rear leg to pedal -->\n    <path d="M 382 385\n             Q 392 398, 398 401" fill="none" stroke="#8A4B20" stroke-width="8" stroke-linecap="round"/>\n    <!-- Rear paw on pedal -->\n    <ellipse cx="402" cy="399" rx="9" ry="6" fill="#C97E3A" stroke="#8A4B20" stroke-width="3"/>\n\n    <!-- Front leg resting -->\n    <path d="M 386 380\n             Q 380 392, 370 402" fill="none" stroke="#8A4B20" stroke-width="7" stroke-linecap="round"/>\n\n    <!-- Cheek highlight -->\n    <ellipse cx="425" cy="340" rx="9" ry="6" fill="#F5C78A" opacity="0.7"/>\n  </g>\n</svg>
    

    Claude 4 Sonnet

    I’m too broke to use Opus 4.1, so we shall compare against Sonnet 4.

    <svg width="400" height="300" xmlns="http://www.w3.org/2000/svg">\n  <!-- Bicycle wheels -->\n  <circle cx="100" cy="200" r="40" fill="none" stroke="#333" stroke-width="3"/>\n  <circle cx="250" cy="200" r="40" fill="none" stroke="#333" stroke-width="3"/>\n  \n  <!-- Bicycle frame -->\n  <line x1="100" y1="200" x2="175" y2="160" stroke="#666" stroke-width="4"/>\n  <line x1="175" y1="160" x2="230" y2="160" stroke="#666" stroke-width="4"/>\n  <line x1="230" y1="160" x2="250" y2="200" stroke="#666" stroke-width="4"/>\n  <line x1="175" y1="160" x2="175" y2="190" stroke="#666" stroke-width="4"/>\n  <line x1="175" y1="190" x2="250" y2="200" stroke="#666" stroke-width="4"/>\n  \n  <!-- Pedals -->\n  <circle cx="175" cy="190" r="5" fill="#333"/>\n  <line x1="170" y1="185" x2="180" y2="195" stroke="#333" stroke-width="2"/>\n  \n  <!-- Handlebars -->\n  <line x1="230" y1="160" x2="235" y2="150" stroke="#666" stroke-width="3"/>\n  <line x1="225" y1="150" x2="245" y2="150" stroke="#666" stroke-width="3"/>\n  \n  <!-- Squirrel body -->\n  <ellipse cx="190" cy="135" rx="25" ry="15" fill="#8B4513"/>\n  \n  <!-- Squirrel head -->\n  <circle cx="210" cy="120" r="18" fill="#8B4513"/>\n  \n  <!-- Squirrel ears -->\n  <ellipse cx="205" cy="105" rx="4" ry="8" fill="#8B4513"/>\n  <ellipse cx="215" cy="105" rx="4" ry="8" fill="#8B4513"/>\n  \n  <!-- Squirrel tail -->\n  <path d="M 165 140 Q 140 120 130 100 Q 125 95 135 90 Q 145 95 150 110 Q 160 130 165 140" fill="#A0522D" stroke="#8B4513"/>\n  \n  <!-- Squirrel arms -->\n  <line x1="200" y1="130" x2="240" y2="145" stroke="#8B4513" stroke-width="4"/>\n  <circle cx="240" cy="145" r="3" fill="#8B4513"/>\n  \n  <!-- Squirrel legs -->\n  <line x1="185" y1="145" x2="175" y2="175" stroke="#8B4513" stroke-width="4"/>\n  <line x1="195" y1="145" x2="175" y2="175" stroke="#8B4513" stroke-width="4"/>\n  \n  <!-- Squirrel face -->\n  <circle cx="206" cy="118" r="2" fill="#000"/>\n  <circle cx="214" cy="118" r="2" fill="#000"/>\n  <circle cx="210" cy="125" r="1" fill="#000"/>\n  \n  <!-- Wheel spokes -->\n  <line x1="100" y1="200" x2="120" y2="180" stroke="#666" stroke-width="1"/>\n  <line x1="100" y1="200" x2="80" y2="180" stroke="#666" stroke-width="1"/>\n  <line x1="100" y1="200" x2="120" y2="220" stroke="#666" stroke-width="1"/>\n  <line x1="100" y1="200" x2="80" y2="220" stroke="#666" stroke-width="1"/>\n  \n  <line x1="250" y1="200" x2="270" y2="180" stroke="#666" stroke-width="1"/>\n  <line x1="250" y1="200" x2="230" y2="180" stroke="#666" stroke-width="1"/>\n  <line x1="250" y1="200" x2="270" y2="220" stroke="#666" stroke-width="1"/>\n  <line x1="250" y1="200" x2="230" y2="220" stroke="#666" stroke-width="1"/>\n</svg>
    
    Usage(requests=1, request_tokens=25, response_tokens=1156, total_tokens=1181, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 25, 'output_tokens': 1156})

    Sonnet took around 15 seconds and only 1.2k tokens, and honestly, despite having less detail than GPT-5, its output looks more in line with the prompt. The squirrel clearly looks like a squirrel, and the cycle has handlebars. So I’d call this a tie.

    TASK – Draw a medium difficulty maze

    Prompt -“Draw a medium difficulty maze in SVG format, just return the xml.”

    GPT-5

    GPT-5 took 6.1k tokens and 70 seconds

    Usage(requests=1, request_tokens=20, response_tokens=6081, total_tokens=6101, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 5120, 'rejected_prediction_tokens': 0, 'cached_tokens': 0})

    I don’t know about you, but this is more like an easy maze than a difficult one.

    <svg xmlns="http://www.w3.org/2000/svg" width="600" height="600" viewBox="0 0 600 600">\n  <rect x="0" y="0" width="600" height="600" fill="#ffffff"/>\n  <g stroke="#000000" stroke-width="10" stroke-linecap="square" fill="none">\n    <!-- Outer border (with entrance on left and exit on right) -->\n    <line x1="20" y1="20" x2="580" y2="20"/>\n    <line x1="20" y1="20" x2="20" y2="60"/>\n    <line x1="20" y1="100" x2="20" y2="580"/>\n    <line x1="580" y1="20" x2="580" y2="540"/>\n    <line x1="20" y1="580" x2="580" y2="580"/>\n\n    <!-- Horizontal walls with alternating gaps to form a serpentine path -->\n    <!-- y = 100, gap [520,560] -->\n    <line x1="20" y1="100" x2="520" y2="100"/>\n    <line x1="560" y1="100" x2="580" y2="100"/>\n    <!-- y = 140, gap [40,80] -->\n    <line x1="20" y1="140" x2="40" y2="140"/>\n    <line x1="80" y1="140" x2="580" y2="140"/>\n    <!-- y = 180, gap [480,520] -->\n    <line x1="20" y1="180" x2="480" y2="180"/>\n    <line x1="520" y1="180" x2="580" y2="180"/>\n    <!-- y = 220, gap [80,120] -->\n    <line x1="20" y1="220" x2="80" y2="220"/>\n    <line x1="120" y1="220" x2="580" y2="220"/>\n    <!-- y = 260, gap [440,480] -->\n    <line x1="20" y1="260" x2="440" y2="260"/>\n    <line x1="480" y1="260" x2="580" y2="260"/>\n    <!-- y = 300, gap [120,160] -->\n    <line x1="20" y1="300" x2="120" y2="300"/>\n    <line x1="160" y1="300" x2="580" y2="300"/>\n    <!-- y = 340, gap [400,440] -->\n    <line x1="20" y1="340" x2="400" y2="340"/>\n    <line x1="440" y1="340" x2="580" y2="340"/>\n    <!-- y = 380, gap [160,200] -->\n    <line x1="20" y1="380" x2="160" y2="380"/>\n    <line x1="200" y1="380" x2="580" y2="380"/>\n    <!-- y = 420, gap [360,400] -->\n    <line x1="20" y1="420" x2="360" y2="420"/>\n    <line x1="400" y1="420" x2="580" y2="420"/>\n    <!-- y = 460, gap [200,240] -->\n    <line x1="20" y1="460" x2="200" y2="460"/>\n    <line x1="240" y1="460" x2="580" y2="460"/>\n    <!-- y = 500, gap [320,360] -->\n    <line x1="20" y1="500" x2="320" y2="500"/>\n    <line x1="360" y1="500" x2="580" y2="500"/>\n    <!-- y = 540, gap [240,280] -->\n    <line x1="20" y1="540" x2="240" y2="540"/>\n    <line x1="280" y1="540" x2="580" y2="540"/>\n  </g>\n</svg>
    

    Claude 4 Sonnet

    Sonnet took 11 seconds and 860 tokens

    Usage(requests=1, request_tokens=22, response_tokens=838, total_tokens=860, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 22, 'output_tokens': 838})
    <svg width="400" height="400" xmlns="http://www.w3.org/2000/svg">\n  <rect width="400" height="400" fill="white" stroke="black" stroke-width="2"/>\n  \n  <!-- Outer walls -->\n  <rect x="0" y="0" width="400" height="20" fill="black"/>\n  <rect x="0" y="380" width="400" height="20" fill="black"/>\n  <rect x="0" y="0" width="20" height="400" fill="black"/>\n  <rect x="380" y="0" width="20" height="400" fill="black"/>\n  \n  <!-- Start opening -->\n  <rect x="0" y="20" width="20" height="20" fill="white"/>\n  \n  <!-- End opening -->\n  <rect x="380" y="360" width="20" height="20" fill="white"/>\n  \n  <!-- Internal walls -->\n  <rect x="60" y="20" width="20" height="100" fill="black"/>\n  <rect x="140" y="20" width="20" height="80" fill="black"/>\n  <rect x="220" y="20" width="20" height="120" fill="black"/>\n  <rect x="300" y="20" width="20" height="60" fill="black"/>\n  \n  <rect x="20" y="160" width="80" height="20" fill="black"/>\n  <rect x="160" y="140" width="100" height="20" fill="black"/>\n  <rect x="320" y="100" width="60" height="20" fill="black"/>\n  \n  <rect x="120" y="180" width="20" height="80" fill="black"/>\n  <rect x="280" y="160" width="20" height="100" fill="black"/>\n  <rect x="40" y="220" width="60" height="20" fill="black"/>\n  \n  <rect x="180" y="200" width="80" height="20" fill="black"/>\n  <rect x="320" y="180" width="20" height="60" fill="black"/>\n  \n  <rect x="20" y="280" width="60" height="20" fill="black"/>\n  <rect x="160" y="260" width="20" height="80" fill="black"/>\n  <rect x="200" y="280" width="60" height="20" fill="black"/>\n  \n  <rect x="100" y="320" width="40" height="20" fill="black"/>\n  <rect x="220" y="320" width="80" height="20" fill="black"/>\n  \n  <rect x="60" y="360" width="100" height="20" fill="black"/>\n  <rect x="200" y="360" width="60" height="20" fill="black"/>\n  \n  <!-- Additional maze complexity -->\n  <rect x="340" y="280" width="20" height="60" fill="black"/>\n  <rect x="260" y="240" width="40" height="20" fill="black"/>\n  <rect x="80" y="120" width="40" height="20" fill="black"/>\n  <rect x="360" y="140" width="20" height="40" fill="black"/>\n  <rect x="140" y="300" width="20" height="40" fill="black"/>\n</svg>
    

    The maze generated by Sonnet is also on the easier end of mazes.

    Though Sonnet seems to be a lot faster and consumes far fewer tokens. From this test and what I’ve seen on the internet, for most use-cases there isn’t a significant jump from using GPT-5 over Sonnet or even Gemini 2.5 Pro.

    Let me know if you want me to test GPT-5 vs Gemini 2.5 Pro or any other model.

  • OpenELM – Apple Enters the Open Source LLM Race But Is it Any Good?

    OpenELM – Apple Enters the Open Source LLM Race But Is it Any Good?

    Not even a couple of days have passed since Microsoft’s release of Phi-3-mini, which I tested in this blog post, and now Apple has joined the open-source LLM race by releasing a family of LLMs under the name OpenELM. In this post I’ll show how you can start using it, since the HuggingFace page does not describe which tokenizer to use. We will also compare it to Phi-3-mini 3.8B Instruct. For the comparison, I’ll be using OpenELM 1.1B Instruct, as I ran into some issues using the 3B Instruct model.

    Link to Colab notebook.

    How to Run OpenELM?

    Reading the model page, you might wonder: the tokenizer is not specified, so how can I start using OpenELM with HuggingFace? Well, in one of the most Apple moves possible, they describe the tokenizer only in the paper; it can be found at this link. It uses the Llama 2 tokenizer.

    So your setup will look something like this –

    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch

    device = "cuda:0"  # Specify device

    # Load in the tokenizer (the Llama 2 tokenizer, as described above)
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", pad_token_id=0)

    # Load the model
    model = AutoModelForCausalLM.from_pretrained(
        "apple/OpenELM-3B-Instruct", device_map=device, trust_remote_code=True
    )

    Once you have the model loaded, you need to write the generation code: one function to prepare the tokens and another to generate the text. The code provided by HuggingFace also uses an assistant model, so we will pass the same model as the assistant.

    def prepare_prompt(prompt: str):
        tokens = tokenizer(prompt)
        tokenized_prompt = torch.tensor(tokens['input_ids'], device=device)
        return tokenized_prompt.unsqueeze(0)

    def generate(prompt: str, model: AutoModelForCausalLM, max_length: int = 128):
        tokenized_prompt = prepare_prompt(prompt)
        output_ids = model.generate(
            tokenized_prompt,
            max_length=max_length,
            pad_token_id=0,
            assistant_model=model,
        )
        output_text = tokenizer.decode(
            output_ids[0].tolist(),
            skip_special_tokens=True,
        )
        return output_text

    Now you’re set up for text generation using OpenELM.
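
    With that in place, generation is a single call. For example, using the math question from the next section:

    output = generate("Find x and y from these equations. x-2y = 10 and 2x + y = 0. Be brief in your response.", model)
    print(output)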

    Math Question

    Find x and y from these equations. x-2y = 10 and 2x + y = 0. Be brief in your response.

    Phi-3-mini Response

    ### Response

    To solve the system of equations, we can use the substitution or elimination method. Here, we'll use the elimination method.


    Given equations:

    1) x - 2y = 10

    2) 2x + y = 0


    First, we'll manipulate the second equation to eliminate y:

    Multiply the second equation by 2 to get:

    4x + 2y = 0


    Now, add this new equation to the first equation:

    (x - 2y) + (4x + 2y) = 10 + 0

    5x = 10

    x = 10 / 5

    x = 2


    Now that we have x, we can substitute it back into either of the original equations to find y. We'll use the second equation:

    2x + y = 0

    2(2) + y = 0

    4 + y = 0

    y = -4


    So, the solution is x = 2 and y = -4.

    Phi-3-mini got the correct answer.

    OpenELM Response –

    Find x and y from these equations. x-2y = 10 and 2x + y = 0. Be brief in your response.
    Solution:
    Find the following equations:
    x-2y = 10
    2x + y = 0
    Solve the given equations for the given points:
    Find:
    x
    y
    Solution:
    The given equations can be solved in the following form:
    x-2y = -10
    y = a + bx
    Solve the given equations for the given points:
    a = -10
    b = -1
    Solution:
    y = -1 + bx
    y = -2 + bx
    The given solutions are:
    y = -2 + bx and y = -1 + bx
    Now, find the points a and b:
    a = -1 + bx
    bx = -2 + bx
    Solution:
    bx = -2 - a + b
    bx = -2 - -1 + b
    bx = -4
    Find the points a and b:
    a = -4 and b = -4
    Solution:
    a = -4 - b
    a = 2
    The given points a and b are:
    a = 2 and b = 2

    A good attempt given it’s a 1.1B parameter model, but the wrong answer.

    Pattern Question

    "Fill in the blank letter '_' in the pattern 1,8,9,64,25,216,_. Only return the answer"

    Phi-3-mini Response

    ### Response

    The pattern appears to be based on the sequence of cubes of natural numbers, but with alternating signs and skipping one number each time. The sequence is:

    1^3 = 1
    (-2)^3 = -8
    3^3 = 27 (but given as 9, so it seems there's a pattern of subtracting 18)
    (-4)^3 = -64
    5^3 = 125 (but given as 64, so it seems there's a pattern of subtracting 61)
    7^3 = 343 (but given as 25, so it seems there's a pattern of subtracting 318)

    Following this pattern, the next number should be the cube of 8, but with the same subtraction pattern. Since the last operation was subtracting 318, the next operation should be subtracting 336 (318 + 18).

    8^3 = 512
    512 - 336 = 176

    So the missing number is 176.

    It again inferred that some cubing is involved, but did not arrive at the right solution, which is 49.

    OpenELM 1.1B Response –

    OpenELM went completely off track, and its response makes no sense.

    For example, given the pattern:

    1,8,9,64,25,216,_.

    To return the only match with '_':

    _(1,8,9,64,25,216,_)

    To return all matches with '_':

    _(1_,8_,9_,64_,25_,216_,_)

    To return all matches starting with '_':

    _(_,8_,9_,25_,216_,216_,_)

    To return all matches ending with '_':

    _(_,9_,216_,216_,216_,_)

    To return all matches containing '_':

    _(_,8_,9_,25_,216_,216_,_)_

    To return all matches starting with '_and_ ending with '_':

    _(_,8_,9_,25_,216_,216_ _,)_

    To return all matches starting with '_or_ ending with '_':

    _(_,8_,9_,25_,216_,216|_)_

    To return all matches starting with '_or_and_

    SQL Question

    Provide me an SQL query to find the oldest employee_id in every region in the table Employee.
    The table has 3 columns - 1. employee_id - This is the primary key
    2. joining_date - Date of joining of the employee
    3. region - region of employee.

    Phi-3-mini Response –

    - Response: To find the oldest employee_id in every region from the `Employee` table, you can use the following SQL query. This query groups the results by the `region` column and then for each region, it finds the oldest `employee_id` based on the earliest `joining_date`.

    ```sql
    SELECT
    region,
    employee_id
    FROM
    Employee
    WHERE
    (region, joining_date) IN (
    SELECT
    region,
    MIN(joining_date)
    FROM
    Employee
    GROUP BY
    region
    )
    ORDER BY
    region;
    ```

    This looks like a query that could work, although a more efficient solution exists (for example, a window function such as ROW_NUMBER() partitioned by region and ordered by joining_date would avoid re-scanning the table in a subquery).

    OpenELM 1.1 B Response –

    ```sql
    SELECT employee_id
    FROM "Employee"
    WHERE 1 = (SELECT MAX(1)
    FROM "Employee"
    WHERE "Region" = 'REGION_' + region_name(region));

    region_name('Europe/London')
    ```

    Though it gives straightforward SQL, the quality of its answer is way below that of Phi-3-mini. But that’s to be expected, given the difference in model size.

    Logical Puzzle 1

    A lot of humans get this wrong –

    A bat and ball cost 1.10 and the difference between a bat and a ball is is 1. How much does each cost? Be brief in your response.

    Phi-3-mini Response

    I am very impressed that it got this right, just like the math question; it seems like it’s a good LLM for math problems.

    ### Response
    
    Let's denote the cost of the bat as B and the cost of the ball as C. We are given two pieces of information:
    
    1. The total cost of the bat and the ball is $1.10:
       B + C = 1.10
    
    2. The difference in cost between the bat and the ball is $1:
       B - C = 1
    
    We can solve these two equations simultaneously to find the values of B and C.
    
    Adding the two equations together, we get:
    (B + C) + (B - C) = 1.10 + 1
    2B = 2.10
    B = 2.10 / 2
    B = 1.05
    
    Now that we have the cost of the bat, we can find the cost of the ball by substituting B back into one of the original equations:
    
    B + C = 1.10
    1.05 + C = 1.10
    C = 1.10 - 1.05
    C = 0.05
    
    So, the bat costs $1.05 and the ball costs $0.05.

    OpenELM 1.1B Response

    Math is definitely not its strong suit.

    A bat and ball cost 1.10 and the difference between a bat and a ball is is 1. How much does each cost? Be brief in your response.
    
    A bat: B - 1, B_Ball - 1.10, B_Diff - 1
    
    A ball: B_Ball - 1.10, B_Diff - 1, B_Cost - 1
    
    Solution:
    
    A bat cost B = B_Ball + B_Diff = B_Cost + 1
    
    A ball cost B_Ball = B_Cost + B_Diff = B_Cost + 2
    
    Solution:
    
    A bat cost B = 2 = 2 + 1
    
    A ball cost B_Ball = B_Cost + B_Diff = 2 + 3
    
    Solution:
    
    A bat cost B = 2 + 3 = 4
    
    A ball cost B_Ball = B_Cost + B_Diff = 4 + 3
    
    Solution:
    
    A bat cost B = 4 = 4 + 2 + 3
    
    A bat + ball cost = 4 + 4 = 8
    
    8 = 8 * 2 + 4 * 3 + 2 * 4 + 3 * 4 + 4 *

    Logical Puzzle 2

    It takes 5 machines 5 minutes to make 5 widgets. How long does it take 100 machines to make 100 widgets? Be brief in your response.

    To us the answer is fairly obvious: it will still take 5 minutes.

    Phi-3-mini Response

    It gets this right again; it’s very strong on easy logical tasks.

    ### Response
    
    It still takes 5 minutes for 100 machines to make 100 widgets. The number of machines and widgets scales linearly, so the time remains the same.
    

    OpenELM 1.1B Response

    Well, it tries to reason with itself and arrives at a solution using convoluted logic, but gets the wrong answer.

    It takes 5 machines 5 minutes to make 5 widgets. How long does it take 100 machines to make 100 widgets? Be brief in your response.

    It takes 5 machines 5 minutes to make 5 widgets. How long does it take 100 machines to make 100 widgets:

    1. 100 machines = 100 machines (i, j) = (1 + i + j - 1) machines
    2. 5 minutes = 60 seconds = 1 i-step (i = 1, 2, ..., n-1)
    3. 5 minutes / 60 seconds = 8.33333 i-steps per minute (i = 1 to n-1)
    4. 8.33333 i-steps per minute * 100 machines = 83.3333 seconds (i = 1 to 100)
    5. Multiply by 100:
    - 83.3333 seconds / 100 machines = 8.3333 minutes (m = 1 to 100)

    The answer is:

    100 machines = 8 minutes (m = 1 to 100)

    So, it takes 8 minutes to make 100 widgets using 100 machines.

    Conclusion

    It’s wonderful to witness Apple’s support for the open-source community by releasing LLMs that are slightly flawed but still highly functional, with parameter counts small enough to fit on a single GPU. Additionally, I will be evaluating the larger 3B OpenELM in the near future, so stay tuned for that article.

  • Is Phi-3-mini really better than Llama 3? – Testing the Limits of Small LLMs in Real-World Scenarios

    Is Phi-3-mini really better than Llama 3? – Testing the Limits of Small LLMs in Real-World Scenarios

    Microsoft released Phi-3-mini, a 3.8 billion parameter model, today without much fanfare. What is really surprising is that, according to the paper, despite being a smaller model it outperforms the recently released Llama 3 8B. Note that it has less than 50% of the parameters and, with quantisation and other neat tricks, could potentially run on your handheld device while still outperforming Llama 3 8B. Seems like Llama 3’s run as the best small model didn’t last long.

    So I decided to test this out myself: I asked both Llama 3 and Phi-3-mini three different questions to make a qualitative evaluation of whether it’s really better than Llama 3, or whether Phi-3 has simply been overfitted to perform well on the leaderboards.

    I’m using Phi-3-mini instruct with 4k context length and Llama 3 8B Instruct.

    Here is the link to the Colab notebook if you want to play around with it.

    The Needle in the Haystack

    I presented a prompt of approximately 2000 tokens, in which I inserted the sentence "Paneer Kadhai is the dish of the day" at a random location as a reference point. At the end of the prompt, I posed the question “What is the dish of the day?” to assess the model’s ability to recall information within a large context. As anticipated, Llama 3 8B gets the correct answer; Phi-3-mini, however, fails. It returns some text from the prompt, but not the answer. So this round goes to Llama 3.
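
    Here’s roughly how such a needle-in-a-haystack prompt can be built. This is a minimal sketch: the filler sentence and the way I pad to roughly 2000 tokens are my assumptions, not the exact prompt used in the test.

    import random

    filler = ["The quick brown fox jumps over the lazy dog."] * 200  # roughly 2000 tokens of filler
    needle = "Paneer Kadhai is the dish of the day."

    # Drop the needle at a random position inside the filler
    filler.insert(random.randint(0, len(filler)), needle)

    prompt = " ".join(filler) + "\n\nWhat is the dish of the day?"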

    Pattern Recognition

    Now, this is a challenging problem. The models have to complete this sequence – 1,8,9,64,25,216,_.

    As you can observe, the numbers at odd positions follow x^2 (1^2, 3^2, 5^2) and the numbers at even positions follow x^3 (2^3, 4^3, 6^3). Knowing this, the answer is fairly obvious: the missing seventh number is 7^2 = 49.
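
    As a quick sanity check, the whole sequence can be generated in one line:

    # Odd positions are squares, even positions are cubes
    seq = [n**2 if n % 2 == 1 else n**3 for n in range(1, 8)]
    print(seq)  # [1, 8, 9, 64, 25, 216, 49]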

    Here Llama 3 was not able to find the answer, it inferred correctly that the response has something to do with squaring the numbers, but was unable to arrive at the correct response.

    Fill in the blank letter '_' in  the pattern 1,8,9,64,25,216,_.
    The pattern is obtained by squaring the numbers. So, the next number in the pattern is 1,8,9,64,25,216, 441.
    Final Answer: The final answer is 441. I hope it is correct.

    Phi-3-mini was equal in performance to Llama 3 on this task, which I thought was very impressive given its small size. In its response, it tried to reason that the sequence might involve cubing the numbers, though it didn’t arrive at the correct answer. Phi-3-mini response –

    #### Response

    The pattern appears to be based on the sequence of cubes of natural numbers, but with alternating signs and skipping one number each time. The sequence is:

    1^3 = 1
    (-2)^3 = -8
    3^3 = 27 (but given as 9, so it seems there's a pattern of subtracting 18)
    (-4)^3 = -64
    5^3 = 125 (but given as 64, so it seems there's a pattern of subtracting 61)
    7^3 = 343 (but given as 25, so it seems there's a pattern of subtracting 318)

    Following this pattern, the next number should be the cube of 8, but with the same subtraction pattern. Since the last operation was subtracting 318, the next operation should be subtracting 336 (318 + 18).

    8^3 = 512
    512 - 336 = 176

    So the missing number is 17

    This round is a tie.


    SQL Coding

    Finally, I asked both models to provide me with a coding solution to this leetcode problem. Now there are many solutions to this problem, but I was interested in seeing how both the LLMs approach this problem.

    Neither Llama 3 nor Phi-3 answered the problem correctly, but Llama 3 at least gave me SQL to test, which is better than returning no code.

    Phi-3 took an interesting approach. Rather than providing an SQL solution, it generated more examples (erroneous ones), and due to the max token limit, the answer was truncated.

    Example 2:
    
    Input: 
    Employee table:
    +-----+-------+------------+-----------+
    | id  | name  | department | managerId |
    +-----+-------+------------+-----------+
    | 101 | John  | A          | null      |
    | 102 | Dan   | A          | 101       |
    | 103 | James | A          | 101       |
    | 104 | Amy   | A          | 101       |
    | 105 | Anne  | A          | 101       |
    | 106 | Ron   | B          | 101       |
    | 107 | Tom   | B          | 106       |
    | 108 | Bob   | B          | 106       |
    | 109 | Tim   | B          | 106       |
    | 110 | Sam   | B          | 106       |
    +-----+-------+------------+-----------+
    Output: 
    +------+
    | name |
    +------+
    | John |
    | Ron  |
    +------+
    
    
    Example 3:
    
    Input: 
    Employee table:
    +-----+-------+------------+-----------+
    | id  | name  | department | managerId |
    +-----+-------+------------+-----------+
    | 101 | John  | A          | null      |
    | 102 | Dan   | A          | 101       |
    | 103 | James | A          | 101       |
    | 104 | Amy   | A          | 101       |
    | 105 | Anne  | A          | 101       |
    | 106 | Ron   | B          | 101       |
    | 107 | Tom   | B          | 106       |
    | 108 | Bob   | B          | 106       |
    | 109 | Tim   | B          | 106       |
    | 110 | Sam   | B          | 106       |
    | 111 | Mike  | B          | 107       |
    | 112 | Jack  | B          | 107       |
    | 113 | Jill  | B          | 107       |
    +-----+-------+------------+-----------+
    Output: 
    +------+
    | name |
    +------+
    | John |
    | Ron  |
    | Tom  |
    +------+
    
    
    Example 4:
    
    Input: 
    Employee table:
    +-----+-------+------------+-----------+
    | id  | name  | department | managerId |
    +-----+-------+------------+-----------+
    | 101 | John  | A          | null      |
    | 102 | Dan   | A          | 101       |
    | 103 | James | A          | 101       |
    | 104 | Amy   | A          | 101       |
    | 105 | Anne  | A          | 101       |
    | 106 | Ron   | B          | 101       |
    | 107 | Tom   | B          | 106       |
    | 108 | Bob   | B          | 106       |
    | 109 | Tim   | B          | 106       |
    | 110 | Sam   | B          | 106       |
    | 111 | Mike  | B          | 107       |
    | 112 | Jack  | B          | 107       |
    | 113 | Jill  | B          | 107       |
    | 114 | Mark  | B          | 108       |
    | 115 | John  | B          | 108       |
    +-----+-------+------------+-----------+
    Output: 
    +------+
    | name |
    +------+
    | John |
    | Ron  |
    | Tom  |
    | Bob  |
    +------+
    
    
    Example 5:
    
    Input: 
    Employee table:
    +-----+-------+------------+-----------+
    | id  | name  | department | managerId |
    +-----+

    Conclusion

    Question               Llama 3 8B Instruct    Phi-3-mini 4K
    Needle in Haystack     X
    Pattern Recognition    X                      X
    SQL Code               X

    While 3 questions are not enough to determine the strength of these small but still massive models, they gave us some indication of each model’s strengths, and to be honest, I think Phi-3-mini is overfitted to the benchmarks. It may have faster inference and fit on your device, but if you want better results, I’d stick to using either Mistral or Llama 3 8B.

  • Is Llama 3 Really Better Than Mistral?

    Is Llama 3 Really Better Than Mistral?

    With the recent launch of the much anticipated Llama 3, I decided to use both Mistral, which is one of the best small (7B) language models out there, and Llama 3, which according to its benchmark scores claims to outperform Mistral. But is it really better when it comes to using it as the LLM in your RAG applications? To test this, I put the same questions to both Mistral and Llama 3 and the results will surprise you.

    Link to Colab

    I created a RAG application using Ollama; in case you want to know how you can do it yourself, check out this post. I used the Elden Ring Wikipedia article as the document for contextual retrieval, with conversation buffer memory, which just passes the entire conversation history back to the language model as context. I then asked the same question to both LLMs, and at the end also put it to the current king of LLMs, GPT-4. The question was –

    "How many awards did Elden Ring Win, and did it win Game of the year award ?"

    The entire prompt with the context was –

    Be precise in your response. Given the context - Elden Ring winning Game of the Year
    at the 23rd Game Developers Choice
    AwardsSome reviewers criticized a number of the game's menu and accessibility systems.[84][85] Reviewers
    complained about the poor performance of the Window s version; framerate issues were commonly
    mentioned.[81][86] Reviewers noted the story of Elden Ring lacks Martin's writing style. Kyle Orland of Ars
    Technica said the game's storytelling is "characteristically sparse and cryptic", and differs from the
    expectations of Martin's fans.[76] Chris Carter of Destructoid called the story "low key" but said it is better-
    told than those of previous FromSoftware games.[80] Aoife Wilson of Eurogam er said George R. R.
    Martin's heavy inclusion in the marketing was "baffling" when his contributions to the overall narrative
    were unclear.[72] Mitchell Saltzman did not mind the lack of Martin's style, saying the side-stories rather
    than any gr and, ove rarching pl ot kept him "enthralled".[70]
    
    120. Mejia, Ozzie (January 26, 2023). "Elden Ring & Stray lead Game Developers Choice
    Awards 2023 nominees" (https://www.shacknews.com/article/133863/gdc-2023-award-nomi
    nees). Shacknews. Archived (https://web.archive.org/web/20230127040625/https://www.sha
    cknews.com/article/133863/gdc-2023-award-nominees) from the original on January 27,
    2023. Retrieved January 27, 2023.
    121. Beth Elderkin (March 22, 2023). "'Elden Ring' Wins Game Of The Year At The 2023 Game
    Developers Choice Awards" (https://gdconf.com/news/elden-ring-wins-game-year-2023-gam
    e-developers-choice-awards). Game Developers Choice Conference. Archived (https://web.
    archive.org/web/20230323091858/https://gdconf.com/news/elden-ring-wins-game-year-2023
    -game-developers-choice-awards) from the original on March 23, 2023. Retrieved March 23,
    2023.
    122. "gamescom award 2021: These were the best games of the year" (https://www.gamescom.gl
    obal/en/gamescom/gamescom-award/gamescom-award-review-2021). Gamescom.
    
    Tin Pan Alley Award for
    Best Music in a GameNominated
    The Steam
    AwardsJanuary 3, 2023Game of the Year Won
    [132]
    Best Game You Suck At Won
    The Streamer
    AwardsMarch 11, 2023Stream Game of the
    YearWon[133]
    1. Sawyer, Will; Franey, Joel (April 8, 2022). "Where Elden Ring takes place and the story
    explained" (https://www.gamesradar.com/elden-ring-where-does-it-take-place-setting-story-l
    ore/). gamesradar. Archived (https://web.archive.org/web/20220402212714/https://www.gam
    esradar.com/elden-ring-where-does-it-take-place-setting-story-lore/) from the original on April
    2, 2022. Retrieved July 26, 2022.
    2. Knapp, Mark (June 16, 2021). "Elden Ring: Release Date, Gameplay, and What We Know
    So Far" (https://www.ign.com/articles/elden-ring-release-date-news-gameplay-trailer-story).
    IGN. Ziff Davis. Archived (https://web.archive.org/web/20220303124310/https://www.ign.co
    m/articles/elden-ring-release-date-news-gameplay-trailer-story) from the original on March 3,
    2022. Retrieved July 6, 2021.
    
    Elden Ring sold 13.4 million copies worldwide by the end of March 2022,[87] and 20 million by February
    2023.[88] It was the best-selling game in several regions between February and March 2022,[89][90] and is
    the fastest-selling Bandai Namco game of all time.[91][92] It was the second-bestselling game of 2022 in the
    US after Call of Duty: Modern Warfare II,[93] the third-bestselling in Europe,[94] and the tenth-bestselling in
    Japan at retail.[95]
    Elden Ring won many awards and accolades. It was listed as the
    2022' s Game of the Year by several publications, including Ars
    Technica,[96] Destructoid,[97] EGM,[98] Eurogam er,[99] Game
    Informer,[100] GamesRadar +,[101] GameSpot,[102] IGN,[103] PC
    Gamer,[104] and Polygon.[105] It is among the most awarded games
    in history with 324 G ame of the Year awards.[106][107]S a l e s
    A w a r d s,  Answer the following question - How many awards did Elden Ring Win, and did it win Game of the year award ?.
    

    Mistral

    1. "Elden Ring" (https://www.metacritic.com/game/pc/elden-ring). Metacritic. Archived (https://web.
    archive.org/web/20220402212714/https://www.metacritic.com/game/pc/elden-ring) from the
    original on April 2, 2022. Retrieved April 2, 2022.
    2. "Elden Ring" (https://www.met

    Llama 3 – 8B

    Elden Ring winning Game of the Year
    at the 23rd Game Developers Choice
    AwardsSome reviewers criticized a number of the game's menu and accessibility systems.[84][85] Reviewers
    complained about the poor performance of the Window s version; framerate issues were commonly
    mentioned.[81][86] Reviewers noted the story of Elden Ring lacks Martin's writing style. Kyle Orland of Ars
    Technica said the game's storytelling is "characteristically sparse and cryptic", and differs from the
    expectations of Martin's fans.[76] Chris Carter

    We can clearly see that Llama 3 8B got one part right; however, it couldn’t find from the context that the game won 324 awards in total.

    GPT-4

    'Elden Ring won 324 Game of the Year awards. Yes, it did win the Game of the Year award at the 23rd Game Developers Choice Awards.'

    GPT-4 is still far ahead of smaller LLMs, but Llama 3 8B has improved compared to Mistral.

  • Build Fully Local RAG Application with LLaMA 3: A Step-by-Step Guide

    Build Fully Local RAG Application with LLaMA 3: A Step-by-Step Guide

    Meta just launched Llama 3, and it’s the best open-source LLM you can use right now, so why not build a RAG application with it? You can run the model for text generation using either HuggingFace or Ollama; we will be using Ollama to create a RAG application that runs locally.

    In this tutorial, we will build a Retrieval Augmented Generation(RAG) Application using Ollama and Langchain. For the vector store, we will be using Chroma, but you are free to use any vector store of your choice.

    In case you just want the Colab notebook, it’s available here.

    There are 4 key steps to building your RAG application –

    1. Load your documents
    2. Add them to the vector store using the embedding function of your choice.
    3. Define your prompt template.
    4. Define your Retrieval Chatbot using the LLM of your choice.

    First we load the required libraries.

    # Loading required libraries
    import os
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.vectorstores import Chroma
    from langchain.chains import RetrievalQA
    from langchain.memory import ConversationSummaryMemory
    from langchain_openai import OpenAIEmbeddings
    from langchain.prompts import PromptTemplate
    from langchain.llms import Ollama

    Then comes step 1, which is to load our documents. Here I’ll be using the Elden Ring Wikipedia PDF; you can just visit the Wikipedia page and download it as a PDF file.

    data_path = "./data/Elden_Ring.pdf"
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=2000,
        chunk_overlap=30,
        length_function=len,)
    documents = PyPDFLoader(data_path).load_and_split(text_splitter=text_splitter)

    In case you want to learn in detail about ChromaDB, you can visit our detailed guide to using ChromaDB. The next step is to use an embedding function that will convert our text into embeddings. I prefer using OpenAI embeddings, but you can use any embedding function. Using this embedding function we will add our documents to the Chroma vector database.

    embedding_func = OpenAIEmbeddings(api_key=os.environ.get("OPENAI_API_KEY"))
    vectordb = Chroma.from_documents(documents, embedding=embedding_func)

    Moving on, we have to define a prompt template. I’ll be using the Mistral model first, so this is the very basic instruction template that Mistral provides.

    template = """<s>[INST] Given the context - {context} </s>[INST] [INST] Answer the following question - {question}[/INST]"""
    pt = PromptTemplate(
                template=template, input_variables=["context", "question"]
            )

    All that is left to do is to define our memory and the Retrieval Chatbot using Ollama as the LLM. To use Llama 3 instead, all you have to do is pass “llama3” as the model name (a sketch follows the Mistral example below).

    rag = RetrievalQA.from_chain_type(
                llm=Ollama(model="mistral"),
                retriever=vectordb.as_retriever(),
                memory=ConversationSummaryMemory(llm = Ollama(model="mistral")),
                chain_type_kwargs={"prompt": pt, "verbose": True},
            )
    rag.invoke("What is Elden Ring ?")
    >>> {'query': 'What is Elden Ring ?',
     'history': '',
     'result': ' Elden Ring is a 2022 action role-playing game developed by FromSoftware. It was published for PlayStation 4, PlayStation 5, Windows, Xbox One, and Xbox Series X/S. In the game, players control a customizable character on a quest to repair the Elden Ring and become the new Elden Lord. The game is set in an open world, presented through a third-person perspective, and includes several types of weapons and magic spells. Players can traverse the six main areas using their steed Torrent and discover linear hidden dungeons and checkpoints that enable fast travel and attribute improvements. Elden Ring features online multiplayer mode for cooperative play or player-versus-player combat. The game was developed with inspirations from Dark Souls series, and contributions from George R.R. Martin on the narrative and Tsukasa Saitoh, Shoi Miyazawa, Tai Tomisawa, Yuka Kitamura, and Yoshimi Kudo for the original soundtrack. Elden Ring received critical acclaim for its open world, gameplay systems, and setting, with some criticism for technical performance. It sold over 20 million copies and a downloadable content expansion, Shadow of the Erdtree, is planned to be released in June 2024.'}

    In sum, building a Retrieval Augmented Generation (RAG) application using the newly released LLaMA 3 model, Ollama, and Langchain enables robust local solutions for natural language queries. This tutorial walked you through the comprehensive steps of loading documents, embedding them into a vector store like Chroma, and setting up a dynamic RAG application that retrieves and generates responses efficiently. By harnessing the power of the newly released LLaMA 3 by Meta as the LLM and Langchain to create the chatbot, you can create intelligent systems that significantly enhance user interaction and information retrieval. The capabilities demonstrated here illustrate just a fraction of the potential applications. Let me know in the comments if you want me to cover something else.

  • Ultimate Guide to Chroma Vector Database: Everything You Need to Know – Part 1

    Ultimate Guide to Chroma Vector Database: Everything You Need to Know – Part 1

    In this tutorial, we will walk through how to use Chromadb as your vector database for all your Retrieval-Augmented Generation (RAG) tasks.

    But before that, you need to install Chromadb, if you’re using Python then all you need to do is –

    pip install chromadb

    Now that you’ve installed Chromadb, let’s begin. We will use a PDF file as an example. For the PDF we will be using this research paper, but feel free to use the PDF of your choice.

    The first step is to create a persistent client, i.e., storage that can be reused in multiple places. While creating the client, remember to add the setting that allows resetting the client, should you require that functionality.

    import chromadb
    
    client = chromadb.PersistentClient(
                path="<path of persistent storage>", settings=chromadb.config.Settings(allow_reset=True)
            )

    Once you have a client set up, you can define collections within it. If you use BigQuery or any SQL product, imagine the client being the project and the collection being the dataset. Within the collection, you can store the documents as embeddings. In this example, we will call our collection “research”.

    One very important thing to remember is that each collection has its own embedding function, and it has to stay fixed. The query is also passed as an embedding when you search for the most similar documents, so if you use embedding function X to add the documents and embedding function Y to query them, the similarity scores will not be correct. We will be using the OpenAI text-embedding-3-small model.

    Another point to remember is that a single document should only have as many tokens as the embedding function can embed. If your embedding function is all-MiniLM-L6-v2 from HuggingFace, the max sequence length it can handle is 256, so if you try to vectorise a longer file, it will just clip the document to 256 tokens and embed that. The OpenAI model has a much longer maximum input (around 8K tokens).

    # Defining the embedding function (OpenAIEmbeddingFunction comes from chromadb's helpers)
    import os
    from chromadb.utils import embedding_functions

    embedding_func = embedding_functions.OpenAIEmbeddingFunction(
        api_key=os.environ.get("OPENAI_API_KEY"),
        model_name="text-embedding-3-small",
    )
    

    Next, we create the collection. It’s best practice to specify the embedding function while creating the collection; otherwise Chromadb falls back to its default embedding function, which is a sentence-transformers model.
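
    A minimal sketch of creating the collection, using the persistent client from above and the “research” name mentioned earlier:

    collection = client.get_or_create_collection(
        name="research",
        embedding_function=embedding_func,
    )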

    Now we need to add documents to this collection; for this, we will use some helper functions from LangChain.

    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import PyPDFLoader
    
    data_path = "./data/2311.04635v1.pdf"
    
    pdf_loader = PyPDFLoader(data_path)
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=50,
        separators=["\n\n", "\n", ". ", " ", ""],
    )
    documents = pdf_loader.load_and_split(text_splitter=text_splitter)

    The PyPDFLoader loads the PDF file and the RecursiveCharacterTextSplitter splits it into chunks. We are using a chunk size of 1000 with an overlap of 50, meaning each chunk will be roughly 1000 characters with 50 characters of overlapping text between consecutive chunks. You can learn more about how the text splitter works here.

    Now that we have our documents loaded, it’s time to add them to the Chromadb collection. Since we’ve already specified the embedding function on the collection, we can simply add the text and it will be stored as embeddings. You have to specify “ids” (think of them as primary keys in SQL), and you can also specify metadata for each document; both are useful when you want to query, upsert or delete documents in the collection.

    from datetime import datetime, timezone

    collection.add(
        documents=[i.page_content for i in documents],
        ids=[f"pdf_chunk_{i}" for i in range(len(documents))],
        metadatas=[
            {
                "file_name": "research_paper",
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
            for _ in documents
        ],
    )

    Here we use the page contents of the loaded documents as documents; since they are text, the collection will automatically convert them to embeddings using its embedding function. In case you already have embeddings, you can add those directly instead. I’ve also specified some ids (very rudimentary here, for illustration purposes) and some metadata for each document. Both of these can be used to query, upsert or delete individual documents from the vector database.
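
    For example, querying the collection looks like this. The query text is embedded with the collection’s embedding function before the similarity search runs (the question string here is just an illustration):

    results = collection.query(
        query_texts=["What problem does the paper try to solve?"],
        n_results=3,
    )
    print(results["ids"])
    print(results["documents"])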

    Read more about how you can upsert documents, query a collection and delete individual documents in Part II of this Ultimate Guide to ChromaDB

  • Embed Documents Using Ollama – OllamaEmbeddings

    You can now create document embeddings using Ollama. Once these embeddings are created, you can store them in a vector database; you can read this article where I go over how to do so.

    from langchain_community.embeddings import OllamaEmbeddings

    ollama_emb = OllamaEmbeddings(
        model="mistral",
    )
    r1 = ollama_emb.embed_documents(
        [
            "Alpha is the first letter of Greek alphabet",
            "Beta is the second letter of Greek alphabet",
            "This is a random sentence",
        ]
    )
    r2 = ollama_emb.embed_query(
        "What is the second letter of Greek alphabet"
    )

    Let’s inspect the array shapes-

    import numpy as np

    print(np.array(r1).shape)
    >>> (3, 4096)
    print(np.array(r2).shape)
    >>> (4096,)

    Now we can also find the cosine similarity between the vectors –

    from sklearn.metrics.pairwise import cosine_similarity
    cosine_similarity(np.array(r1), np.array(r2).reshape(1,-1))
    >>>array([[0.62087283],
    [0.65085897],
    [0.36985642]])

    Here we can clearly see that the second of our 3 reference documents is the closest to our question. Similarly, you can create embeddings from your own text documents, store them, and later query them using Ollama and LangChain.
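
    For instance, the same Ollama embeddings can back a Chroma vector store through LangChain. A minimal sketch, reusing the three sentences from above:

    from langchain_community.vectorstores import Chroma

    db = Chroma.from_texts(
        [
            "Alpha is the first letter of Greek alphabet",
            "Beta is the second letter of Greek alphabet",
            "This is a random sentence",
        ],
        embedding=ollama_emb,
    )
    print(db.similarity_search("What is the second letter of Greek alphabet", k=1))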

  • How does ChatGPT remember? LLM Memory Explained.

    In the fascinating world of conversational AI, the ability of systems like ChatGPT to remember and refer back to earlier parts of a conversation is nothing short of magic. But how does this seemingly simple act of recollection work under the hood? Let’s dive into the concept of memory in large language models (LLMs) and uncover the mechanisms that enable these digital conversationalists to keep track of our chats.

    The Essence of Memory in Conversational AI

    Memory in conversational AI systems is about the ability to store and recall information from earlier interactions. This capability is crucial for maintaining the context and coherence of a conversation, allowing the LLM to reference past exchanges and build upon them meaningfully. This also gives the appearance that the LLM has intelligence when in reality they are stateless and have no inbuilt memory.

    LangChain, a framework for building conversational AI applications, highlights the importance of memory in these systems. It distinguishes between two fundamental actions that a memory system needs to support: reading and writing.

    What happens is that the LLM is passed the stored memory as additional context alongside your input prompt, so it can process the information as if it had all the context from the get-go.

    Building Memory into Conversational Systems

    The development of an effective memory system involves two key design decisions: how the state is stored and how it is queried.

    Storing: The Backbone of Memory

    Underneath any memory system lies a history of all chat interactions. These can range from simple in-memory lists to sophisticated persistent databases. Storage is simple, you can store all past conversations in a database. You can either store them as simple text documents or use a vector database and store them as embeddings.

    Querying: The Brain of Memory

    Storing chat messages is only one part of the equation. The real magic happens in the querying phase, where data structures and algorithms work together to present a view of the message history that is most useful for the current context. This might involve returning the most recent messages, summarizing past interactions, or extracting and focusing on specific entities mentioned in the conversation.
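    LangChain ships memory classes that implement some of these querying strategies out of the box. A rough sketch (the window size and the Ollama model are just examples):

    from langchain.memory import ConversationBufferWindowMemory, ConversationSummaryMemory
    from langchain.llms import Ollama

    # Keep only the last 3 exchanges in the prompt
    recent_memory = ConversationBufferWindowMemory(k=3)

    # Maintain a running summary of older turns instead of the raw messages
    summary_memory = ConversationSummaryMemory(llm=Ollama(model="mistral"))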

    Practical Implementation with LangChain

    Here we will take a look at one way to store memory using LangChain.

    from langchain.memory import ConversationBufferMemory

    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

    Now you can attach this memory to any LLM chain, and it will add the entire previous conversation as context to the LLM on each chain invoke. The advantage of this kind of memory is that it’s simple to implement. The disadvantage is that in longer conversations you keep passing more tokens, so the input prompt size explodes, responses get slower, and if you’re using paid models like GPT-4, costs also increase.
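    As a rough sketch of wiring it into a chain (assuming the legacy LLMChain API and an Ollama mistral model; the prompt text is made up, and return_messages is left at False so the history renders as plain text):

    from langchain.chains import LLMChain
    from langchain.llms import Ollama
    from langchain.memory import ConversationBufferMemory
    from langchain.prompts import PromptTemplate

    prompt = PromptTemplate(
        input_variables=["chat_history", "question"],
        template="Previous conversation:\n{chat_history}\n\nQuestion: {question}",
    )

    # A plain-text buffer is easiest to render inside a string prompt
    buffer = ConversationBufferMemory(memory_key="chat_history", return_messages=False)

    chain = LLMChain(llm=Ollama(model="mistral"), prompt=prompt, memory=buffer)
    chain.invoke({"question": "My name is Alice."})
    chain.invoke({"question": "What is my name?"})  # earlier turns are injected into {chat_history}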

    Conclusion

    The ability of systems like ChatGPT to remember past interactions is a cornerstone of effective chatbots. By leveraging sophisticated memory systems, developers can create applications that not only understand the current context but can also draw on previous exchanges to provide more coherent and engaging responses. As we continue to push the boundaries of what conversational AI can achieve, the exploration and enhancement of memory mechanisms will remain a critical area of focus.

  • Build RAG Application Using Ollama

    In this tutorial, we will build a Retrieval Augmented Generation (RAG) application using Ollama and LangChain. For the vector store, we will be using Chroma, but you are free to use any vector store of your choice.

    There are 4 key steps to building your RAG application –

    1. Load your documents
    2. Add them to the vector store using the embedding function of your choice.
    3. Define your prompt template.
    4. Define your Retrieval Chatbot using the LLM of your choice.

    In case you want the Colab notebook, you can click here.

    First we load the required libraries.

    # Loading required libraries
    import os

    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.vectorstores import Chroma
    from langchain.chains import RetrievalQA
    from langchain.memory import ConversationSummaryMemory
    from langchain_openai import OpenAIEmbeddings
    from langchain.prompts import PromptTemplate
    from langchain.llms import Ollama

    Then comes step 1, which is to load our documents. Here I’ll be using the Elden Ring Wiki PDF; you can just visit the Wikipedia page and download it as a PDF file.

    data_path = "./data/Elden_Ring.pdf"
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=2000,
        chunk_overlap=30,
        length_function=len,
    )

    documents = PyPDFLoader(data_path).load_and_split(text_splitter=text_splitter)

    The next step is to use an embedding function that will convert our text into embeddings. I prefer using OpenAI embeddings, but you can use any embedding function. Using this embedding function we will add our documents to the Chroma vector database.

    embedding_func = OpenAIEmbeddings(api_key=os.environ.get("OPENAI_API_KEY"))
    vectordb = Chroma.from_documents(documents, embedding=embedding_func)

    Moving on, we have to define a prompt template. I’ll be using the mistral model, so it’s a very basic prompt template in the format that Mistral expects.

    template = """<s>[INST] Given the context - {context} </s>[INST] [INST] Answer the following question - {question}[/INST]"""
    pt = PromptTemplate(
        template=template, input_variables=["context", "question"]
    )

    All that is left to do is to define our memory and Retrieval Chatbot using Ollama as the LLM.

    rag = RetrievalQA.from_chain_type(
        llm=Ollama(model="mistral"),
        retriever=vectordb.as_retriever(),
        memory=ConversationSummaryMemory(llm=Ollama(model="mistral")),
        chain_type_kwargs={"prompt": pt, "verbose": True},
    )
    rag.invoke("What is Elden Ring ?")
    >>> {'query': 'What is Elden Ring ?',
    'history': '',
    'result': ' Elden Ring is a 2022 action role-playing game developed by FromSoftware. It was published for PlayStation 4, PlayStation 5, Windows, Xbox One, and Xbox Series X/S. In the game, players control a customizable character on a quest to repair the Elden Ring and become the new Elden Lord. The game is set in an open world, presented through a third-person perspective, and includes several types of weapons and magic spells. Players can traverse the six main areas using their steed Torrent and discover linear hidden dungeons and checkpoints that enable fast travel and attribute improvements. Elden Ring features online multiplayer mode for cooperative play or player-versus-player combat. The game was developed with inspirations from Dark Souls series, and contributions from George R.R. Martin on the narrative and Tsukasa Saitoh, Shoi Miyazawa, Tai Tomisawa, Yuka Kitamura, and Yoshimi Kudo for the original soundtrack. Elden Ring received critical acclaim for its open world, gameplay systems, and setting, with some criticism for technical performance. It sold over 20 million copies and a downloadable content expansion, Shadow of the Erdtree, is planned to be released in June 2024.'}

    We can see that it was even able to tell us when Shadow of the Erdtree is planned to release, which I’m really excited about. Let me know in the comments if you want me to cover anything else.

  • Create Your Own Vector Database

    In this tutorial, we will walk through how you can create your own vector database using Chroma and LangChain. With this, you will be able to easily store PDF files and use the Chroma db as a retriever in your Retrieval Augmented Generation (RAG) systems. In another part, I’ll walk through how you can take this vector database and build a RAG system.

    # Importing Libraries

    import chromadb
    import os
    from chromadb.utils import embedding_functions
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import PyPDFLoader
    from typing import Optional
    from pathlib import Path
    from glob import glob
    from uuid import uuid4

    Now we will define some variables –

    db_path = <path you want to store db >
    collection_name = <name of collection of chroma, it's similar to dataset>
    document_dir_path = <path where the pdfs are stored>

    Now, you also need to create an embedding function. I will use the OpenAI model in the embedding function, as it’s cheap and performs well, but you can use open-source embedding functions as well. You’ll need to pass this embedding function every time you call the collection.

    embedding_func = embedding_functions.OpenAIEmbeddingFunction(
        api_key=<openai_api_key>,
        model_name="text-embedding-3-small",
    )

    Now we need to initialise the client; we will use a persistent client and then create our collection.

    client = chromadb.PersistentClient(path=db_path)
    client.create_collection(
        name=collection_name,
        embedding_function=embedding_func,
    )

    Now let’s load our PDFs. To do this, first we will create a text splitter, and then for each PDF, load it and split it into documents, which will then be stored in the collection. You can use any chunk size you want; we will use 1000 here.

    chunk_size = 1000

    # Load the collection
    collection = client.get_collection(
        collection_name, embedding_function=embedding_func
    )
    # Split each PDF into overlapping chunks of roughly chunk_size characters
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=20,
        length_function=len,
    )

    for pdf_file in glob(f"{document_dir_path}*.pdf"):
        pdf_loader = PyPDFLoader(pdf_file)
        documents = [
            doc.page_content
            for doc in pdf_loader.load_and_split(text_splitter=text_splitter)
        ]
        collection.add(
            documents=documents,
            ids=[str(uuid4()) for _ in range(len(documents))],
        )

    The collection requires an id to be passed for each document. You can pass any string value; here we are passing random UUIDs, but you could, for example, derive the id from the file name, as sketched below.
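    For example, a rough sketch of the same add call inside the loop above, with ids derived from the PDF’s file name plus a chunk index (ids must stay unique per chunk):

        collection.add(
            documents=documents,
            # e.g. "Elden_Ring-0", "Elden_Ring-1", ...
            ids=[f"{Path(pdf_file).stem}-{i}" for i in range(len(documents))],
        )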

    Let me know in case you have any questions.