Microsoft released Phi-3-mini, a 3.8-billion-parameter model, today without much fanfare. What is really surprising is that, according to the paper, this smaller model outperforms the recently released Llama 3 8B. It has less than half the parameters, and with quantisation and other neat tricks it could potentially run on your handheld device while still outperforming Llama 3 8B. It seems Llama 3’s run as the best small model didn’t last long.
So I decided to test this myself: I asked both Llama 3 and Phi-3-mini three different questions to get a qualitative sense of whether it is really better than Llama 3, or whether Phi-3 has simply been overfitted to perform well on the leaderboards.
I’m using Phi-3-mini Instruct with a 4K context length and Llama 3 8B Instruct.
Here is the link to the Colab notebook if you want to play around with it.
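For reference, here is a minimal sketch of how you might load and query both instruct models with Hugging Face Transformers in a Colab-style environment. The model IDs are the public Hugging Face checkpoints, but the dtype and generation settings are assumptions, and the linked notebook may do things differently.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face checkpoints; the notebook may use different ones or add quantisation.
MODELS = {
    "phi-3-mini": "microsoft/Phi-3-mini-4k-instruct",
    "llama-3-8b": "meta-llama/Meta-Llama-3-8B-Instruct",
}

def ask(model_id: str, question: str, max_new_tokens: int = 256) -> str:
    """Load an instruct model and answer a single user question via its chat template."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
    )
    messages = [{"role": "user", "content": question}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```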
The Needle in the Haystack
I presented a prompt of roughly 2,000 tokens in which I inserted the sentence "Paneer Kadhai is the dish of the day" at a random location as the needle. At the end of the prompt, I asked “What is the dish of the day?” to test each model’s ability to recall information from a long context. As anticipated, Llama 3 8B gets the correct answer, while Phi-3-mini fails: it returns some text from the prompt but never the answer. So this round goes to Llama 3.
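If you want to reproduce this kind of test, here is a small sketch of how a needle prompt can be assembled. The needle sentence and question mirror the experiment above; the filler text is a placeholder, and the exact prompt I used lives in the notebook.

```python
import random

def build_needle_prompt(filler_sentences: list[str],
                        needle: str = "Paneer Kadhai is the dish of the day.",
                        question: str = "What is the dish of the day?") -> str:
    """Hide a single 'needle' sentence at a random position inside filler text,
    then append the question the model must answer from that context."""
    sentences = filler_sentences.copy()
    sentences.insert(random.randrange(len(sentences) + 1), needle)
    return " ".join(sentences) + f"\n\nQuestion: {question}"

# Roughly 2,000 tokens of filler, e.g. unrelated sentences or an article excerpt.
filler = ["The weather report for the region was uneventful."] * 300
prompt = build_needle_prompt(filler)
```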
Pattern Recognition
Now, this is a challenging problem. The models have to complete this sequence – 1,8,9,64,25,216,_.
As you can observe, numbers in odd positions follow x², while numbers in even positions follow x³: the sequence is 1², 2³, 3², 4³, 5², 6³. Knowing this, the answer is fairly obvious: the missing seventh term is 7² = 49.
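As a quick sanity check, the intended sequence can be generated in a couple of lines:

```python
# Odd positions (1-indexed) are squares, even positions are cubes.
sequence = [n**2 if n % 2 == 1 else n**3 for n in range(1, 8)]
print(sequence)  # [1, 8, 9, 64, 25, 216, 49]
```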
Here Llama 3 was not able to find the answer: it correctly inferred that the sequence has something to do with squaring the numbers, but it was unable to arrive at the correct response. Llama 3's response:
Fill in the blank letter '_' in the pattern 1,8,9,64,25,216,_.
The pattern is obtained by squaring the numbers. So, the next number in the pattern is 1,8,9,64,25,216, 441.
Final Answer: The final answer is 441. I hope it is correct.
Phi-3-mini performed on par with Llama 3 on this task, which I thought was impressive given its small size. In its response, it reasoned that the sequence might involve cubing the numbers, though it too failed to arrive at the correct answer. Phi-3-mini's response:
#### Response
The pattern appears to be based on the sequence of cubes of natural numbers, but with alternating signs and skipping one number each time. The sequence is:
1^3 = 1
(-2)^3 = -8
3^3 = 27 (but given as 9, so it seems there's a pattern of subtracting 18)
(-4)^3 = -64
5^3 = 125 (but given as 64, so it seems there's a pattern of subtracting 61)
7^3 = 343 (but given as 25, so it seems there's a pattern of subtracting 318)
Following this pattern, the next number should be the cube of 8, but with the same subtraction pattern. Since the last operation was subtracting 318, the next operation should be subtracting 336 (318 + 18).
8^3 = 512
512 - 336 = 176
So the missing number is 17
This round is a tie.
SQL Coding
Finally, I asked both models to provide a coding solution to this LeetCode problem. There are many ways to solve it, but I was interested in seeing how each LLM would approach it.
Neither Llama 3 nor Phi-3 answered the problem correctly, but Llama 3 at least gave me SQL to test, which is better than returning no code at all.
Phi-3 took an interesting approach. Rather than providing a SQL solution, it generated more examples (erroneous ones), and due to the max-token limit its answer was truncated:
```
Example 2:
Input:
Employee table:
+-----+-------+------------+-----------+
| id  | name  | department | managerId |
+-----+-------+------------+-----------+
| 101 | John  | A          | null      |
| 102 | Dan   | A          | 101       |
| 103 | James | A          | 101       |
| 104 | Amy   | A          | 101       |
| 105 | Anne  | A          | 101       |
| 106 | Ron   | B          | 101       |
| 107 | Tom   | B          | 106       |
| 108 | Bob   | B          | 106       |
| 109 | Tim   | B          | 106       |
| 110 | Sam   | B          | 106       |
+-----+-------+------------+-----------+
Output:
+------+
| name |
+------+
| John |
| Ron  |
+------+

Example 3:
Input:
Employee table:
+-----+-------+------------+-----------+
| id  | name  | department | managerId |
+-----+-------+------------+-----------+
| 101 | John  | A          | null      |
| 102 | Dan   | A          | 101       |
| 103 | James | A          | 101       |
| 104 | Amy   | A          | 101       |
| 105 | Anne  | A          | 101       |
| 106 | Ron   | B          | 101       |
| 107 | Tom   | B          | 106       |
| 108 | Bob   | B          | 106       |
| 109 | Tim   | B          | 106       |
| 110 | Sam   | B          | 106       |
| 111 | Mike  | B          | 107       |
| 112 | Jack  | B          | 107       |
| 113 | Jill  | B          | 107       |
+-----+-------+------------+-----------+
Output:
+------+
| name |
+------+
| John |
| Ron  |
| Tom  |
+------+

Example 4:
Input:
Employee table:
+-----+-------+------------+-----------+
| id  | name  | department | managerId |
+-----+-------+------------+-----------+
| 101 | John  | A          | null      |
| 102 | Dan   | A          | 101       |
| 103 | James | A          | 101       |
| 104 | Amy   | A          | 101       |
| 105 | Anne  | A          | 101       |
| 106 | Ron   | B          | 101       |
| 107 | Tom   | B          | 106       |
| 108 | Bob   | B          | 106       |
| 109 | Tim   | B          | 106       |
| 110 | Sam   | B          | 106       |
| 111 | Mike  | B          | 107       |
| 112 | Jack  | B          | 107       |
| 113 | Jill  | B          | 107       |
| 114 | Mark  | B          | 108       |
| 115 | John  | B          | 108       |
+-----+-------+------------+-----------+
Output:
+------+
| name |
+------+
| John |
| Ron  |
| Tom  |
| Bob  |
+------+

Example 5:
Input:
Employee table:
+-----+-------+------------+-----------+
| id  | name  | department | managerId |
+-----+
```
Conclusion
| Question | Llama 3 8B Instruct | Phi-3-mini 4K |
| --- | --- | --- |
| Needle in Haystack | X | |
| Pattern Recognition | X | X |
| SQL Code | X | |

(An X marks the model that came out ahead on a question; both marked means a tie.)
While three questions are not enough to determine the strength of these small but still massive models, they do give some indication, and to be honest, I think Phi-3-mini is overfitted to the benchmarks. It may offer faster inference and fit on your device, but for better results I’ll stick to either Mistral or Llama 3 8B.


