Not even a couple of days have passed since Microsoft’s release of Phi-3-mini, which I tested in this blog post, and now Apple has joined the open-source LLM race by releasing a family of models under the name OpenELM. In this post I’ll show you how to start using it, since the HuggingFace page does not describe which tokenizer to use. We will also compare it to Phi-3-mini 3.8B instruct. For the comparison, I’ll be using OpenELM 1.1B Instruct, as I ran into some issues with the 3B instruct model.
Link to Colab notebook.
How to Run OpenELM?
Reading the model page, you might wonder: the tokenizer is not specified, so how can I start using OpenELM with HuggingFace? Well, in one of the most Apple moves possible, the tokenizer they use is only described in the paper, which can be found at this link. It’s the Llama 2 tokenizer.
So your setup will look something like this –
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

device = "cuda:0"  # Specify device
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", pad_token_id=0)  # Load in the tokenizer
# Load the model
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-3B-Instruct", device_map=device, trust_remote_code=True)
```
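The snippet above loads the 3B checkpoint; since the comparison below uses the 1.1B instruct variant, you can swap in that checkpoint instead. The repo id below is my best guess at Apple’s naming on the Hub, so double-check it on the model page:

```python
# Assumed repo id for the 1.1B instruct variant on the HuggingFace Hub
model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-1_1B-Instruct",
    device_map=device,
    trust_remote_code=True,
)
```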
Once you’ve got the model loaded, you need to write the generation code: one function to prepare the tokens and another to generate the text. The code provided by HuggingFace also uses an assistant model, so we will pass the same model as the assistant.
```python
def prepare_prompt(prompt: str):
    tokens = tokenizer(prompt)
    tokenized_prompt = torch.tensor(
        tokens['input_ids'],
        device=device
    )
    return tokenized_prompt.unsqueeze(0)


def generate(prompt: str, model: AutoModelForCausalLM, max_length: int = 128):
    tokenized_prompt = prepare_prompt(prompt)
    output_ids = model.generate(
        tokenized_prompt,
        max_length=max_length,
        pad_token_id=0,
        assistant_model=model
    )
    output_text = tokenizer.decode(
        output_ids[0].tolist(),
        skip_special_tokens=True
    )
    return output_text
```
Now you’re set up for text generation using OpenELM.
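As a quick sanity check, a minimal call would look something like this (the prompt here is just a placeholder):

```python
# Generate a short completion with the helpers defined above
prompt = "Once upon a time there was"
print(generate(prompt, model, max_length=64))
```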
Math Question
Find x and y from these equations. x-2y = 10 and 2x + y = 0. Be brief in your response.
Phi-3-mini Response
### Response
To solve the system of equations, we can use the substitution or elimination method. Here, we'll use the elimination method.
Given equations:
1) x - 2y = 10
2) 2x + y = 0
First, we'll manipulate the second equation to eliminate y:
Multiply the second equation by 2 to get:
4x + 2y = 0
Now, add this new equation to the first equation:
(x - 2y) + (4x + 2y) = 10 + 0
5x = 10
x = 10 / 5
x = 2
Now that we have x, we can substitute it back into either of the original equations to find y. We'll use the second equation:
2x + y = 0
2(2) + y = 0
4 + y = 0
y = -4
So, the solution is x = 2 and y = -4.
Phi-3-mini got the correct answer.
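For reference, a quick check with sympy (assuming you have it installed) confirms the solution:

```python
from sympy import Eq, solve, symbols

# Solve x - 2y = 10 and 2x + y = 0
x, y = symbols("x y")
print(solve([Eq(x - 2*y, 10), Eq(2*x + y, 0)]))  # {x: 2, y: -4}
```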
OpenELM Response –
Find x and y from these equations. x-2y = 10 and 2x + y = 0. Be brief in your response.
Solution:
Find the following equations:
x-2y = 10
2x + y = 0
Solve the given equations for the given points:
Find:
x
y
Solution:
The given equations can be solved in the following form:
x-2y = -10
y = a + bx
Solve the given equations for the given points:
a = -10
b = -1
Solution:
y = -1 + bx
y = -2 + bx
The given solutions are:
y = -2 + bx and y = -1 + bx
Now, find the points a and b:
a = -1 + bx
bx = -2 + bx
Solution:
bx = -2 - a + b
bx = -2 - -1 + b
bx = -4
Find the points a and b:
a = -4 and b = -4
Solution:
a = -4 - b
a = 2
The given points a and b are:
a = 2 and b = 2
A good attempt, given it’s a 1.1B-parameter model, but the wrong answer.
Pattern Question
"Fill in the blank letter '_' in the pattern 1,8,9,64,25,216,_. Only return the answer"
Phi-3-mini Response –
### Response
The pattern appears to be based on the sequence of cubes of natural numbers, but with alternating signs and skipping one number each time. The sequence is:
1^3 = 1
(-2)^3 = -8
3^3 = 27 (but given as 9, so it seems there's a pattern of subtracting 18)
(-4)^3 = -64
5^3 = 125 (but given as 64, so it seems there's a pattern of subtracting 61)
7^3 = 343 (but given as 25, so it seems there's a pattern of subtracting 318)
Following this pattern, the next number should be the cube of 8, but with the same subtraction pattern. Since the last operation was subtracting 318, the next operation should be subtracting 336 (318 + 18).
8^3 = 512
512 - 336 = 176
So the missing number is 176.
It inferred again that some cubing is involved, but did not arrive at the right answer, which is 49: the sequence alternates squares and cubes by position (1², 2³, 3², 4³, 5², 6³), so the missing term is 7² = 49.
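For reference, the alternating squares-and-cubes pattern can be reproduced with a one-liner:

```python
# Odd positions are squared, even positions are cubed
print([n**2 if n % 2 else n**3 for n in range(1, 8)])  # [1, 8, 9, 64, 25, 216, 49]
```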
OpenELM 1.1B Response –
OpenELM went completely off track, and its response makes no sense.
For example, given the pattern:
1,8,9,64,25,216,_.
To return the only match with '_':
_(1,8,9,64,25,216,_)
To return all matches with '_':
_(1_,8_,9_,64_,25_,216_,_)
To return all matches starting with '_':
_(_,8_,9_,25_,216_,216_,_)
To return all matches ending with '_':
_(_,9_,216_,216_,216_,_)
To return all matches containing '_':
_(_,8_,9_,25_,216_,216_,_)_
To return all matches starting with '_and_ ending with '_':
_(_,8_,9_,25_,216_,216_ _,)_
To return all matches starting with '_or_ ending with '_':
_(_,8_,9_,25_,216_,216|_)_
To return all matches starting with '_or_and_
SQL Question
Provide me an SQL query to find the oldest employee_id in every region in the table Employee.
The table has 3 columns - 1. employee_id - This is the primary key
2. joining_date - Date of joining of the employee
3. region - region of employee.
Phi-3-mini Response –
### Response
To find the oldest employee_id in every region from the `Employee` table, you can use the following SQL query. This query groups the results by the `region` column and then, for each region, finds the oldest `employee_id` based on the earliest `joining_date`.
```sql
SELECT
region,
employee_id
FROM
Employee
WHERE
(region, joining_date) IN (
SELECT
region,
MIN(joining_date)
FROM
Employee
GROUP BY
region
)
ORDER BY
region;
```
This looks like a query that could work, although a more efficient solution could be given (for example, using a window function such as ROW_NUMBER() partitioned by region).
OpenELM 1.1B Response –
```sql
SELECT employee_id
FROM "Employee"
WHERE 1 = (SELECT MAX(1)
FROM "Employee"
WHERE "Region" = 'REGION_' + region_name(region));
region_name('Europe/London')
```
Though it gives a straightforward-looking SQL query, the quality of its answer is way below that of Phi-3-mini. But that’s to be expected, given the difference in model size.
Logical Puzzle 1
A lot of humans get this wrong –
A bat and ball cost 1.10 and the difference between a bat and a ball is is 1. How much does each cost? Be brief in your response.
Phi-3-mini Response
I am very impressed that it got this right, just like the math puzzle, so it seems like it’s a good LLM for math problems.
### Response
Let's denote the cost of the bat as B and the cost of the ball as C. We are given two pieces of information:
1. The total cost of the bat and the ball is $1.10: B + C = 1.10
2. The difference in cost between the bat and the ball is $1: B - C = 1
We can solve these two equations simultaneously to find the values of B and C. Adding the two equations together, we get:
(B + C) + (B - C) = 1.10 + 1
2B = 2.10
B = 2.10 / 2
B = 1.05
Now that we have the cost of the bat, we can find the cost of the ball by substituting B back into one of the original equations:
B + C = 1.10
1.05 + C = 1.10
C = 1.10 - 1.05
C = 0.05
So, the bat costs $1.05 and the ball costs $0.05.
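For reference, a quick arithmetic check (half of the total plus the difference gives the bat’s price) confirms this:

```python
# The bat is half of (total + difference); the ball is the remainder
bat = (1.10 + 1) / 2
ball = 1.10 - bat
print(round(bat, 2), round(ball, 2))  # 1.05 0.05
```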
OpenELM 1.1B Response
Math is definitely not its strong suit.
A bat and ball cost 1.10 and the difference between a bat and a ball is is 1. How much does each cost? Be brief in your response.
A bat: B - 1, B_Ball - 1.10, B_Diff - 1
A ball: B_Ball - 1.10, B_Diff - 1, B_Cost - 1
Solution:
A bat cost B = B_Ball + B_Diff = B_Cost + 1
A ball cost B_Ball = B_Cost + B_Diff = B_Cost + 2
Solution:
A bat cost B = 2 = 2 + 1
A ball cost B_Ball = B_Cost + B_Diff = 2 + 3
Solution:
A bat cost B = 2 + 3 = 4
A ball cost B_Ball = B_Cost + B_Diff = 4 + 3
Solution:
A bat cost B = 4 = 4 + 2 + 3
A bat + ball cost = 4 + 4 = 8
8 = 8 * 2 + 4 * 3 + 2 * 4 + 3 * 4 + 4 *
Logical Puzzle 2
It takes 5 machines 5 minutes to make 5 widgets. How long does it take 100 machines to make 100 widgets? Be brief in your response.
To us the answer is fairly obvious: it will still be 5 minutes.
Phi-3-mini Response
It gets this right again; it seems very strong on easy logical tasks.
### Response
It still takes 5 minutes for 100 machines to make 100 widgets. The number of machines and widgets scales linearly, so the time remains the same.
OpenELM 1.1B Response
Well, it tries to reason with itself and arrives at a solution using convoluted logic, but gets the wrong answer.
It takes 5 machines 5 minutes to make 5 widgets. How long does it take 100 machines to make 100 widgets? Be brief in your response.
It takes 5 machines 5 minutes to make 5 widgets. How long does it take 100 machines to make 100 widgets:
1. 100 machines = 100 machines (i, j) = (1 + i + j - 1) machines
2. 5 minutes = 60 seconds = 1 i-step (i = 1, 2, ..., n-1)
3. 5 minutes / 60 seconds = 8.33333 i-steps per minute (i = 1 to n-1)
4. 8.33333 i-steps per minute * 100 machines = 83.3333 seconds (i = 1 to 100)
5. Multiply by 100:
- 83.3333 seconds / 100 machines = 8.3333 minutes (m = 1 to 100)
The answer is:
100 machines = 8 minutes (m = 1 to 100)
So, it takes 8 minutes to make 100 widgets using 100 machines.
Conclusion
It’s wonderful to witness Apple’s support for the open-source community by releasing LLMs that are slightly flawed but still highly functional, with parameter counts small enough to fit on a single GPU. Additionally, I will be evaluating the larger 3B OpenELM in the near future, so stay tuned for that article.